Learn Logging Experiments with MLflow | Experiment Tracking and Versioning
MLOps for Machine Learning Engineers

Logging Experiments with MLflow

import logging
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

# Load sample dataset
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Set experiment name (created if it doesn't exist)
experiment_name = "DiabetesRegression"
mlflow.set_experiment(experiment_name)
logging.info(f"Using experiment: {experiment_name}")

# Start an MLflow run
with mlflow.start_run() as run:
    # Define and train model
    alpha = 0.5
    logging.info(f"Training Ridge(alpha={alpha})")
    model = Ridge(alpha=alpha)
    model.fit(X_train, y_train)
    logging.info("Training complete")

    # Predict and calculate metric
    predictions = model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    logging.info(f"Test MSE: {mse:.6f}")

    # Log parameter, metric, and model
    mlflow.log_param("alpha", alpha)
    mlflow.log_metric("mse", mse)
    mlflow.sklearn.log_model(model, "model")
    logging.info("Logged params, metrics, and model to MLflow")

    # Optionally, log data version or hash for reproducibility
    mlflow.log_param("data_version", "sklearn_diabetes_v1")

    # Report run information through the logger
    run_id = run.info.run_id
    experiment_id = run.info.experiment_id
    artifact_uri = mlflow.get_artifact_uri()
    logging.info(f"Run ID: {run_id}")
    logging.info(f"Experiment ID: {experiment_id}")
    logging.info(f"Artifact URI: {artifact_uri}")

The following step-by-step breakdown shows how experiment logging works in practice. First, the code loads a sample dataset using load_diabetes from scikit-learn, then splits it into training and test sets with a fixed random_state so the split is repeatable. The experiment is named using mlflow.set_experiment, which either selects an existing experiment or creates a new one if needed.

The main part of the workflow begins with mlflow.start_run(), which initializes a new run and ensures all subsequent logs are grouped together. Inside this run, a Ridge regression model is defined with a specific alpha parameter and trained on the training data. After training, predictions are made on the test set, and the mean squared error (MSE) is calculated as a performance metric.
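As a small worked illustration of the metric mentioned above, the sketch below computes MSE by hand: the average of the squared differences between true and predicted values. The helper name mean_squared_error_manual and the sample numbers are made up for this example; the lesson code itself relies on scikit-learn's mean_squared_error.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Illustrative values only, not taken from the diabetes dataset.
y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]

# Squared errors are 1.0, 0.0 and 4.0, so the mean is 5 / 3.
mse = mean_squared_error_manual(y_true, y_pred)
print(f"MSE: {mse:.4f}")
```

A lower MSE means predictions sit closer to the targets, which is why it is a convenient single number to log per run and compare across runs.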

MLflow's logging functions are then used to capture key aspects of the experiment. The alpha parameter is logged with mlflow.log_param, and the computed mse is logged as a metric using mlflow.log_metric. The trained model itself is saved as an artifact with mlflow.sklearn.log_model, making it easy to retrieve or deploy later. For reproducibility, the code also logs a data_version parameter, which records the origin or version of the dataset used for training. Finally, the run ID, experiment ID, and artifact URI are written to the log so the run can be located again later.

Warning: always log relevant metadata such as data version, random seed, and environment information. Without these, reproducing results or debugging issues becomes much harder.
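One way to gather such metadata is sketched below using only the standard library. The helper collect_run_metadata, its key names, and the sample bytes are hypothetical and chosen for illustration. Hashing the raw data gives a version identifier that changes whenever the data changes.

```python
import hashlib
import platform


def collect_run_metadata(data_bytes: bytes, random_seed: int) -> dict:
    """Build a flat dict of reproducibility metadata (hypothetical helper)."""
    return {
        # Content hash: identical data always yields the same digest.
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "random_seed": random_seed,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }


# Stand-in bytes; in practice this would be the contents of the dataset file.
metadata = collect_run_metadata(b"example,rows\n1,2\n", random_seed=42)
for key, value in metadata.items():
    print(f"{key}: {value}")
```

Inside an active run, a dictionary like this could be passed to mlflow.log_params(metadata) so the values are stored alongside alpha and data_version.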

