Advanced ML Model Deployment with Python

Building an ML CI Pipeline


A continuous integration (CI) pipeline for machine learning (ML) projects automates and standardizes how you develop, test, and maintain your ML code and models. A robust CI pipeline helps keep the codebase reliable by catching changes that break essential project components. The typical stages in a CI pipeline for ML include:

  • Code linting: check code style and syntax to maintain readability and catch errors early;
  • Data validation: verify that input data meets expected formats, ranges, and quality standards;
  • Model training: retrain or update models using validated data and track performance metrics;
  • Test execution: run automated tests to confirm that both code and models function as intended.

These stages work together to create a feedback loop that quickly alerts you to any issues, supporting rapid and reliable development.
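The feedback loop described above can be sketched as a small runner that executes stages in order and stops at the first failure. This is only a minimal illustration under stated assumptions: `run_pipeline` and the placeholder stages `lint`, `validate`, and `train` are hypothetical names, not part of any CI tool, and a real pipeline would normally be defined in the configuration format of your CI service.

```python
from typing import Callable, List, Tuple

# A stage is a (name, function) pair; the function raises an exception on failure.
Stage = Tuple[str, Callable[[], None]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and return a log; stop at the first failing stage."""
    log = []
    for name, stage in stages:
        try:
            stage()
        except Exception as exc:
            log.append(f"{name}: FAILED ({exc})")
            break  # Fail fast: later stages depend on earlier ones.
        log.append(f"{name}: ok")
    return log


# Hypothetical placeholder stages, for illustration only.
def lint() -> None:
    # Stand-in syntax check: compile() raises SyntaxError on invalid Python.
    compile("x = 1 + 2", "<example>", "exec")


def validate() -> None:
    # Simulate a data problem so the fail-fast behavior is visible.
    raise ValueError("Data contains missing values.")


def train() -> None:
    pass  # Never reached in this example because validation fails first.


if __name__ == "__main__":
    for line in run_pipeline([("lint", lint), ("validate", validate), ("train", train)]):
        print(line)
```

Because the runner stops at the failed validation stage, the training stage never runs on bad data, which is the kind of early alert the pipeline is meant to provide.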

import sys

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


# Data validation step
def validate_data(df):
    """Raise ValueError if the DataFrame fails any validation check."""
    # Check for missing values
    if df.isnull().values.any():
        raise ValueError("Data contains missing values.")
    # Check for expected columns
    expected_columns = {'feature1', 'feature2', 'target'}
    if not expected_columns.issubset(df.columns):
        raise ValueError("Missing required columns.")
    # Check for valid ranges
    if not ((df['feature1'] >= 0).all() and (df['feature2'] >= 0).all()):
        raise ValueError("Features must be non-negative.")


# Load sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'target': [3, 6, 9, 12, 15]
}
df = pd.DataFrame(data)

try:
    validate_data(df)
    print("Data validation passed.")
except ValueError as e:
    print(f"Data validation failed: {e}")
    # A non-zero exit code marks this pipeline step as failed
    sys.exit(1)

# Model training step
X = df[['feature1', 'feature2']]
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out split
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"Model trained. Test MSE: {mse:.2f}")
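The example above covers data validation and model training. The test execution stage could, for instance, check that the validation function rejects bad input. The sketch below is one possible way to do that: it repeats `validate_data` so the file runs on its own, and the helper `validation_error` and the `test_*` function names are assumptions made for illustration. It uses plain `assert` statements, so the functions can be called directly or collected by a test runner such as pytest.

```python
import pandas as pd


def validate_data(df):
    """Same checks as the validation step above, repeated so this file is self-contained."""
    if df.isnull().values.any():
        raise ValueError("Data contains missing values.")
    expected_columns = {"feature1", "feature2", "target"}
    if not expected_columns.issubset(df.columns):
        raise ValueError("Missing required columns.")
    if not ((df["feature1"] >= 0).all() and (df["feature2"] >= 0).all()):
        raise ValueError("Features must be non-negative.")


def validation_error(df):
    """Hypothetical helper: return the error message, or None if the data passes."""
    try:
        validate_data(df)
    except ValueError as exc:
        return str(exc)
    return None


def test_valid_data_passes():
    df = pd.DataFrame({"feature1": [1, 2], "feature2": [2, 4], "target": [3, 6]})
    assert validation_error(df) is None


def test_missing_value_is_rejected():
    df = pd.DataFrame({"feature1": [1, None], "feature2": [2, 4], "target": [3, 6]})
    assert validation_error(df) == "Data contains missing values."


def test_missing_column_is_rejected():
    df = pd.DataFrame({"feature1": [1, 2], "feature2": [2, 4]})
    assert validation_error(df) == "Missing required columns."


def test_negative_feature_is_rejected():
    df = pd.DataFrame({"feature1": [-1, 2], "feature2": [2, 4], "target": [3, 6]})
    assert validation_error(df) == "Features must be non-negative."
```

Each test targets exactly one validation rule, so a failing test points directly at the check that regressed.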

What is the main purpose of including data validation in a CI pipeline for ML?


Section 1. Chapter 4