Building an ML CI Pipeline


A continuous integration (CI) pipeline for machine learning (ML) projects automates and standardizes the way you develop, test, and maintain your ML code and models. A robust CI pipeline helps keep your codebase reliable by catching changes that break essential project components before they are merged. The typical stages in a CI pipeline for ML include:

Code linting: check code style and syntax to maintain readability and catch errors early;
Data validation: verify that input data meets expected formats, ranges, and quality standards;
Model training: retrain or update models using validated data and track performance metrics;
Test execution: run automated tests to confirm that both code and models function as intended.

These stages work together to create a feedback loop that quickly alerts you to any issues, supporting rapid and reliable development.
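As a rough illustration of that feedback loop, the stages above can be sketched as a list of callables that run in order and stop at the first failure. The names `run_pipeline`, `lint`, `validate`, `train`, and `run_tests` are hypothetical placeholders, not part of any CI tool, and the stage bodies are stubs. In a real setup, a CI service would typically run each stage as a separate job or command.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], None]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first one that raises."""
    log: List[str] = []
    for name, stage in stages:
        try:
            stage()
        except Exception as exc:  # any exception marks the stage as failed
            log.append(f"{name}: FAILED ({exc})")
            return False, log
        log.append(f"{name}: passed")
    return True, log


# Hypothetical stage bodies, stubbed out for illustration only.
def lint() -> None:
    # Syntax check only: compile() raises SyntaxError on invalid source.
    compile("x = 1 + 2", "<example>", "exec")


def validate() -> None:
    rows = [{"feature1": 1, "target": 3}]
    if any(value is None for row in rows for value in row.values()):
        raise ValueError("Data contains missing values.")


def train() -> None:
    pass  # placeholder for model training


def run_tests() -> None:
    pass  # placeholder for the automated test suite


ok, log = run_pipeline(
    [("lint", lint), ("validate", validate), ("train", train), ("test", run_tests)]
)
print("\n".join(log))
```

Because the runner stops at the first failing stage, later and usually more expensive stages such as training are skipped when an earlier check fails. A CI script would normally turn `ok` into the process exit status so the run is marked as passed or failed.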


import sys

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Data validation step
def validate_data(df):
    """Raise ValueError if the data frame fails any basic quality check."""
    # Check for missing values
    if df.isnull().values.any():
        raise ValueError("Data contains missing values.")
    # Check for expected columns
    expected_columns = {'feature1', 'feature2', 'target'}
    if not expected_columns.issubset(df.columns):
        raise ValueError("Missing required columns.")
    # Check for valid ranges
    if not ((df['feature1'] >= 0).all() and (df['feature2'] >= 0).all()):
        raise ValueError("Features must be non-negative.")

# Load sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'target': [3, 6, 9, 12, 15]
}
df = pd.DataFrame(data)

try:
    validate_data(df)
    print("Data validation passed.")
except ValueError as e:
    print(f"Data validation failed: {e}")
    # Exit with a non-zero status so the CI job is marked as failed
    sys.exit(1)

# Model training step
X = df[['feature1', 'feature2']]
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"Model trained. Test MSE: {mse:.2f}")
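The script above prints the test MSE but never fails because of it, so a CI run would pass even if the model became much worse. One possible way to turn the metric into a pass/fail signal is a small quality gate. This is a minimal sketch: the function `check_quality_gate` and the threshold `MAX_MSE = 1.0` are assumptions chosen for illustration, not values from the lesson.

```python
import math

MAX_MSE = 1.0  # assumed threshold; choose one that fits your data and model


def check_quality_gate(mse: float, max_mse: float = MAX_MSE) -> None:
    """Raise ValueError if the metric is not finite or exceeds the limit."""
    if not math.isfinite(mse):
        raise ValueError(f"MSE is not a finite number: {mse}")
    if mse > max_mse:
        raise ValueError(f"MSE {mse:.2f} exceeds the allowed maximum {max_mse:.2f}")


# Example usage with a made-up metric value
try:
    check_quality_gate(0.25)
    print("Quality gate passed.")
except ValueError as e:
    print(f"Quality gate failed: {e}")
```

In the training script, calling `check_quality_gate(mse)` right after `mse` is computed, and exiting with a non-zero status on `ValueError` as the validation step does, would let the CI system mark the run as failed when the model falls below the agreed bar.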


Section 1. Chapter 4
