Learn Challenge: Outlier Detection Using MAD Rule | Statistical Methods in Anomaly Detection
Data Anomaly Detection

Challenge: Outlier Detection Using MAD Rule

Task


Now you will use the MAD rule to detect outliers in the California Housing dataset, which contains housing-related features for districts in California.

In this task, we will detect outliers in the column MedInc, which stands for Median Income.

Your task is to:

  1. Fill in all gaps in the mad() function to calculate the Median Absolute Deviation (MAD).
  2. Calculate the threshold as 3 times the MAD.
  3. Specify the rule that detects outliers; the result is stored in the outliers variable.
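
The three steps above can be tried out on a tiny sample before touching the real dataset. The following is a minimal sketch, not the course's reference solution: the sample values are made up for illustration, and the names sample, center, and toy_outliers are assumptions introduced here.

```python
import numpy as np

def mad(values):
    """Median absolute deviation: the median of |x - median(x)|."""
    values = np.asarray(values, dtype=float)
    center = np.median(values)
    return np.median(np.abs(values - center))

# Made-up sample with one obviously extreme value (illustrative only).
sample = np.array([2.0, 3.0, 3.0, 4.0, 5.0, 30.0])

center = np.median(sample)          # median of the sample
mad_value = mad(sample)             # median of absolute deviations from the median
threshold = 3 * mad_value           # "3 times the MAD" rule

# A point is flagged when its distance from the median exceeds the threshold.
outlier_mask = np.abs(sample - center) > threshold
toy_outliers = sample[outlier_mask]

print("median:", center)
print("MAD:", mad_value)
print("outliers:", toy_outliers)
```

Here the median is 3.5 and the MAD is 1.0, so the threshold is 3.0 and only the value 30.0 lies farther than that from the median.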

Solution

import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing

# Load the California Housing dataset
california_housing = fetch_california_housing()

# Convert the dataset to a pandas DataFrame
data = pd.DataFrame(california_housing.data, columns=california_housing.feature_names)

# Select the feature to analyze ('MedInc', the median income)
selected_feature = 'MedInc'
feature_data = data[selected_feature]

# Function to calculate MAD (Median Absolute Deviation)
def mad(data):
    median = np.median(data)
    absolute_deviations = np.abs(data - median)
    mad_value = np.median(absolute_deviations)
    return mad_value

# Set the threshold (e.g., 3 times the MAD)
threshold = 3 * mad(feature_data)

# Detect outliers using the MAD rule
outliers = feature_data[np.abs(feature_data - np.median(feature_data)) > threshold]

# Print the detected outliers
print("Detected outliers in feature '{}':".format(selected_feature))
print(outliers)
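
Once outliers are detected, it is often useful to know how many there are and where the cutoffs lie. The sketch below is one possible way to summarize that; mad_summary is a hypothetical helper, not part of the course code, and the incomes values are made up to stand in for a real column such as data['MedInc'].

```python
import pandas as pd

def mad_summary(series, k=3.0):
    """Summarize MAD-rule outliers for a numeric pandas Series (hypothetical helper)."""
    center = series.median()
    mad_value = (series - center).abs().median()
    # |x - median| > k * MAD is the same as x falling outside [lower, upper].
    lower = center - k * mad_value
    upper = center + k * mad_value
    mask = (series < lower) | (series > upper)
    return {
        "median": float(center),
        "mad": float(mad_value),
        "lower": float(lower),
        "upper": float(upper),
        "n_outliers": int(mask.sum()),
        "share": float(mask.mean()),
    }

# Made-up values for illustration only.
incomes = pd.Series([1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 15.0])
summary = mad_summary(incomes)
print(summary)
```

With the real dataset, the same helper could be called as mad_summary(feature_data), assuming feature_data is the Series selected in the solution above.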


Section 2. Chapter 6
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing

# Load the California Housing dataset
california_housing = fetch_california_housing()

# Convert the dataset to a pandas DataFrame
data = pd.DataFrame(california_housing.data, columns=california_housing.feature_names)

# Select the feature to analyze ('MedInc', the median income)
selected_feature = 'MedInc'
feature_data = data[selected_feature]

# Function to calculate MAD (Median Absolute Deviation)
def mad(data):
    median = np.___(data)
    absolute_deviations = np.___(data - median)
    mad_value = np.median(absolute_deviations)
    return mad_value

# Set the threshold (e.g., 3 times the MAD)
threshold = ___ * mad(feature_data)

# Detect outliers using the MAD rule
outliers = feature_data[np.abs(feature_data - np.median(feature_data)) ___ threshold]

# Print the detected outliers
print("Detected outliers in feature '{}':".format(selected_feature))
print(outliers)