Advanced Probability Theory

Challenge: Detecting Outliers Using 3-Sigma Rule

Task


In the previous chapter, we mentioned that outliers in a normally distributed random variable can be found using the 3-sigma rule: any value that lies more than three standard deviations from the mean, that is, outside the range [mean - 3*std, mean + 3*std], is treated as an outlier.
Your task is to find the outliers in a given dataset. Assume that the samples follow a Gaussian distribution with a mean of 0 and a standard deviation of 4. You need to:
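
To make the rule concrete, here is a minimal sketch using only the Python standard library. The helper names `three_sigma_bounds` and `is_outlier` are hypothetical, introduced only for illustration, and the parameters (mean 0, standard deviation 4) are the ones assumed in this challenge. It shows that a normal distribution places about 99.73% of its probability inside the 3-sigma range, which is why values outside it are considered suspicious.

```python
from statistics import NormalDist


def three_sigma_bounds(mean, std):
    """Return the (lower, upper) limits of the 3-sigma range."""
    return mean - 3 * std, mean + 3 * std


def is_outlier(value, mean, std):
    """True if value lies strictly outside the 3-sigma range."""
    lower, upper = three_sigma_bounds(mean, std)
    return value < lower or value > upper


# Parameters assumed in this challenge
mean, std = 0, 4
lower, upper = three_sigma_bounds(mean, std)

# Probability mass that a normal distribution puts inside the range
dist = NormalDist(mu=mean, sigma=std)
coverage = dist.cdf(upper) - dist.cdf(lower)

print(f"3-sigma range: [{lower}, {upper}]")               # 3-sigma range: [-12, 12]
print(f"Probability inside the range: {coverage:.4f}")    # 0.9973
print(is_outlier(13.5, mean, std))                        # True
print(is_outlier(-7.2, mean, std))                        # False
```

The comparison is strict, so a value exactly on the boundary (12 or -12 here) is not flagged, which matches the `>` and `<` comparisons used in the solution code.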

  1. Set mean to 0.
  2. Set std to 4.
  3. Specify the outlier criterion given by the 3-sigma rule.

Note

Keep in mind that in real-life tasks we cannot simply assume, without justification, that the data follows a Gaussian distribution with a particular mean and standard deviation. Such assumptions are checked with statistical tests, which are discussed in more detail in later chapters.
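
As an informal illustration of what checking the assumption can look like, the sketch below compares a sample against the 68-95-99.7 pattern expected from a normal distribution. This is only a rough sanity check, not one of the formal statistical tests mentioned above. The function name `empirical_coverage` is hypothetical, and the data is a synthetic stand-in generated with `random.gauss`, not the course dataset.

```python
import random
from statistics import fmean, stdev


def empirical_coverage(values, mean, std):
    """Fraction of values within 1, 2 and 3 standard deviations of the mean."""
    n = len(values)
    return {
        k: sum(abs(v - mean) <= k * std for v in values) / n
        for k in (1, 2, 3)
    }


# Synthetic stand-in data (assumption): 10,000 draws from N(0, 4)
rng = random.Random(42)
values = [rng.gauss(0, 4) for _ in range(10_000)]

# For Gaussian data the sample statistics should be close to 0 and 4,
# and the fractions close to 0.68, 0.95 and 0.997
print("sample mean:", round(fmean(values), 2))
print("sample std: ", round(stdev(values), 2))
print(empirical_coverage(values, mean=0, std=4))
```

If the fractions differ noticeably from roughly 0.68, 0.95 and 0.997, or the sample mean and standard deviation are far from the assumed values, the Gaussian assumption deserves a closer look with a proper test.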

Solution

import pandas as pd
import matplotlib.pyplot as plt

# Load data (the file has no header row, so the single column is named 'Value')
samples = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/Advanced+Probability+course+media/gaussian_samples.csv', names = ['Value'])

# Specify mean and std
mean = 0
std = 4
# Identify outliers using 3-sigma rule: keep rows above mean + 3*std or below mean - 3*std
outliers = samples[(samples['Value'] > mean + 3*std) | (samples['Value'] < mean - 3*std)]
# Print outliers
print('Outliers are: \n', outliers)

Section 1. Chapter 7
import pandas as pd
import matplotlib.pyplot as plt
# Load data
samples = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/Advanced+Probability+course+media/gaussian_samples.csv', names = ['Value'])
# Specify mean and std
mean = ___
std = ___
# Identify outliers using 3-sigma rule
outliers = samples[(samples['Value'] > mean + 3*___) | (samples['Value'] < ___ - 3*std)]
# Print outliers
print('Outliers are: \n', outliers)