Learn Descriptive Statistics Fundamentals


Descriptive statistics are essential tools for summarizing and understanding the main features of a dataset. They help you quickly grasp the central tendency, variability, and overall distribution of your data. The most common measures include mean, median, mode, variance, and standard deviation.

The mean is the arithmetic average of a dataset. It is sensitive to extreme values (outliers) and gives you a sense of the "center" of your data;
The median is the middle value when the data are sorted. It is robust to outliers and provides a better measure of central tendency when your data are skewed;
The mode is the most frequently occurring value in your dataset. It is useful for categorical data or when you want to identify the most common value;
Variance measures how spread out the numbers in your dataset are. It is the average squared deviation from the mean; the sample variance divides the sum of squared deviations by n - 1 instead of n;
Standard deviation is the square root of the variance and gives you a sense of how much the data typically deviate from the mean, using the same units as the data itself.
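To make the difference between the mean and the median concrete, here is a small sketch. The numbers are illustrative only and the variable names are hypothetical; it adds a single extreme value to a short list and compares how far each measure moves.

```python
from statistics import mean, median

# Illustrative data, not taken from any real dataset
values = [12, 15, 12, 18, 19]
with_outlier = values + [200]  # the same data plus one extreme value

mean_before = mean(values)
median_before = median(values)
mean_after = mean(with_outlier)
median_after = median(with_outlier)

print("Mean:  ", mean_before, "->", mean_after)
print("Median:", median_before, "->", median_after)
```

In this example the mean roughly triples while the median shifts only slightly, which is the behavior described above for outlier-sensitive and outlier-robust measures.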

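In Python, you can calculate these statistics with built-in functions, the standard-library statistics module, and libraries like numpy. The built-in sum() and len() functions are enough to compute the mean, the statistics module covers all five measures, and numpy provides efficient array-based functions for the mean, median, variance, and standard deviation. numpy has no dedicated mode function, so the mode has to be derived another way, for example by counting values. Understanding these statistics helps you interpret your data, spot unusual values, and choose the right approach for further analysis.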
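As a minimal sketch of the definitions, the mean, variance, and standard deviation can be computed with only sum(), len(), and math.sqrt(). The variable names are illustrative. The sketch computes both the population variance (dividing by n) and the sample variance (dividing by n - 1), since the two are easy to confuse.

```python
import math

data = [12, 15, 12, 18, 19, 12, 17, 21, 22, 15]

n = len(data)
mean_value = sum(data) / n  # arithmetic average

# Squared deviation of each value from the mean
squared_devs = [(x - mean_value) ** 2 for x in data]

population_variance = sum(squared_devs) / n    # divide by n
sample_variance = sum(squared_devs) / (n - 1)  # divide by n - 1
sample_stdev = math.sqrt(sample_variance)      # same units as the data

print("Mean:", mean_value)
print("Population variance:", population_variance)
print("Sample variance:", sample_variance)
print("Sample standard deviation:", sample_stdev)
```

The sample versions should agree, up to floating-point rounding, with statistics.variance() and statistics.stdev(), and with numpy's var() and std() when ddof=1 is passed.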


import numpy as np
from statistics import mean, median, mode, variance, stdev

# Sample dataset
data = [12, 15, 12, 18, 19, 12, 17, 21, 22, 15]

# Mean
mean_value = mean(data)
print("Mean:", mean_value)

# Median
median_value = median(data)
print("Median:", median_value)

# Mode
mode_value = mode(data)
print("Mode:", mode_value)

# Variance
variance_value = variance(data)
print("Variance:", variance_value)

# Standard deviation
stdev_value = stdev(data)
print("Standard Deviation:", stdev_value)

# Alternatively, using numpy
np_mean = np.mean(data)
np_median = np.median(data)
# numpy has no mode function; bincount works only for non-negative integers
np_mode = np.argmax(np.bincount(data))
np_variance = np.var(data, ddof=1)  # ddof=1 for sample variance
np_stdev = np.std(data, ddof=1)     # ddof=1 for sample std

print("\nUsing numpy:")
print("Mean:", np_mean)
print("Median:", np_median)
print("Mode:", np_mode)
print("Variance:", np_variance)
print("Standard Deviation:", np_stdev)
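Because np.bincount accepts only non-negative integers, it cannot find the mode of categorical data. One possible approach, sketched here with a hypothetical list of color labels, is to use the standard library: statistics.mode() accepts any hashable values, statistics.multimode() (Python 3.8+) returns every value tied for most frequent, and collections.Counter exposes the counts themselves.

```python
from collections import Counter
from statistics import mode, multimode

# Hypothetical categorical data, for illustration only
colors = ["red", "blue", "red", "green", "blue", "red"]

top_color = mode(colors)                      # single most common value
top_count = Counter(colors).most_common(1)    # list of (value, count) pairs

# When several values tie, multimode returns all of them
tied_modes = multimode([1, 1, 2, 2, 3])

print("Mode:", top_color)
print("Most common with count:", top_count)
print("Tied modes:", tied_modes)
```

Since Python 3.8, mode() returns the first mode encountered when there is a tie, so multimode() is the safer choice whenever ties matter.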


Section 1. Chapter 1
