Learn Descriptive Statistics in Policy Analysis | Statistical Analysis for Policy Evaluation
Python for Government Analysts

Descriptive Statistics in Policy Analysis

Descriptive statistics provide a foundation for understanding and communicating patterns in government data. Common measures, such as the mean, median, mode, and standard deviation, summarize large datasets in a few numbers that are easy to interpret and act on. In policy analysis, these statistics are essential for comparing populations, identifying trends, and informing decisions. For example, when evaluating household income across a city, knowing the average (mean) income, the middle value (median), and how much incomes vary (standard deviation) gives a much clearer picture than any single household's figure.

The mean is the arithmetic average of a dataset and is widely used to describe central tendency. The median is the middle value when the data are sorted, and it is especially useful when the dataset contains extreme values (outliers). The mode is the most frequently occurring value; it is helpful for categorical data or when you want to identify the most common outcome. Standard deviation quantifies how far values typically lie from the mean, giving a sense of the spread, or variability, of the data. Each statistic plays a distinct role in policy analysis: together they describe not only what is typical, but also how much values vary around it.
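The examples on this page compute only the mean, median, and standard deviation, so here is a minimal sketch of the mode for categorical data. It assumes a hypothetical list of survey responses (`service_usage` and its values are invented for illustration, not taken from the course) and uses Python's standard `statistics` module together with `collections.Counter`.

```python
import statistics
from collections import Counter

# Hypothetical records: the public service each surveyed resident used most often.
# These values are invented for illustration, not real data.
service_usage = [
    "transit", "library", "transit", "parks",
    "transit", "library", "health clinic", "parks",
]

# Frequency table behind the mode, most common first
frequencies = Counter(service_usage)

# statistics.mode returns a single most common value
most_common_service = statistics.mode(service_usage)

# statistics.multimode returns every value tied for the highest count,
# in the order first encountered (available since Python 3.8)
tied_modes = statistics.multimode(["parks", "library", "parks", "library", "transit"])

print("Frequencies:", frequencies.most_common())
print("Most common service:", most_common_service)
print("Tied modes:", tied_modes)
```

`multimode` is worth reaching for when ties are plausible, because `mode` silently reports only the first of several equally common values.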

```python
# Calculate mean and median household income from a hardcoded list
household_incomes = [42000, 39000, 48000, 51000, 47000, 53000, 120000]  # last value is an outlier

# Mean (average)
mean_income = sum(household_incomes) / len(household_incomes)

# Median
sorted_incomes = sorted(household_incomes)
n = len(sorted_incomes)
if n % 2 == 1:
    median_income = sorted_incomes[n // 2]
else:
    median_income = (sorted_incomes[n // 2 - 1] + sorted_incomes[n // 2]) / 2

print("Mean household income:", mean_income)
print("Median household income:", median_income)
```
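The hand-written median logic is useful for seeing how the middle value is chosen, but Python's standard `statistics` module provides both measures directly. A minimal cross-check on the same hardcoded list:

```python
import statistics

# Same hardcoded incomes as the hand-written example; 120000 is the outlier
household_incomes = [42000, 39000, 48000, 51000, 47000, 53000, 120000]

# The standard library implements both measures of central tendency
mean_income = statistics.mean(household_incomes)
median_income = statistics.median(household_incomes)

print("Mean household income:", mean_income)
print("Median household income:", median_income)
```

`statistics.median` handles both odd and even list lengths, so it should agree with the manual `if`/`else` branch.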

When deciding which statistic to use, consider the shape and characteristics of your data. The mean is sensitive to outliers (unusually high or low values), which can distort your sense of what is typical. For instance, a single very high income raises the mean and can make the population appear wealthier than most households actually are. The median is more robust here: because it depends only on the middle position of the sorted data, a few extreme values barely move it, so it often gives a better sense of the "typical" value in skewed distributions. The mode is most useful when you want the most common category or value, such as the most frequently used public service.
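To make the effect of an outlier concrete, the sketch below computes the mean and median of the earlier income list with and without its 120000 value. Dropping that value is done purely for comparison; the list is the one from the example above.

```python
import statistics

# Same incomes as the earlier example; 120000 is the outlier
with_outlier = [42000, 39000, 48000, 51000, 47000, 53000, 120000]
# The same list with the outlier removed, for comparison only
without_outlier = [x for x in with_outlier if x != 120000]

summary = {}
for label, data in [("with outlier", with_outlier), ("without outlier", without_outlier)]:
    # Store (mean, median) for each version of the data
    summary[label] = (statistics.mean(data), statistics.median(data))
    mean_value, median_value = summary[label]
    print(f"{label}: mean={mean_value:,.0f}, median={median_value:,.0f}")
```

With the outlier, the mean is about 57,143 and the median 48,000; without it, the mean falls to about 46,667 while the median only moves to 47,500. One extreme household shifts the mean by roughly 10,500 but the median by just 500.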

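Standard deviation measures how spread out the data are around the mean. A small standard deviation means most values lie close to the mean, while a large one signals more variability. In the context of income, a high standard deviation indicates that household incomes differ widely, which can point to income inequality, whereas a low standard deviation indicates incomes clustered closely together. Because standard deviation is expressed in the data's own units and is itself inflated by outliers, it is best read alongside the mean and median.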
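A quick way to see what standard deviation captures is to compare two groups with the same mean. The district names and incomes below are hypothetical, chosen so that both means equal 50,000; `statistics.pstdev` is the population standard deviation from Python's standard library.

```python
import statistics

# Two hypothetical districts with the same mean income but different spread.
# The figures are invented for illustration only.
district_a = [48000, 49000, 50000, 51000, 52000]  # incomes clustered near the mean
district_b = [20000, 35000, 50000, 65000, 80000]  # incomes spread far from the mean

results = {}
for name, incomes in [("District A", district_a), ("District B", district_b)]:
    mean_value = statistics.mean(incomes)
    spread = statistics.pstdev(incomes)  # population standard deviation
    results[name] = (mean_value, spread)
    print(f"{name}: mean={mean_value:,.0f}, std dev={spread:,.0f}")
```

Both districts report a mean of 50,000, yet District B's standard deviation (about 21,213) is fifteen times District A's (about 1,414), so the mean alone would hide the difference in spread.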

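```python
import math

# Same household_incomes list as in the previous example, redefined so this block runs on its own
household_incomes = [42000, 39000, 48000, 51000, 47000, 53000, 120000]
mean_income = sum(household_incomes) / len(household_incomes)

# Population standard deviation: mean of squared deviations (divide by n), then square root
variance = sum((x - mean_income) ** 2 for x in household_incomes) / len(household_incomes)
std_deviation = math.sqrt(variance)

print("Standard deviation of household income:", std_deviation)
```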
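The calculation above divides by the number of values, which gives the population standard deviation. When the incomes are a sample from a larger population, analysts often divide by n - 1 instead. Python's `statistics` module offers both, as `pstdev` and `stdev`; the sketch below compares them on the same list and checks the manual formula against `pstdev`.

```python
import math
import statistics

household_incomes = [42000, 39000, 48000, 51000, 47000, 53000, 120000]

# Population standard deviation: divides squared deviations by n,
# matching the hand-written calculation
population_sd = statistics.pstdev(household_incomes)

# Sample standard deviation: divides by n - 1 (Bessel's correction),
# appropriate when the list is a sample drawn from a larger population
sample_sd = statistics.stdev(household_incomes)

# Reproduce the manual population formula to confirm the approaches agree
mean_income = sum(household_incomes) / len(household_incomes)
manual_sd = math.sqrt(
    sum((x - mean_income) ** 2 for x in household_incomes) / len(household_incomes)
)

print("Population SD:", population_sd)
print("Sample SD:", sample_sd)
print("Manual SD matches pstdev:", math.isclose(manual_sd, population_sd))
```

Which one to report is a judgment call: `pstdev` fits when the list covers every household of interest, `stdev` when it is a sample. With only seven values the sample version is about 8% larger; the gap shrinks as the number of values grows.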

1. What does the standard deviation tell you about a dataset?

2. When is the median a better measure than the mean?

3. How can outliers affect the interpretation of average values?

Section 2. Chapter 1
