Descriptive Statistics for Environmental Data | Statistical Analysis in Environmental Science

Descriptive Statistics for Environmental Data

Descriptive statistics provide a foundation for understanding the main characteristics of environmental datasets. In environmental science, you often work with variables such as pollutant concentrations, temperature, or rainfall, which are measured repeatedly over time or across different locations. Key descriptive statistics include the mean (average value), median (middle value when sorted), mode (most frequently occurring value), and standard deviation (a measure of how spread out the values are). These statistics help you quickly summarize the central tendency and variability of data, which is essential for monitoring environmental quality, detecting unusual events, and informing policy decisions.

```python
import pandas as pd

# Example pollutant concentration data (in micrograms per cubic meter)
data = {
    "PM2.5": [12, 15, 14, 16, 18, 120, 13, 15, 14, 13],
    "NO2": [22, 21, 19, 24, 23, 22, 20, 100, 21, 22]
}

df = pd.DataFrame(data)

# Calculate descriptive statistics
mean_pm25 = df["PM2.5"].mean()
median_pm25 = df["PM2.5"].median()
# mode() returns every tied value in sorted order; [0] keeps the smallest
mode_pm25 = df["PM2.5"].mode()[0]
# std() computes the sample standard deviation (ddof=1) by default
std_pm25 = df["PM2.5"].std()

print("PM2.5 Mean:", mean_pm25)
print("PM2.5 Median:", median_pm25)
print("PM2.5 Mode:", mode_pm25)
print("PM2.5 Standard Deviation:", std_pm25)
```
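
Most of these values can also be obtained in a single call. The following is a minimal sketch using the pandas `describe()` method on the same example readings; the variable names `summary` and `pm25_median` are chosen here for illustration only.

```python
import pandas as pd

# Same example readings as in the lesson (micrograms per cubic meter)
df = pd.DataFrame({
    "PM2.5": [12, 15, 14, 16, 18, 120, 13, 15, 14, 13],
    "NO2": [22, 21, 19, 24, 23, 22, 20, 100, 21, 22],
})

# describe() reports count, mean, std, min, quartiles, and max
# for every numeric column
summary = df.describe()
print(summary)

# Individual entries can be read by row label and column name;
# the "50%" row is the median
pm25_median = summary.loc["50%", "PM2.5"]
print("PM2.5 median from describe():", pm25_median)
```

Note that `describe()` does not report the mode for numeric columns, so that statistic still needs a separate `mode()` call.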

Each statistic describes a different aspect of the PM2.5 data. The mean (25.0 µg/m³) is the arithmetic average, but here a single reading of 120 pulls it above nine of the ten values, so it overstates the typical level of pollution. The median (14.5 µg/m³) is far less affected by extreme values, so it represents the "typical" reading more accurately when outliers are present. The mode highlights the most common pollution level when certain readings recur; in this dataset 13, 14, and 15 each occur twice, so mode() returns all three and [0] selects the smallest, 13. The standard deviation (about 33.4 µg/m³, computed by pandas as the sample standard deviation) indicates how much the pollution levels vary around the mean; a high standard deviation suggests large fluctuations or outliers in the dataset, which could signal occasional pollution spikes or measurement errors.
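
To make the sensitivity of each statistic concrete, the sketch below recomputes the mean, median, and standard deviation with and without the single extreme reading of 120. The names `pm25_trimmed` and `comparison` are illustrative, and dropping a reading this way is only a demonstration, not a recommended cleaning step.

```python
import pandas as pd

# PM2.5 readings from the lesson example (micrograms per cubic meter)
pm25 = pd.Series([12, 15, 14, 16, 18, 120, 13, 15, 14, 13], name="PM2.5")

# Drop the single extreme reading (120) to see how each statistic reacts
pm25_trimmed = pm25[pm25 != 120]

# Side-by-side table of the three statistics for both versions of the data
comparison = pd.DataFrame(
    {
        "all readings": [pm25.mean(), pm25.median(), pm25.std()],
        "without 120": [
            pm25_trimmed.mean(),
            pm25_trimmed.median(),
            pm25_trimmed.std(),
        ],
    },
    index=["mean", "median", "std"],
)

print(comparison.round(2))
```

The mean falls from 25.0 to about 14.4 once the extreme reading is excluded, while the median only moves from 14.5 to 14.0, which is why the median is the more robust summary here.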

```python
# Identifying outliers in PM2.5 using standard deviation
mean = df["PM2.5"].mean()
std = df["PM2.5"].std()

# Outliers are values more than 2 standard deviations from the mean
outliers = df[(df["PM2.5"] > mean + 2*std) | (df["PM2.5"] < mean - 2*std)]
print("Outliers in PM2.5:")
print(outliers)
```
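
The same rule can be applied to every pollutant column. The sketch below wraps it in a small helper; `flag_outliers` and its parameter `k` are hypothetical names introduced here, not part of pandas or of the lesson code.

```python
import pandas as pd


def flag_outliers(series: pd.Series, k: float = 2.0) -> pd.Series:
    """Return the values lying more than k sample standard deviations from the mean.

    Hypothetical helper for illustration; not a pandas function.
    """
    mean = series.mean()
    std = series.std()  # pandas default: sample standard deviation (ddof=1)
    mask = (series - mean).abs() > k * std
    return series[mask]


# Same example readings as in the lesson (micrograms per cubic meter)
df = pd.DataFrame({
    "PM2.5": [12, 15, 14, 16, 18, 120, 13, 15, 14, 13],
    "NO2": [22, 21, 19, 24, 23, 22, 20, 100, 21, 22],
})

# Apply the 2-standard-deviation rule to each pollutant in turn
for column in df.columns:
    print(f"Outliers in {column}:")
    print(flag_outliers(df[column]))
```

With the threshold of 2 used in the lesson, this flags 120 for PM2.5 and 100 for NO2. Raising `k` to 3 flags nothing in either column, because each extreme reading inflates the standard deviation it is measured against; in small samples the choice of threshold therefore matters.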

1. What does the standard deviation tell you about an environmental dataset?

2. Which pandas method provides a summary of descriptive statistics for a DataFrame?

3. Fill in the blank: To find the median of a column PM2.5 in df, use df['PM2.5'].____().
