Visualizing Statistical Results | Statistical Analysis in Environmental Science

Visualizing statistical results is crucial in environmental science because it allows you to communicate complex data findings in a clear and accessible way. For instance, when comparing pollutant levels at different monitoring sites, visual tools like boxplots help summarize distributions and reveal differences that may be hidden in tables or simple summary statistics. Boxplots are especially effective for displaying the spread and central tendency of pollutant concentrations, making it easier to compare air quality between locations and identify unusual readings that merit further investigation.


import matplotlib.pyplot as plt
import pandas as pd

# Example pollutant concentration data (µg/m³) for two sites
data = {
    "Site A": [12, 15, 14, 13, 16, 18, 20, 14, 15, 16],
    "Site B": [22, 25, 19, 21, 24, 23, 28, 22, 27, 25]
}

df = pd.DataFrame(data)

plt.figure(figsize=(8, 5))
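# One box per site; tick labels are set separately because the `labels`
# keyword of boxplot is deprecated in recent Matplotlib releases
plt.boxplot([df["Site A"], df["Site B"]])
plt.xticks([1, 2], ["Site A", "Site B"])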
plt.ylabel("PM2.5 Concentration (µg/m³)")
plt.title("Comparison of PM2.5 Levels at Two Monitoring Sites")
plt.show()

When you look at a boxplot, you will see a rectangular box that represents the interquartile range (IQR): it spans the first to the third quartile and contains the middle 50% of your data. The line inside the box marks the median, giving you a sense of the typical pollutant concentration at each site. By default in matplotlib, each whisker extends from the box to the most extreme data point that lies within 1.5 times the IQR of the box edge, and any points beyond the whiskers are plotted individually as outliers. These features help you quickly spot differences in central tendency, variability, and the presence of extreme values between sites. For environmental data, this means you can see at a glance which site has higher or more variable pollution, and whether there are unusual pollution spikes that could signal specific events or measurement issues.
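To make these elements concrete, the following is a minimal sketch that computes the quartiles, IQR, whisker ends, and outliers for the two example sites by hand. The helper name `boxplot_summary` and its `whis` parameter are illustrative assumptions, not part of the lesson's code; the sketch assumes the common 1.5 × IQR whisker rule and NumPy's default percentile interpolation.

```python
import numpy as np

def boxplot_summary(values, whis=1.5):
    """Hypothetical helper: statistics behind a boxplot with 1.5 x IQR whiskers."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    # Fences mark how far a whisker is allowed to reach
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = x[(x >= lower_fence) & (x <= upper_fence)]
    outliers = x[(x < lower_fence) | (x > upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers end at the most extreme points still inside the fences
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": sorted(outliers.tolist()),
    }

# Same example concentrations (µg/m³) as in the lesson
site_a = [12, 15, 14, 13, 16, 18, 20, 14, 15, 16]
site_b = [22, 25, 19, 21, 24, 23, 28, 22, 27, 25]

summary_a = boxplot_summary(site_a)
summary_b = boxplot_summary(site_b)

for name, summary in [("Site A", summary_a), ("Site B", summary_b)]:
    print(name, summary)
```

Under these assumptions, Site A has a median of 15 and an IQR of 2, so its fences sit at 11 and 19 and the 20 µg/m³ reading is flagged as an outlier. Site B has a median of 23.5 and no outliers. Comparing these numbers with the plotted boxes is a quick way to check that you are reading the figure correctly.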


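import numpy as np

# Reuses `plt` and `df` from the previous example
plt.figure(figsize=(8, 5))
# patch_artist=True draws the boxes as filled patches
box = plt.boxplot([df["Site A"], df["Site B"]], patch_artist=True)
plt.xticks([1, 2], ["Site A", "Site B"])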
plt.ylabel("PM2.5 Concentration (µg/m³)")
plt.title("Annotated PM2.5 Boxplot")

# Annotate median values
medians = [np.median(df["Site A"]), np.median(df["Site B"])]
for i, median in enumerate(medians, start=1):
    plt.text(i, median + 0.5, f"Median: {median}", ha="center", color="blue")

# Point out the higher median at Site B (a visual comparison, not a significance test)
plt.annotate(
    "Higher median at Site B",
    xy=(2, medians[1]),
    xytext=(2, medians[1] + 4),
    arrowprops=dict(facecolor="red", shrink=0.05),
    ha="center",
    color="red"
)

plt.show()

1. What does the box in a boxplot represent?

2. How can outliers be identified in a boxplot?

3. Fill in the blank: To create a boxplot of 'PM2.5' for two sites, use plt.boxplot([site1, ____]).

Section 2. Chapter 6
