Visualizing Statistical Results | Statistical Analysis in Environmental Science


Visualizing statistical results is crucial in environmental science because it allows you to communicate complex data findings in a clear and accessible way. For instance, when comparing pollutant levels at different monitoring sites, visual tools like boxplots help summarize distributions and reveal differences that may be hidden in tables or simple summary statistics. Boxplots are especially effective for displaying the spread and central tendency of pollutant concentrations, making it easier to compare air quality between locations and identify unusual readings that merit further investigation.


import matplotlib.pyplot as plt
import pandas as pd

# Example pollutant concentration data (µg/m³) for two sites
data = {
    "Site A": [12, 15, 14, 13, 16, 18, 20, 14, 15, 16],
    "Site B": [22, 25, 19, 21, 24, 23, 28, 22, 27, 25]
}

df = pd.DataFrame(data)

plt.figure(figsize=(8, 5))
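plt.boxplot([df["Site A"], df["Site B"]])
# Label the boxes via xticks; the labels= keyword was renamed to tick_labels in Matplotlib 3.9
plt.xticks([1, 2], ["Site A", "Site B"])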
plt.ylabel("PM2.5 Concentration (µg/m³)")
plt.title("Comparison of PM2.5 Levels at Two Monitoring Sites")
plt.show()

When you look at a boxplot, you will see a rectangular box that represents the interquartile range (IQR), which spans the first to the third quartile and so contains the middle 50% of your data. The line inside the box shows the median value, giving you a sense of the typical pollutant concentration at each site. The whiskers extending from the box cover most of the remaining data: by default, Matplotlib draws each whisker out to the most extreme observation within 1.5 × IQR of the box edge, and plots any points beyond the whiskers individually as outliers. These features help you quickly spot differences in central tendency, variability, and the presence of extreme values between sites. For environmental data, this means you can see which site has higher or more variable pollution, and whether there are unusual pollution spikes that could signal specific events or measurement issues.
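The box and whisker positions described above can be reproduced numerically. The following is a minimal sketch, assuming the same example readings as the earlier plot and Matplotlib's default whisker rule (1.5 × IQR). The helper name `boxplot_summary` is hypothetical and introduced only for illustration.

```python
import numpy as np

# Same example PM2.5 readings (µg/m³) used in the earlier plot
site_a = [12, 15, 14, 13, 16, 18, 20, 14, 15, 16]
site_b = [22, 25, 19, 21, 24, 23, 28, 22, 27, 25]


def boxplot_summary(values, whis=1.5):
    """Return the quantities a default boxplot draws (hypothetical helper)."""
    x = np.asarray(values, dtype=float)

    # Box edges and median line
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1

    # Fences: points beyond these are treated as outliers
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr

    # Whiskers end at the most extreme observations inside the fences
    inside = x[(x >= lower_fence) & (x <= upper_fence)]
    outliers = x[(x < lower_fence) | (x > upper_fence)]

    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "iqr": float(iqr),
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": sorted(outliers.tolist()),
    }


summary_a = boxplot_summary(site_a)
summary_b = boxplot_summary(site_b)
print("Site A:", summary_a)
print("Site B:", summary_b)
```

With these readings, the sketch reports a median of 15.0 for Site A and 23.5 for Site B, and flags the 20 µg/m³ reading at Site A as an outlier because it lies above the upper fence of 19.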


import numpy as np

plt.figure(figsize=(8, 5))
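box = plt.boxplot([df["Site A"], df["Site B"]], patch_artist=True)
# Label the boxes via xticks; the labels= keyword was renamed to tick_labels in Matplotlib 3.9
plt.xticks([1, 2], ["Site A", "Site B"])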
plt.ylabel("PM2.5 Concentration (µg/m³)")
plt.title("Annotated PM2.5 Boxplot")

# Annotate median values
medians = [np.median(df["Site A"]), np.median(df["Site B"])]
for i, median in enumerate(medians, start=1):
    plt.text(i, median + 0.5, f"Median: {median}", ha="center", color="blue")

# Highlight the difference in medians (a visual comparison, not a significance test)
plt.annotate(
    "Higher median at Site B",
    xy=(2, medians[1]),
    xytext=(2, medians[1] + 4),
    arrowprops=dict(facecolor="red", shrink=0.05),
    ha="center",
    color="red"
)

plt.show()
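The dictionary returned by the boxplot call can also be inspected directly. The sketch below is an illustration under assumptions rather than part of the lesson code: it reuses the example readings, fills the boxes with arbitrarily chosen colors (which requires `patch_artist=True`), and reads the outlier points Matplotlib drew from the `"fliers"` entry.

```python
import matplotlib.pyplot as plt

site_names = ["Site A", "Site B"]
readings = [
    [12, 15, 14, 13, 16, 18, 20, 14, 15, 16],  # Site A, PM2.5 in µg/m³
    [22, 25, 19, 21, 24, 23, 28, 22, 27, 25],  # Site B, PM2.5 in µg/m³
]

fig, ax = plt.subplots(figsize=(8, 5))
box = ax.boxplot(readings, patch_artist=True)
ax.set_xticks([1, 2])
ax.set_xticklabels(site_names)
ax.set_ylabel("PM2.5 Concentration (µg/m³)")

# With patch_artist=True each box is a patch that can be filled;
# the colors here are arbitrary choices for illustration
for patch, color in zip(box["boxes"], ["lightblue", "lightsalmon"]):
    patch.set_facecolor(color)

# Each entry in box["fliers"] is a line whose y-data are that box's outliers
outliers_by_site = {
    name: [float(v) for v in flier.get_ydata()]
    for name, flier in zip(site_names, box["fliers"])
}
print(outliers_by_site)
```

Call `plt.show()` afterwards to display the figure. For these readings, only the 20 µg/m³ value at Site A is drawn as an outlier.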

1. What does the box in a boxplot represent?

2. How can outliers be identified in a boxplot?

3. Fill in the blank: To create a boxplot of 'PM2.5' for two sites, use plt.boxplot([site1, ____]).


Section 2. Chapter 6
