Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Lære Exploring Data with Descriptive Statistics | Data Collection and Cleaning for Journalists
Python for Journalists and Media

bookExploring Data with Descriptive Statistics

Descriptive statistics are essential tools for journalists who want to summarize and communicate key aspects of their datasets. When dealing with collections of articles, such as word counts or publication dates, statistics like the mean, median, and mode help you quickly understand the central tendencies and patterns within your data. The mean provides the average value, the median identifies the midpoint, and the mode highlights the most frequently occurring value. These measures are important in reporting because they allow you to describe trends, compare sources, and spot outliers or shifts in reporting style.

123456789101112131415161718192021222324
import pandas as pd # Example dataset: word counts of recent articles data = { "title": [ "City Council Approves Budget", "Local School Wins Award", "Mayor Launches New Initiative", "Community Garden Flourishes", "Sports Team Advances to Finals" ], "word_count": [850, 400, 1200, 650, 950] } df = pd.DataFrame(data) # Calculate descriptive statistics mean_word_count = df["word_count"].mean() median_word_count = df["word_count"].median() mode_word_count = df["word_count"].mode()[0] print("Mean word count:", mean_word_count) print("Median word count:", median_word_count) print("Mode word count:", mode_word_count)
copy

By applying these calculations to a dataset of article word counts, you can quickly summarize the typical length of articles, which may reflect editorial standards or audience preferences. For instance, if the mean and median are close, it suggests most articles are similarly sized. If the mode differs significantly, it might indicate a common template or repeated format, such as brief updates or long-form features. Journalists can use these insights to compare reporting styles across outlets or time periods, inform editorial decisions, or highlight notable changes in coverage.

12345678
import matplotlib.pyplot as plt # Visualize the distribution of article word counts plt.hist(df["word_count"], bins=5, edgecolor="black") plt.title("Distribution of Article Word Counts") plt.xlabel("Word Count") plt.ylabel("Number of Articles") plt.show()
copy

1. What does the mean value represent in a dataset?

2. Why might a journalist want to visualize the distribution of article lengths?

3. Fill in the blank: To plot a histogram in matplotlib, use _ _ _.

question mark

What does the mean value represent in a dataset?

Select the correct answer

question mark

Why might a journalist want to visualize the distribution of article lengths?

Select the correct answer

question-icon

Fill in the blank: To plot a histogram in matplotlib, use _ _ _.

Click or drag`n`drop items and fill in the blanks

Alt var klart?

Hvordan kan vi forbedre det?

Takk for tilbakemeldingene dine!

Seksjon 1. Kapittel 6

Spør AI

expand

Spør AI

ChatGPT

Spør om hva du vil, eller prøv ett av de foreslåtte spørsmålene for å starte chatten vår

Suggested prompts:

Can you explain how to interpret the histogram?

What other visualizations could help analyze this data?

How can I compare word counts across different categories or time periods?

bookExploring Data with Descriptive Statistics

Sveip for å vise menyen

Descriptive statistics are essential tools for journalists who want to summarize and communicate key aspects of their datasets. When dealing with collections of articles, such as word counts or publication dates, statistics like the mean, median, and mode help you quickly understand the central tendencies and patterns within your data. The mean provides the average value, the median identifies the midpoint, and the mode highlights the most frequently occurring value. These measures are important in reporting because they allow you to describe trends, compare sources, and spot outliers or shifts in reporting style.

123456789101112131415161718192021222324
import pandas as pd # Example dataset: word counts of recent articles data = { "title": [ "City Council Approves Budget", "Local School Wins Award", "Mayor Launches New Initiative", "Community Garden Flourishes", "Sports Team Advances to Finals" ], "word_count": [850, 400, 1200, 650, 950] } df = pd.DataFrame(data) # Calculate descriptive statistics mean_word_count = df["word_count"].mean() median_word_count = df["word_count"].median() mode_word_count = df["word_count"].mode()[0] print("Mean word count:", mean_word_count) print("Median word count:", median_word_count) print("Mode word count:", mode_word_count)
copy

By applying these calculations to a dataset of article word counts, you can quickly summarize the typical length of articles, which may reflect editorial standards or audience preferences. For instance, if the mean and median are close, it suggests most articles are similarly sized. If the mode differs significantly, it might indicate a common template or repeated format, such as brief updates or long-form features. Journalists can use these insights to compare reporting styles across outlets or time periods, inform editorial decisions, or highlight notable changes in coverage.

12345678
import matplotlib.pyplot as plt # Visualize the distribution of article word counts plt.hist(df["word_count"], bins=5, edgecolor="black") plt.title("Distribution of Article Word Counts") plt.xlabel("Word Count") plt.ylabel("Number of Articles") plt.show()
copy

1. What does the mean value represent in a dataset?

2. Why might a journalist want to visualize the distribution of article lengths?

3. Fill in the blank: To plot a histogram in matplotlib, use _ _ _.

question mark

What does the mean value represent in a dataset?

Select the correct answer

question mark

Why might a journalist want to visualize the distribution of article lengths?

Select the correct answer

question-icon

Fill in the blank: To plot a histogram in matplotlib, use _ _ _.

Click or drag`n`drop items and fill in the blanks

Alt var klart?

Hvordan kan vi forbedre det?

Takk for tilbakemeldingene dine!

Seksjon 1. Kapittel 6
some-alt