Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Lære Exploring Data with Descriptive Statistics | Data Collection and Cleaning for Journalists
Practice
Projects
Quizzes & Challenges
Quizzes
Challenges
/
Python for Journalists and Media

bookExploring Data with Descriptive Statistics

Descriptive statistics are essential tools for journalists who want to summarize and communicate key aspects of their datasets. When dealing with collections of articles, such as word counts or publication dates, statistics like the mean, median, and mode help you quickly understand the central tendencies and patterns within your data. The mean provides the average value, the median identifies the midpoint, and the mode highlights the most frequently occurring value. These measures are important in reporting because they allow you to describe trends, compare sources, and spot outliers or shifts in reporting style.

123456789101112131415161718192021222324
import pandas as pd # Example dataset: word counts of recent articles data = { "title": [ "City Council Approves Budget", "Local School Wins Award", "Mayor Launches New Initiative", "Community Garden Flourishes", "Sports Team Advances to Finals" ], "word_count": [850, 400, 1200, 650, 950] } df = pd.DataFrame(data) # Calculate descriptive statistics mean_word_count = df["word_count"].mean() median_word_count = df["word_count"].median() mode_word_count = df["word_count"].mode()[0] print("Mean word count:", mean_word_count) print("Median word count:", median_word_count) print("Mode word count:", mode_word_count)
copy

By applying these calculations to a dataset of article word counts, you can quickly summarize the typical length of articles, which may reflect editorial standards or audience preferences. For instance, if the mean and median are close, it suggests most articles are similarly sized. If the mode differs significantly, it might indicate a common template or repeated format, such as brief updates or long-form features. Journalists can use these insights to compare reporting styles across outlets or time periods, inform editorial decisions, or highlight notable changes in coverage.

12345678
import matplotlib.pyplot as plt # Visualize the distribution of article word counts plt.hist(df["word_count"], bins=5, edgecolor="black") plt.title("Distribution of Article Word Counts") plt.xlabel("Word Count") plt.ylabel("Number of Articles") plt.show()
copy

1. What does the mean value represent in a dataset?

2. Why might a journalist want to visualize the distribution of article lengths?

3. Fill in the blank: To plot a histogram in matplotlib, use _ _ _.

question mark

What does the mean value represent in a dataset?

Select the correct answer

question mark

Why might a journalist want to visualize the distribution of article lengths?

Select the correct answer

question-icon

Fill in the blank: To plot a histogram in matplotlib, use _ _ _.

Click or drag`n`drop items and fill in the blanks

Var alt klart?

Hvordan kan vi forbedre det?

Tak for dine kommentarer!

Sektion 1. Kapitel 6

Spørg AI

expand

Spørg AI

ChatGPT

Spørg om hvad som helst eller prøv et af de foreslåede spørgsmål for at starte vores chat

Suggested prompts:

Can you explain how to interpret the histogram?

What other visualizations could help analyze this data?

How can I compare word counts across different categories or time periods?

bookExploring Data with Descriptive Statistics

Stryg for at vise menuen

Descriptive statistics are essential tools for journalists who want to summarize and communicate key aspects of their datasets. When dealing with collections of articles, such as word counts or publication dates, statistics like the mean, median, and mode help you quickly understand the central tendencies and patterns within your data. The mean provides the average value, the median identifies the midpoint, and the mode highlights the most frequently occurring value. These measures are important in reporting because they allow you to describe trends, compare sources, and spot outliers or shifts in reporting style.

123456789101112131415161718192021222324
import pandas as pd # Example dataset: word counts of recent articles data = { "title": [ "City Council Approves Budget", "Local School Wins Award", "Mayor Launches New Initiative", "Community Garden Flourishes", "Sports Team Advances to Finals" ], "word_count": [850, 400, 1200, 650, 950] } df = pd.DataFrame(data) # Calculate descriptive statistics mean_word_count = df["word_count"].mean() median_word_count = df["word_count"].median() mode_word_count = df["word_count"].mode()[0] print("Mean word count:", mean_word_count) print("Median word count:", median_word_count) print("Mode word count:", mode_word_count)
copy

By applying these calculations to a dataset of article word counts, you can quickly summarize the typical length of articles, which may reflect editorial standards or audience preferences. For instance, if the mean and median are close, it suggests most articles are similarly sized. If the mode differs significantly, it might indicate a common template or repeated format, such as brief updates or long-form features. Journalists can use these insights to compare reporting styles across outlets or time periods, inform editorial decisions, or highlight notable changes in coverage.

12345678
import matplotlib.pyplot as plt # Visualize the distribution of article word counts plt.hist(df["word_count"], bins=5, edgecolor="black") plt.title("Distribution of Article Word Counts") plt.xlabel("Word Count") plt.ylabel("Number of Articles") plt.show()
copy

1. What does the mean value represent in a dataset?

2. Why might a journalist want to visualize the distribution of article lengths?

3. Fill in the blank: To plot a histogram in matplotlib, use _ _ _.

question mark

What does the mean value represent in a dataset?

Select the correct answer

question mark

Why might a journalist want to visualize the distribution of article lengths?

Select the correct answer

question-icon

Fill in the blank: To plot a histogram in matplotlib, use _ _ _.

Click or drag`n`drop items and fill in the blanks

Var alt klart?

Hvordan kan vi forbedre det?

Tak for dine kommentarer!

Sektion 1. Kapitel 6
some-alt