Exploring Data with Descriptive Statistics | Data Collection and Cleaning for Journalists
Python for Journalists and Media

Exploring Data with Descriptive Statistics

Descriptive statistics help journalists summarize a dataset and communicate its key features. When you work with data about a collection of articles, such as word counts or publication dates, the mean, median, and mode give you a quick read on the central tendency of the data. The mean is the arithmetic average, the median is the middle value once the data are sorted, and the mode is the most frequently occurring value. These measures matter in reporting because they let you describe trends, compare sources, and spot outliers or shifts in reporting style.

```python
import pandas as pd

# Example dataset: word counts of recent articles
data = {
    "title": [
        "City Council Approves Budget",
        "Local School Wins Award",
        "Mayor Launches New Initiative",
        "Community Garden Flourishes",
        "Sports Team Advances to Finals"
    ],
    "word_count": [850, 400, 1200, 650, 950]
}
df = pd.DataFrame(data)

# Calculate descriptive statistics
mean_word_count = df["word_count"].mean()
median_word_count = df["word_count"].median()
# mode() returns every value tied for most frequent, in sorted order;
# [0] takes the first (smallest) of them.
mode_word_count = df["word_count"].mode()[0]

print("Mean word count:", mean_word_count)
print("Median word count:", median_word_count)
print("Mode word count:", mode_word_count)
```
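One caveat when reading the mode: in the sample dataset every word count appears exactly once, so all five values tie for the mode and `mode()[0]` simply reports the smallest one. The following is a minimal sketch of how one might detect that situation before quoting a mode in a story. It uses only Python's standard library rather than pandas, and the helper name `describe_mode` is hypothetical, not part of the lesson.

```python
from collections import Counter


def describe_mode(values):
    """Return (modes, count): the most frequent values and how often each occurs."""
    counts = Counter(values)
    top = max(counts.values())
    # Collect every value that reaches the highest frequency, in ascending order.
    modes = sorted(v for v, c in counts.items() if c == top)
    return modes, top


word_counts = [850, 400, 1200, 650, 950]
modes, top = describe_mode(word_counts)

if top == 1:
    print("No value repeats, so the mode is not informative here.")
else:
    print("Most common word count(s):", modes, "each appearing", top, "times")
```

With real newsroom data, where a template such as a 400-word brief recurs, the second branch would report that length and its frequency.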

Applying these calculations to article word counts gives a quick summary of typical article length, which may reflect editorial standards or audience preferences. For instance, if the mean and median are close, the distribution is probably not skewed by a few very long or very short pieces; a large gap between them suggests that outliers are pulling the mean. If the mode differs significantly from the mean and median, it might indicate a common template or repeated format, such as brief updates or long-form features. Journalists can use these insights to compare reporting styles across outlets or time periods, inform editorial decisions, or highlight notable changes in coverage.
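To make the comparison between mean and median concrete, here is a small sketch using Python's standard `statistics` module (not pandas). It reuses the lesson's five word counts and adds one invented 6,000-word feature, which is an assumption for illustration only, to show how a single outlier moves the two measures differently.

```python
from statistics import mean, median

# Word counts from the lesson's sample dataset
typical = [850, 400, 1200, 650, 950]

# Hypothetical: the same articles plus one long-form feature (invented value)
with_long_read = typical + [6000]

for label, counts in [("typical", typical), ("with long read", with_long_read)]:
    m = mean(counts)
    med = median(counts)
    # A large gap between mean and median hints at skew from outliers.
    print(f"{label}: mean={m:.0f}, median={med:.0f}, gap={m - med:.0f}")
```

Adding the one long piece moves the mean from 810 to 1,675, while the median only moves from 850 to 900. That is why the median is often the safer figure to quote as a "typical" length.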

```python
import matplotlib.pyplot as plt

# Visualize the distribution of article word counts
# (reuses the df DataFrame created in the previous example)
plt.hist(df["word_count"], bins=5, edgecolor="black")
plt.title("Distribution of Article Word Counts")
plt.xlabel("Word Count")
plt.ylabel("Number of Articles")
plt.show()
```
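A histogram groups values into equal-width bins and counts how many fall into each. The sketch below approximates that binning in plain Python so the numbers behind the bars can be inspected or quoted. The function name `bin_counts` is hypothetical, and the logic is a simplified stand-in for what a plotting library does, not its actual implementation.

```python
def bin_counts(values, bins=5):
    """Split the range [min, max] into equal-width bins and count values per bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: a single bin holds everything.
        return [lo, hi], [len(values)]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin rather than a new one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


edges, counts = bin_counts([850, 400, 1200, 650, 950], bins=5)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f} words: {n} article(s)")
```

With only five articles spread across five bins, each bar has a height of one. The shape of a histogram becomes informative only with a larger sample.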

1. What does the mean value represent in a dataset?

2. Why might a journalist want to visualize the distribution of article lengths?

3. Fill in the blank: To plot a histogram in matplotlib, use _ _ _.

Section 1. Chapter 6
