Introduction to Data in Journalism | Data Collection and Cleaning for Journalists

Python for Journalists and Media

Data-driven journalism is transforming how newsrooms investigate, report, and present stories. By working with data, you can uncover trends, highlight disparities, and provide evidence-based insights that go beyond anecdotes or isolated events. Impactful reporting, such as uncovering government spending irregularities, analyzing election results, or tracking the spread of diseases, often relies on analyzing large datasets. Major news outlets have used data to reveal systemic issues in criminal justice, health care, and climate change, presenting their findings through interactive graphics and in-depth analysis.

Python has become an essential tool in the modern newsroom. Its versatility allows you to automate data collection, clean messy information, and analyze patterns efficiently. With Python, even those without a computer science background can quickly learn to handle data, making it possible to break complex stories and support investigative reporting with solid evidence.


# Define a list of dictionaries representing news headlines
headlines = [
    {"title": "Election Results Announced in Major Cities"},
    {"title": "Local Schools Receive New Funding"},
    {"title": "Wildfire Threatens Rural Communities"},
    {"title": "Election Debate Draws Record Viewership"},
    {"title": "Scientists Discover New Species"},
    {"title": "City Council Approves Affordable Housing Plan"},
    {"title": "Election Officials Prepare for High Turnout"},
    {"title": "Sports Team Wins Championship"},
    {"title": "Community Rallies for Flood Relief"},
    {"title": "Election Polls Show Tight Race"}
]

# Print the first five headlines
for headline in headlines[:5]:
    print(headline["title"])

By working with in-memory data structures like lists and dictionaries, you can quickly access, filter, and review large amounts of information. In the example above, the headlines list allows you to store multiple news items and instantly retrieve or process specific entries. This ability to handle data efficiently is especially valuable when deadlines are tight and stories are developing rapidly.
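
The filtering step mentioned above can be sketched with a list comprehension. This is a minimal illustration, not part of the original lesson: the helper name `matching_titles` is hypothetical, and the list is a shortened copy of `headlines` so the snippet runs on its own.

```python
# Shortened copy of the headlines list so this sketch is self-contained
headlines = [
    {"title": "Election Results Announced in Major Cities"},
    {"title": "Local Schools Receive New Funding"},
    {"title": "Wildfire Threatens Rural Communities"},
    {"title": "Election Debate Draws Record Viewership"},
]


def matching_titles(items, keyword):
    """Return the titles that contain keyword, ignoring case (hypothetical helper)."""
    needle = keyword.lower()
    return [item["title"] for item in items if needle in item["title"].lower()]


# Keep only the election-related headlines and review them
election_titles = matching_titles(headlines, "election")
for title in election_titles:
    print(title)
```

Returning a new list rather than printing inside the helper keeps the filtered titles available for later steps, such as counting or sorting.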


# Count the headlines whose title contains the word 'election' (case-insensitive)
keyword = "election"
count = 0

for headline in headlines:
    # Lowercase both sides so "Election" and "election" are treated the same
    if keyword.lower() in headline["title"].lower():
        count += 1

print(f"Number of headlines containing '{keyword}': {count}")
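
One caveat with the `in` check above is that it matches substrings, so a title containing "Selection" or "Reelection" would also be counted. The sketch below shows one possible way to count whole words only, using Python's standard `re` module. The function name `count_whole_word` and the sample titles are assumptions made for illustration and do not come from the lesson's data.

```python
import re

# Hypothetical sample titles chosen to show the substring pitfall
sample_titles = [
    "Election Polls Show Tight Race",
    "Jury Selection Begins in Fraud Trial",
    "Mayor Wins Reelection",
]


def count_whole_word(titles, keyword):
    """Count titles containing keyword as a whole word, ignoring case."""
    # \b marks a word boundary; re.escape guards against special characters
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return sum(1 for title in titles if pattern.search(title))


# Substring matching counts every title that merely contains the letters
substring_count = sum(1 for title in sample_titles if "election" in title.lower())

# Whole-word matching counts only titles where "election" stands alone
whole_word_count = count_whole_word(sample_titles, "election")

print(f"Substring matches: {substring_count}")
print(f"Whole-word matches: {whole_word_count}")
```

Which behavior is appropriate depends on the story: substring matching casts a wider net, while whole-word matching reduces false positives.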

Section 1. Chapter 1

Introduction to Data in Journalism

1. What is one benefit of using Python for data-driven journalism?

2. Which Python library is commonly used for working with tabular data in memory?

3. Why is it important for journalists to be able to process large datasets efficiently?