Learn Working with Tabular Data | Data Collection and Cleaning for Journalists
Python for Journalists and Media

Working with Tabular Data

Tabular data is at the heart of many impactful journalism stories. As a journalist, you often encounter structured tables when working with government data, Freedom of Information Act (FOIA) releases, public salary disclosures, or datasets from research organizations. Tables are a popular format because they organize information into rows and columns, making it easier to compare, analyze, and spot patterns or anomalies that could lead to newsworthy stories.

import pandas as pd

# Create a DataFrame of public official salaries
data = {
    "Name": ["Alex Kim", "Jordan Lee", "Morgan Patel", "Taylor Smith", "Casey Jones"],
    "Position": ["Mayor", "City Clerk", "Fire Chief", "Police Chief", "Treasurer"],
    "Salary": [120000, 75000, 98000, 105000, 87000]
}

salaries_df = pd.DataFrame(data)

# Display summary statistics for the Salary column
salary_stats = salaries_df["Salary"].describe()
print("Salary Summary Statistics:")
print(salary_stats)

In the code above, you use the pandas library to create a DataFrame from a dictionary containing names, job positions, and salaries of public officials. The pd.DataFrame() function turns the dictionary into a structured table. Once your data is in a DataFrame, you can use the .describe() method to quickly generate summary statistics about the Salary column. This includes the count, mean, standard deviation, minimum, and maximum values, as well as quartiles. For journalists, these statistics are essential for spotting outliers, such as an unusually high salary, or identifying overall trends, like the average pay for city officials. This rapid overview helps you decide where to dig deeper for your reporting.
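Because .describe() returns a pandas Series, each statistic can be read by its label and reused in further checks. The following is a minimal sketch, assuming the same illustrative salary table; the variable names mean_salary, median_salary, top_row, and gap_over_median are hypothetical and chosen only for this example.

```python
import pandas as pd

# Same illustrative salary table used in this chapter
salaries_df = pd.DataFrame({
    "Name": ["Alex Kim", "Jordan Lee", "Morgan Patel", "Taylor Smith", "Casey Jones"],
    "Position": ["Mayor", "City Clerk", "Fire Chief", "Police Chief", "Treasurer"],
    "Salary": [120000, 75000, 98000, 105000, 87000],
})

salary_stats = salaries_df["Salary"].describe()

# describe() returns a Series, so each statistic is available by its label
mean_salary = salary_stats["mean"]
median_salary = salary_stats["50%"]  # the 50% quartile is the median
top_salary = salary_stats["max"]

# Hypothetical follow-up: find the row with the highest salary
# and measure how far it sits above the median
top_row = salaries_df.loc[salaries_df["Salary"].idxmax()]
gap_over_median = top_salary - median_salary

print(f"Mean salary: {mean_salary:,.0f}")
print(f"Median salary: {median_salary:,.0f}")
print(f"Top earner: {top_row['Name']} ({top_row['Position']})")
print(f"Gap between top salary and median: {gap_over_median:,.0f}")
```

Comparing the top value against the median rather than the mean is one reasonable choice here, since a single very large salary pulls the mean upward but leaves the median largely unchanged.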

# Filter officials with salaries above $90,000
high_earners = salaries_df[salaries_df["Salary"] > 90000]
print("Officials earning above $90,000:")
print(high_earners)
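The filter above works in two steps: the comparison produces a True/False value for every row, and the DataFrame keeps only the rows marked True. The sketch below makes that intermediate step visible and then sorts the result so the highest salaries come first. The name above_90k is hypothetical, and the table is the same illustrative data used earlier.

```python
import pandas as pd

# Same illustrative salary table used in this chapter
salaries_df = pd.DataFrame({
    "Name": ["Alex Kim", "Jordan Lee", "Morgan Patel", "Taylor Smith", "Casey Jones"],
    "Position": ["Mayor", "City Clerk", "Fire Chief", "Police Chief", "Treasurer"],
    "Salary": [120000, 75000, 98000, 105000, 87000],
})

# Step 1: the comparison yields one True/False value per row (a boolean mask)
above_90k = salaries_df["Salary"] > 90000
print(above_90k)

# Step 2: indexing with the mask keeps only the rows where it is True,
# and sort_values orders them from highest to lowest salary
high_earners = salaries_df[above_90k].sort_values("Salary", ascending=False)
print(high_earners)
```

Sorting the filtered rows is optional, but it puts the records most likely to merit a closer look at the top of the output.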

1. What function in pandas is used to create a DataFrame from a dictionary?

2. How can filtering data help journalists find newsworthy stories?

3. Fill in the blank: To display the first 10 rows of a DataFrame, use _____


salaries_df
           Name      Position  Salary
0      Alex Kim         Mayor  120000
1    Jordan Lee    City Clerk   75000
2  Morgan Patel    Fire Chief   98000
3  Taylor Smith  Police Chief  105000
4   Casey Jones     Treasurer   87000

Section 1. Chapter 2
