Working with Tabular Data | Data Collection and Cleaning for Journalists
Python for Journalists and Media

Working with Tabular Data

Tabular data is at the heart of many impactful journalism stories. As a journalist, you often encounter structured tables when working with government data, Freedom of Information Act (FOIA) releases, public salary disclosures, or datasets from research organizations. Tables are a popular format because they organize information into rows and columns, making it easier to compare, analyze, and spot patterns or anomalies that could lead to newsworthy stories.

```python
import pandas as pd

# Create a DataFrame of public official salaries
data = {
    "Name": ["Alex Kim", "Jordan Lee", "Morgan Patel", "Taylor Smith", "Casey Jones"],
    "Position": ["Mayor", "City Clerk", "Fire Chief", "Police Chief", "Treasurer"],
    "Salary": [120000, 75000, 98000, 105000, 87000]
}

salaries_df = pd.DataFrame(data)

# Display summary statistics for the Salary column
salary_stats = salaries_df["Salary"].describe()
print("Salary Summary Statistics:")
print(salary_stats)
```

The code above uses the pandas library to build a DataFrame from a dictionary of names, job positions, and salaries of public officials. The pd.DataFrame() constructor turns the dictionary into a structured table, with each key becoming a column. Calling .describe() on the Salary column then returns summary statistics: the count, mean, standard deviation, minimum, and maximum, plus the quartiles (the 25th, 50th, and 75th percentiles). For journalists, these statistics help spot outliers, such as an unusually high salary, and overall trends, such as the average pay for city officials. This quick overview helps you decide where to dig deeper in your reporting.
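As a rough sketch of how these summary statistics can point to a specific row, the example below rebuilds the same salaries_df and reads individual values out of the .describe() result. The 1.5 × IQR cutoff is a common rule of thumb for flagging possible outliers, not something this lesson prescribes, and the variable names (upper_fence, possible_outliers, top_earner) are illustrative assumptions.

```python
import pandas as pd

salaries_df = pd.DataFrame({
    "Name": ["Alex Kim", "Jordan Lee", "Morgan Patel", "Taylor Smith", "Casey Jones"],
    "Position": ["Mayor", "City Clerk", "Fire Chief", "Police Chief", "Treasurer"],
    "Salary": [120000, 75000, 98000, 105000, 87000],
})

# describe() returns a Series, so each statistic can be looked up by its label
stats = salaries_df["Salary"].describe()
mean_salary = stats["mean"]
median_salary = stats["50%"]

# Interquartile range: the spread of the middle half of the salaries
iqr = stats["75%"] - stats["25%"]

# Assumed rule of thumb: values above Q3 + 1.5 * IQR are possible outliers
upper_fence = stats["75%"] + 1.5 * iqr
possible_outliers = salaries_df[salaries_df["Salary"] > upper_fence]

# Row belonging to the highest-paid official
top_earner = salaries_df.loc[salaries_df["Salary"].idxmax()]

print(f"Mean: {mean_salary:.0f}, median: {median_salary:.0f}")
print(f"Upper fence: {upper_fence:.0f}")
print(f"Possible outliers: {len(possible_outliers)}")
print(f"Top earner: {top_earner['Name']} ({top_earner['Salary']})")
```

In this small dataset no salary crosses the assumed cutoff, so the highest figure is high but not statistically unusual. A flagged row would be a starting point for reporting, not a finding in itself.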

```python
# Filter officials with salaries above $90,000
high_earners = salaries_df[salaries_df["Salary"] > 90000]
print("Officials earning above $90,000:")
print(high_earners)
```
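The filter above works in two steps: the comparison produces a boolean Series with one True/False value per row, and indexing the DataFrame with that Series keeps only the rows marked True. The following minimal sketch makes the intermediate mask visible and shows how conditions can be combined. The 80,000 to 100,000 range and the names mask and mid_range are illustrative assumptions, not part of the lesson.

```python
import pandas as pd

salaries_df = pd.DataFrame({
    "Name": ["Alex Kim", "Jordan Lee", "Morgan Patel", "Taylor Smith", "Casey Jones"],
    "Position": ["Mayor", "City Clerk", "Fire Chief", "Police Chief", "Treasurer"],
    "Salary": [120000, 75000, 98000, 105000, 87000],
})

# Step 1: the comparison yields one True/False value per row
mask = salaries_df["Salary"] > 90000

# Step 2: indexing with the mask keeps only the True rows;
# sorting puts the highest salary first
high_earners = salaries_df[mask].sort_values("Salary", ascending=False)

# Conditions combine with & (and) or | (or); each one needs its own parentheses
mid_range = salaries_df[
    (salaries_df["Salary"] >= 80000) & (salaries_df["Salary"] <= 100000)
]

print(mask.tolist())
print(high_earners[["Name", "Salary"]])
print(mid_range["Name"].tolist())
```

Keeping the mask in its own variable makes it easy to check how many rows a condition matches (for example with mask.sum()) before filtering.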

1. What function in pandas is used to create a DataFrame from a dictionary?

2. How can filtering data help journalists find newsworthy stories?

3. Fill in the blank: To display the first 10 rows of a DataFrame, use _____

salaries_df

```
           Name      Position  Salary
0      Alex Kim         Mayor  120000
1    Jordan Lee    City Clerk   75000
2  Morgan Patel    Fire Chief   98000
3  Taylor Smith  Police Chief  105000
4   Casey Jones     Treasurer   87000
```
Section 1. Chapter 2
