Challenge: Clean a List of News Sources
Clean, reliable data is critical for journalists who want to build trustworthy media databases. When working with lists of news sources, data often arrives in a messy state: duplicate entries can inflate counts, missing website links can leave gaps in research, and inconsistent capitalization can make automated analysis difficult. Ensuring your data is clean not only saves time but also prevents errors in your reporting.
```python
import pandas as pd

# Example: Messy news sources data
data = {
    "Name": [
        "the daily news", "The Daily News",
        "Global Times", "global times",
        "Metro Herald", "Metro herald", "Metro Herald",
        "The Observer", "The Observer"
    ],
    "Website": [
        "www.dailynews.com", None,
        "www.globaltimes.com", "www.globaltimes.com",
        "www.metroherald.com", None, None,
        "www.observer.com", None
    ]
}

df = pd.DataFrame(data)

# Standardize news source names to title case first, so that
# case variants like "the daily news" and "The Daily News"
# become exact duplicates of each other
df["Name"] = df["Name"].str.title()

# Fill missing website URLs with 'Unknown'
df["Website"] = df["Website"].fillna("Unknown")

# Remove duplicate rows based on both columns
df = df.drop_duplicates()

# Output the cleaned DataFrame
print(df)
```
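After the cleaning steps run, it is worth verifying that they behaved as expected. A minimal sanity check, assuming the cleaned `df` from the example above:

```python
# No exact duplicate rows should remain after cleaning
assert not df.duplicated().any()

# Count how many sources still lack a real website URL
missing = (df["Website"] == "Unknown").sum()
print(f"Sources without a known website: {missing}")
```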
Cleaning your data in this way makes your media analysis more reliable. Standardizing name capitalization avoids mismatches and makes grouping or filtering sources much easier; doing it before deduplication also matters, because case variants such as "the daily news" and "The Daily News" only become exact duplicates once their capitalization matches. Removing duplicates then ensures each source is counted only once, and filling missing website URLs with a placeholder like "Unknown" lets you spot gaps without breaking your workflow. Clean data leads to more accurate reporting and helps maintain the credibility of your findings.
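As a concrete example of spotting those gaps, a simple filter on the cleaned `df` lists exactly the sources whose websites still need to be tracked down:

```python
# List the sources whose website URL is still unknown,
# e.g. to build a follow-up research list
needs_research = df[df["Website"] == "Unknown"]
print(needs_research["Name"].tolist())
```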
Now try it yourself: write a function that takes a DataFrame with news source names and website URLs and returns a cleaned DataFrame (a sketch of one possible solution follows the list) by:
- Removing duplicate rows;
- Filling missing website URLs with 'Unknown';
- Capitalizing each word in the news source names.
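A minimal sketch of one way to implement this, assuming the column names "Name" and "Website" from the example above (the function name `clean_sources` is illustrative):

```python
import pandas as pd

def clean_sources(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a news-sources DataFrame."""
    cleaned = df.copy()
    # Capitalize each word in the news source names
    cleaned["Name"] = cleaned["Name"].str.title()
    # Fill missing website URLs with 'Unknown'
    cleaned["Website"] = cleaned["Website"].fillna("Unknown")
    # Remove duplicate rows (after standardizing, so case
    # variants of the same source collapse into one row)
    return cleaned.drop_duplicates().reset_index(drop=True)
```

Working on a copy keeps the caller's original DataFrame intact, which is usually the safer choice for a reusable helper.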