Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Learn Challenge: Data Quality Audit | Healthcare Automation and Reporting
Practice
Projects
Quizzes & Challenges
Quizzes
Challenges
/
Python for Healthcare Professionals
Sectionย 3. Chapterย 6
single

single

bookChallenge: Data Quality Audit

Swipe to show menu

In healthcare, ensuring the accuracy and completeness of patient records is critical. Data quality audits help you identify errors such as negative ages or missing required information, which could impact patient care and reporting. In this challenge, you will use python to audit a DataFrame of patient records for two common issues: negative ages and missing diagnoses. You will then generate a report listing all problematic records and output this report as a CSV file, simulating a real-world data quality assurance workflow.

To begin, you will need to create a DataFrame that represents a sample set of patient records. This DataFrame should include at least the fields patient_id, age, and diagnosis. The next step is to check for negative values in the age column, which are not possible in real patient data. You will also check for missing values in the diagnosis field, as every patient record should contain a diagnosis for accurate medical tracking and billing.

Once you have identified records with these issues, you will compile them into a separate DataFrame to generate a clear and actionable report. Finally, you will export this report to a CSV file, which is a common format for sharing and reviewing data quality findings in healthcare settings.

123456789101112131415161718192021222324252627
import pandas as pd # Sample patient records data = { "patient_id": [1, 2, 3, 4, 5], "age": [34, -2, 55, 42, 28], "diagnosis": ["Hypertension", None, "Diabetes", "Asthma", None] } df = pd.DataFrame(data) # Identify records with negative ages negative_age = df[df["age"] < 0] # Identify records with missing diagnosis missing_diagnosis = df[df["diagnosis"].isnull()] # Combine all problematic records, removing duplicates problematic_records = pd.concat([negative_age, missing_diagnosis]).drop_duplicates() # Generate the data quality report as a new DataFrame report = problematic_records.copy() # Output the report to a CSV file report.to_csv("data_quality_report.csv", index=False) print("Data quality audit complete. Problematic records:") print(report)
copy
Note
Note

Data quality audits like this are essential for maintaining trustworthy medical records. By regularly checking for logical errors and missing information, you help ensure patient safety and compliance with healthcare regulations.

Task

Swipe to start coding

Write a script that:

  • Loads a DataFrame containing patient records with the columns patient_id, age, and diagnosis.
  • Finds all records where the age is negative.
  • Finds all records where the diagnosis field is missing.
  • Combines these problematic records into a single report, without duplicates.
  • Outputs the report as a CSV file called data_quality_report.csv.

Your script should use only the pandas library.

Solution

Switch to desktopSwitch to desktop for real-world practiceContinue from where you are using one of the options below
Everything was clear?

How can we improve it?

Thanks for your feedback!

Sectionย 3. Chapterย 6
single

single

Ask AI

expand

Ask AI

ChatGPT

Ask anything or try one of the suggested questions to begin our chat

some-alt