Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Aprende Challenge: Data Quality Audit | Healthcare Automation and Reporting
Python for Healthcare Professionals
Sección 3. Capítulo 6
single

single

bookChallenge: Data Quality Audit

Desliza para mostrar el menú

In healthcare, ensuring the accuracy and completeness of patient records is critical. Data quality audits help you identify errors such as negative ages or missing required information, which could impact patient care and reporting. In this challenge, you will use python to audit a DataFrame of patient records for two common issues: negative ages and missing diagnoses. You will then generate a report listing all problematic records and output this report as a CSV file, simulating a real-world data quality assurance workflow.

To begin, you will need to create a DataFrame that represents a sample set of patient records. This DataFrame should include at least the fields patient_id, age, and diagnosis. The next step is to check for negative values in the age column, which are not possible in real patient data. You will also check for missing values in the diagnosis field, as every patient record should contain a diagnosis for accurate medical tracking and billing.

Once you have identified records with these issues, you will compile them into a separate DataFrame to generate a clear and actionable report. Finally, you will export this report to a CSV file, which is a common format for sharing and reviewing data quality findings in healthcare settings.

123456789101112131415161718192021222324252627
import pandas as pd # Sample patient records data = { "patient_id": [1, 2, 3, 4, 5], "age": [34, -2, 55, 42, 28], "diagnosis": ["Hypertension", None, "Diabetes", "Asthma", None] } df = pd.DataFrame(data) # Identify records with negative ages negative_age = df[df["age"] < 0] # Identify records with missing diagnosis missing_diagnosis = df[df["diagnosis"].isnull()] # Combine all problematic records, removing duplicates problematic_records = pd.concat([negative_age, missing_diagnosis]).drop_duplicates() # Generate the data quality report as a new DataFrame report = problematic_records.copy() # Output the report to a CSV file report.to_csv("data_quality_report.csv", index=False) print("Data quality audit complete. Problematic records:") print(report)
copy
Note
Note

Data quality audits like this are essential for maintaining trustworthy medical records. By regularly checking for logical errors and missing information, you help ensure patient safety and compliance with healthcare regulations.

Tarea

Swipe to start coding

Write a script that:

  • Loads a DataFrame containing patient records with the columns patient_id, age, and diagnosis.
  • Finds all records where the age is negative.
  • Finds all records where the diagnosis field is missing.
  • Combines these problematic records into a single report, without duplicates.
  • Outputs the report as a CSV file called data_quality_report.csv.

Your script should use only the pandas library.

Solución

Switch to desktopCambia al escritorio para practicar en el mundo realContinúe desde donde se encuentra utilizando una de las siguientes opciones
¿Todo estuvo claro?

¿Cómo podemos mejorarlo?

¡Gracias por tus comentarios!

Sección 3. Capítulo 6
single

single

Pregunte a AI

expand

Pregunte a AI

ChatGPT

Pregunte lo que quiera o pruebe una de las preguntas sugeridas para comenzar nuestra charla

some-alt