single
Challenge: Data Quality Audit
Desliza para mostrar el menú
In healthcare, ensuring the accuracy and completeness of patient records is critical. Data quality audits help you identify errors such as negative ages or missing required information, which could impact patient care and reporting. In this challenge, you will use python to audit a DataFrame of patient records for two common issues: negative ages and missing diagnoses. You will then generate a report listing all problematic records and output this report as a CSV file, simulating a real-world data quality assurance workflow.
To begin, you will need to create a DataFrame that represents a sample set of patient records. This DataFrame should include at least the fields patient_id, age, and diagnosis. The next step is to check for negative values in the age column, which are not possible in real patient data. You will also check for missing values in the diagnosis field, as every patient record should contain a diagnosis for accurate medical tracking and billing.
Once you have identified records with these issues, you will compile them into a separate DataFrame to generate a clear and actionable report. Finally, you will export this report to a CSV file, which is a common format for sharing and reviewing data quality findings in healthcare settings.
123456789101112131415161718192021222324252627import pandas as pd # Sample patient records data = { "patient_id": [1, 2, 3, 4, 5], "age": [34, -2, 55, 42, 28], "diagnosis": ["Hypertension", None, "Diabetes", "Asthma", None] } df = pd.DataFrame(data) # Identify records with negative ages negative_age = df[df["age"] < 0] # Identify records with missing diagnosis missing_diagnosis = df[df["diagnosis"].isnull()] # Combine all problematic records, removing duplicates problematic_records = pd.concat([negative_age, missing_diagnosis]).drop_duplicates() # Generate the data quality report as a new DataFrame report = problematic_records.copy() # Output the report to a CSV file report.to_csv("data_quality_report.csv", index=False) print("Data quality audit complete. Problematic records:") print(report)
Data quality audits like this are essential for maintaining trustworthy medical records. By regularly checking for logical errors and missing information, you help ensure patient safety and compliance with healthcare regulations.
Swipe to start coding
Write a script that:
- Loads a DataFrame containing patient records with the columns
patient_id,age, anddiagnosis. - Finds all records where the
ageis negative. - Finds all records where the
diagnosisfield is missing. - Combines these problematic records into a single report, without duplicates.
- Outputs the report as a CSV file called
data_quality_report.csv.
Your script should use only the pandas library.
Solución
¡Gracias por tus comentarios!
single
Pregunte a AI
Pregunte a AI
Pregunte lo que quiera o pruebe una de las preguntas sugeridas para comenzar nuestra charla