Challenge: Clean Lab Results
Working with healthcare datasets often means dealing with missing values, because lab results can be incomplete for a variety of reasons. These gaps must be handled before any analysis or modeling. Suppose you receive a DataFrame containing patient lab results, and you notice that some entries in the cholesterol column are missing. Your task is to prepare this data for further analysis by following a standard cleaning procedure: count the missing values, fill them with the column median, and inspect the result.
```python
import pandas as pd

# Sample DataFrame representing patient lab results
data = {
    "patient_id": [101, 102, 103, 104, 105],
    "cholesterol": [180, None, 210, None, 190]
}
df = pd.DataFrame(data)

# 1. Identify the number of missing values in 'cholesterol'
missing_count = df["cholesterol"].isnull().sum()
print("Number of missing values in 'cholesterol':", missing_count)

# 2. Fill missing values with the median cholesterol value
#    (median() skips missing values by default)
median_chol = df["cholesterol"].median()
# Assign the result back instead of calling fillna(..., inplace=True) on the
# selected column, which is deprecated in recent pandas versions
df["cholesterol"] = df["cholesterol"].fillna(median_chol)

# 3. Output the cleaned DataFrame
print("\nCleaned DataFrame:")
print(df)
```
Filling missing values with the median keeps the imputed entries close to a typical patient's value, and the median is less sensitive to outliers than the mean. In the sample above, the observed values are 180, 190, and 210, so both gaps are filled with 190. This is especially relevant in healthcare, where extreme lab readings could otherwise distort downstream analysis.
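To illustrate why the median is less affected by outliers than the mean, here is a minimal sketch. The readings are hypothetical and are not part of the lesson data; the value 600 stands in for a single extreme measurement.

```python
import pandas as pd

# Hypothetical cholesterol readings (assumption, not lesson data):
# one extreme value (600) and one missing entry
readings = pd.Series([180, 190, 210, 600, None])

# Both statistics skip missing values by default
mean_value = readings.mean()
median_value = readings.median()

# Candidate fill values for the missing entry
filled_with_mean = readings.fillna(mean_value)
filled_with_median = readings.fillna(median_value)

print("Mean fill value:  ", mean_value)
print("Median fill value:", median_value)
```

With these assumed numbers the mean is pulled up to 295 by the single extreme reading, while the median stays at 200, much closer to the bulk of the values.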
- Use pandas to create a DataFrame with columns `patient_id` and `cholesterol` using the data provided below.
- Count the number of missing values in the `cholesterol` column and print this number.
- Fill all missing values in the `cholesterol` column with the median cholesterol value.
- Print the cleaned DataFrame after filling missing values.
Data to use:
| patient_id | cholesterol |
|---|---|
| 201 | 205 |
| 202 | |
| 203 | 187 |
| 204 | |
| 205 | 220 |
Solution
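One possible solution is sketched below. It assumes that the empty cells in the table stand for missing values and encodes them as `None`. The variable names mirror the earlier example and are not prescribed by the task.

```python
import pandas as pd

# Data from the task table; empty cells are assumed to be missing values
data = {
    "patient_id": [201, 202, 203, 204, 205],
    "cholesterol": [205, None, 187, None, 220],
}
df = pd.DataFrame(data)

# Count and print the number of missing cholesterol values
missing_count = df["cholesterol"].isna().sum()
print("Number of missing values in 'cholesterol':", missing_count)

# Compute the median of the observed values (missing entries are skipped)
median_chol = df["cholesterol"].median()

# Fill the gaps and assign the result back to the column
df["cholesterol"] = df["cholesterol"].fillna(median_chol)

# Print the cleaned DataFrame
print("\nCleaned DataFrame:")
print(df)
```

Under that assumption, the observed values are 187, 205, and 220, so the two missing entries are filled with the median, 205.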