Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Leer Challenge: Clean Lab Results | Healthcare Data Fundamentals
Python for Healthcare Professionals
Sectie 1. Hoofdstuk 5
single

single

bookChallenge: Clean Lab Results

Veeg om het menu te tonen

Working with healthcare datasets often involves addressing missing values, as lab results may be incomplete due to a variety of reasons. Handling these gaps is essential before any analysis or modeling. Suppose you receive a DataFrame containing patient lab results, and you notice that some entries in the cholesterol column are missing. Your task is to prepare this data for further analysis by following a standard cleaning procedure.

1234567891011121314151617181920
import pandas as pd # Sample DataFrame representing patient lab results data = { "patient_id": [101, 102, 103, 104, 105], "cholesterol": [180, None, 210, None, 190] } df = pd.DataFrame(data) # 1. Identify the number of missing values in 'cholesterol' missing_count = df["cholesterol"].isnull().sum() print("Number of missing values in 'cholesterol':", missing_count) # 2. Fill missing values with the median cholesterol value median_chol = df["cholesterol"].median() df["cholesterol"].fillna(median_chol, inplace=True) # 3. Output the cleaned DataFrame print("\nCleaned DataFrame:") print(df)
copy

By identifying and filling missing values with the median, you ensure that the dataset remains representative of the typical patient and is less skewed by outliers than if you used the mean. This approach is especially relevant in healthcare, where preserving the integrity of clinical data is crucial for accurate downstream analysis.

Taak

Swipe to start coding

  • Use pandas to create a DataFrame with columns patient_id and cholesterol using the data provided below.
  • Count the number of missing values in the cholesterol column and print this number.
  • Fill all missing values in the cholesterol column with the median cholesterol value.
  • Print the cleaned DataFrame after filling missing values.

Data to use:

patient_idcholesterol
201205
202
203187
204
205220

Oplossing

Switch to desktopSchakel over naar desktop voor praktijkervaringGa verder vanaf waar je bent met een van de onderstaande opties
Was alles duidelijk?

Hoe kunnen we het verbeteren?

Bedankt voor je feedback!

Sectie 1. Hoofdstuk 5
single

single

Vraag AI

expand

Vraag AI

ChatGPT

Vraag wat u wilt of probeer een van de voorgestelde vragen om onze chat te starten.

some-alt