Challenge: Clean Lab Results | Healthcare Data Fundamentals
Python for Healthcare Professionals
Section 1. Chapter 5
Challenge: Clean Lab Results

Working with healthcare datasets often means dealing with missing values, since lab results can be incomplete for many reasons. These gaps must be handled before any analysis or modeling. Suppose you receive a DataFrame containing patient lab results, and you notice that some entries in the cholesterol column are missing. Your task is to prepare this data for further analysis with a standard cleaning procedure: count the missing values, fill them with the column median, and inspect the result.

import pandas as pd

# Sample DataFrame representing patient lab results
data = {
    "patient_id": [101, 102, 103, 104, 105],
    "cholesterol": [180, None, 210, None, 190]
}
df = pd.DataFrame(data)

# 1. Identify the number of missing values in 'cholesterol'
missing_count = df["cholesterol"].isnull().sum()
print("Number of missing values in 'cholesterol':", missing_count)

# 2. Fill missing values with the median cholesterol value
median_chol = df["cholesterol"].median()
# Assign the result back to the column instead of calling fillna(inplace=True)
# on the selected column, which newer pandas versions warn about
df["cholesterol"] = df["cholesterol"].fillna(median_chol)

# 3. Output the cleaned DataFrame
print("\nCleaned DataFrame:")
print(df)

Filling missing values with the median keeps the imputed entries close to the typical patient. Because the median is less sensitive to outliers than the mean, a few extreme readings do not distort the fill value. This matters in healthcare, where preserving the integrity of clinical data is crucial for accurate downstream analysis.
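To make the median-versus-mean point concrete, here is a minimal sketch. The readings are hypothetical and not part of the lesson data; the value 600 is an assumed data-entry error included only to show how an outlier shifts each statistic.

```python
import pandas as pd

# Hypothetical cholesterol readings; 600 is an assumed outlier,
# and the two None entries stand in for missing lab results
readings = pd.Series([180.0, 190.0, 210.0, 600.0, None, None])

# Both statistics skip missing values by default
mean_value = readings.mean()      # pulled upward by the outlier
median_value = readings.median()  # middle of the observed values

# Fill the gaps both ways to compare the imputed values
filled_with_mean = readings.fillna(mean_value)
filled_with_median = readings.fillna(median_value)

print("Mean fill value:", mean_value)      # 295.0
print("Median fill value:", median_value)  # 200.0
```

With these numbers the mean-based fill (295.0) sits well above three of the four observed readings, while the median-based fill (200.0) stays among them.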

Task

  • Use pandas to create a DataFrame with columns patient_id and cholesterol using the data provided below.
  • Count the number of missing values in the cholesterol column and print this number.
  • Fill all missing values in the cholesterol column with the median cholesterol value.
  • Print the cleaned DataFrame after filling missing values.

Data to use:

patient_id   cholesterol
201          205
202          (missing)
203          187
204          (missing)
205          220

Solution
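The page does not show the reference solution, so the following is only one possible sketch. It applies the same steps as the worked example to the task data and assumes the empty cells in the table represent missing values (entered here as None).

```python
import pandas as pd

# Task data; None marks the empty cholesterol cells (an assumption
# about how the missing entries should be represented)
data = {
    "patient_id": [201, 202, 203, 204, 205],
    "cholesterol": [205, None, 187, None, 220],
}
df = pd.DataFrame(data)

# Count the missing values in the cholesterol column
missing_count = int(df["cholesterol"].isna().sum())
print("Number of missing values in 'cholesterol':", missing_count)

# Fill the missing values with the median of the observed values
median_chol = df["cholesterol"].median()
df["cholesterol"] = df["cholesterol"].fillna(median_chol)

# Print the cleaned DataFrame
print("\nCleaned DataFrame:")
print(df)
```

For this data the observed values are 187, 205, and 220, so the median is 205 and both missing entries are filled with that value.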
