Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Oppiskele Challenge: Clean Lab Results | Healthcare Data Fundamentals
Python for Healthcare Professionals
Osio 1. Luku 5
single

single

bookChallenge: Clean Lab Results

Pyyhkäise näyttääksesi valikon

Working with healthcare datasets often involves addressing missing values, as lab results may be incomplete due to a variety of reasons. Handling these gaps is essential before any analysis or modeling. Suppose you receive a DataFrame containing patient lab results, and you notice that some entries in the cholesterol column are missing. Your task is to prepare this data for further analysis by following a standard cleaning procedure.

1234567891011121314151617181920
import pandas as pd # Sample DataFrame representing patient lab results data = { "patient_id": [101, 102, 103, 104, 105], "cholesterol": [180, None, 210, None, 190] } df = pd.DataFrame(data) # 1. Identify the number of missing values in 'cholesterol' missing_count = df["cholesterol"].isnull().sum() print("Number of missing values in 'cholesterol':", missing_count) # 2. Fill missing values with the median cholesterol value median_chol = df["cholesterol"].median() df["cholesterol"].fillna(median_chol, inplace=True) # 3. Output the cleaned DataFrame print("\nCleaned DataFrame:") print(df)
copy

By identifying and filling missing values with the median, you ensure that the dataset remains representative of the typical patient and is less skewed by outliers than if you used the mean. This approach is especially relevant in healthcare, where preserving the integrity of clinical data is crucial for accurate downstream analysis.

Tehtävä

Swipe to start coding

  • Use pandas to create a DataFrame with columns patient_id and cholesterol using the data provided below.
  • Count the number of missing values in the cholesterol column and print this number.
  • Fill all missing values in the cholesterol column with the median cholesterol value.
  • Print the cleaned DataFrame after filling missing values.

Data to use:

patient_idcholesterol
201205
202
203187
204
205220

Ratkaisu

Switch to desktopVaihda työpöytään todellista harjoitusta vartenJatka siitä, missä olet käyttämällä jotakin alla olevista vaihtoehdoista
Oliko kaikki selvää?

Miten voimme parantaa sitä?

Kiitos palautteestasi!

Osio 1. Luku 5
single

single

Kysy tekoälyä

expand

Kysy tekoälyä

ChatGPT

Kysy mitä tahansa tai kokeile jotakin ehdotetuista kysymyksistä aloittaaksesi keskustelumme

some-alt