Learn Challenge: Clean Lab Results | Healthcare Data Fundamentals
Python for Healthcare Professionals
Section 1. Chapter 5

Challenge: Clean Lab Results

Working with healthcare datasets often means dealing with missing values, because lab results can be incomplete for many reasons. These gaps must be handled before any analysis or modeling. Suppose you receive a DataFrame containing patient lab results, and you notice that some entries in the cholesterol column are missing. Your task is to prepare this data for further analysis by following a standard cleaning procedure.

import pandas as pd

# Sample DataFrame representing patient lab results
data = {
    "patient_id": [101, 102, 103, 104, 105],
    "cholesterol": [180, None, 210, None, 190]
}
df = pd.DataFrame(data)

# 1. Identify the number of missing values in 'cholesterol'
missing_count = df["cholesterol"].isnull().sum()
print("Number of missing values in 'cholesterol':", missing_count)

# 2. Fill missing values with the median cholesterol value
# (assign the result back; calling fillna(..., inplace=True) on a selected
# column is deprecated in recent pandas and may not update df)
median_chol = df["cholesterol"].median()
df["cholesterol"] = df["cholesterol"].fillna(median_chol)

# 3. Output the cleaned DataFrame
print("\nCleaned DataFrame:")
print(df)

Counting the missing values first shows how much of the column will be imputed. Filling them with the median rather than the mean keeps the imputed values close to those of a typical patient, because the median is far less sensitive to outliers. This matters in healthcare, where preserving the integrity of clinical data is crucial for accurate downstream analysis.
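To make the median-versus-mean point concrete, here is a small sketch. The readings are made up for illustration (they are not from the lesson), and the value 900 is a deliberately implausible entry standing in for a data-entry error.

```python
import pandas as pd

# Hypothetical cholesterol readings in mg/dL; 900 simulates a data-entry error
# and the final entry is missing.
readings = pd.Series([180.0, 190.0, 210.0, 900.0, None])

# Both statistics skip missing values by default.
mean_fill = readings.mean()      # pulled upward by the single outlier
median_fill = readings.median()  # middle of the observed values

print("Mean of observed values:  ", mean_fill)
print("Median of observed values:", median_fill)

# Fill the gap each way to compare the imputed value.
filled_with_mean = readings.fillna(mean_fill)
filled_with_median = readings.fillna(median_fill)
print("Imputed with mean:  ", filled_with_mean.iloc[-1])
print("Imputed with median:", filled_with_median.iloc[-1])
```

With these numbers the mean is 370.0 while the median is 200.0, so the median-imputed value stays within the range of the plausible readings.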

Task

  • Use pandas to create a DataFrame with columns patient_id and cholesterol using the data provided below.
  • Count the number of missing values in the cholesterol column and print this number.
  • Fill all missing values in the cholesterol column with the median cholesterol value.
  • Print the cleaned DataFrame after filling missing values.

Data to use:

patient_id  cholesterol
201         205
202         (missing)
203         187
204         (missing)
205         220

Solution
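The page does not include the reference answer, so the following is only one possible sketch of the task. It assumes the blank cholesterol cells for patients 202 and 204 are missing values, and it assigns the filled column back to the DataFrame instead of using inplace=True.

```python
import pandas as pd

# Task data; blank cells in the table are assumed to be missing values.
data = {
    "patient_id": [201, 202, 203, 204, 205],
    "cholesterol": [205, None, 187, None, 220],
}
df = pd.DataFrame(data)

# Count and print the missing cholesterol values.
missing_count = int(df["cholesterol"].isna().sum())
print("Number of missing values in 'cholesterol':", missing_count)

# Fill the gaps with the median of the observed values.
median_chol = df["cholesterol"].median()
df["cholesterol"] = df["cholesterol"].fillna(median_chol)

# Print the cleaned DataFrame.
print("\nCleaned DataFrame:")
print(df)
```

Here the observed values are 187, 205, and 220, so the median used for both missing entries is 205.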
