Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Lernen Challenge: Clean Lab Results | Healthcare Data Fundamentals
Python for Healthcare Professionals
Abschnitt 1. Kapitel 5
single

single

bookChallenge: Clean Lab Results

Swipe um das Menü anzuzeigen

Working with healthcare datasets often involves addressing missing values, as lab results may be incomplete due to a variety of reasons. Handling these gaps is essential before any analysis or modeling. Suppose you receive a DataFrame containing patient lab results, and you notice that some entries in the cholesterol column are missing. Your task is to prepare this data for further analysis by following a standard cleaning procedure.

1234567891011121314151617181920
import pandas as pd # Sample DataFrame representing patient lab results data = { "patient_id": [101, 102, 103, 104, 105], "cholesterol": [180, None, 210, None, 190] } df = pd.DataFrame(data) # 1. Identify the number of missing values in 'cholesterol' missing_count = df["cholesterol"].isnull().sum() print("Number of missing values in 'cholesterol':", missing_count) # 2. Fill missing values with the median cholesterol value median_chol = df["cholesterol"].median() df["cholesterol"].fillna(median_chol, inplace=True) # 3. Output the cleaned DataFrame print("\nCleaned DataFrame:") print(df)
copy

By identifying and filling missing values with the median, you ensure that the dataset remains representative of the typical patient and is less skewed by outliers than if you used the mean. This approach is especially relevant in healthcare, where preserving the integrity of clinical data is crucial for accurate downstream analysis.

Aufgabe

Swipe to start coding

  • Use pandas to create a DataFrame with columns patient_id and cholesterol using the data provided below.
  • Count the number of missing values in the cholesterol column and print this number.
  • Fill all missing values in the cholesterol column with the median cholesterol value.
  • Print the cleaned DataFrame after filling missing values.

Data to use:

patient_idcholesterol
201205
202
203187
204
205220

Lösung

Switch to desktopWechseln Sie zum Desktop, um in der realen Welt zu übenFahren Sie dort fort, wo Sie sind, indem Sie eine der folgenden Optionen verwenden
War alles klar?

Wie können wir es verbessern?

Danke für Ihr Feedback!

Abschnitt 1. Kapitel 5
single

single

Fragen Sie AI

expand

Fragen Sie AI

ChatGPT

Fragen Sie alles oder probieren Sie eine der vorgeschlagenen Fragen, um unser Gespräch zu beginnen

some-alt