Describing the Data
pandas offers the handy mean() method, which calculates the average of the values in each column.
import pandas as pd

df = pd.read_csv('file.csv')
mean_values = df.mean()  # one average per column
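As a minimal sketch of what this call returns, the example below builds a small DataFrame in memory instead of reading a file. The column names and values are hypothetical stand-ins, not part of the lesson's dataset. It also passes numeric_only=True, which is an addition here: an average is only meaningful for numeric columns, and recent pandas versions raise an error if a text column is included.

```python
import pandas as pd

# Hypothetical stand-in data; the lesson reads a CSV file instead.
df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0],
    "quantity": [1, 2, 3],
    "label": ["a", "b", "c"],
})

# numeric_only=True skips the text column "label".
mean_values = df.mean(numeric_only=True)

# The result is a Series indexed by column name.
print(mean_values)
```

The result is a Series, so an individual average can be read with mean_values['price'].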
You can also use the same method to determine the average value for a specific column:
df = pd.read_csv('file.csv')
mean_value = df['column_name'].mean()
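When applied to a single column, mean() returns one number rather than a Series. The sketch below, which uses made-up values, also shows that missing entries are skipped by default.

```python
import pandas as pd

# Hypothetical column with one missing value.
df = pd.DataFrame({"column_name": [1.0, None, 3.0]})

# Missing values are ignored by default (skipna=True).
column_mean = df["column_name"].mean()

# With skipna=False, any missing value makes the result NaN.
strict_mean = df["column_name"].mean(skipna=False)

print(column_mean, strict_mean)
```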
pandas also provides the mode() method, which identifies the most frequently occurring value in each column.
df = pd.read_csv('file.csv')
mode_values = df.mode()
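Unlike mean(), calling mode() on a DataFrame returns another DataFrame, because a column can have several equally frequent values. The following sketch uses invented data to illustrate the shape of the result.

```python
import pandas as pd

# Hypothetical data: "color" has two tied modes, "size" has one.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "blue"],
    "size": [1, 2, 2, 3],
})

mode_values = df.mode()

# One row per tied mode; columns with fewer modes are padded with NaN.
print(mode_values)
```

Here the result has two rows: both tied values for color, and a single value followed by NaN for size.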
To find the mode for a particular column, use the same method. Because mode() returns a Series (a column can have more than one mode), [0] selects the first value:
df = pd.read_csv('file.csv')
mode_value = df['column_name'].mode()[0]
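To make the effect of the [0] index concrete, here is a small sketch with made-up values in which two values are tied for most frequent. The modes come back sorted, so the first element is the smallest of the tied values.

```python
import pandas as pd

# Hypothetical column in which 1 and 3 both appear twice.
df = pd.DataFrame({"column_name": [3, 1, 3, 1, 2]})

all_modes = df["column_name"].mode()   # Series holding every tied mode
first_mode = all_modes[0]              # first (smallest) of the tied modes

print(all_modes.tolist(), first_mode)
```

If all tied values matter for the analysis, keep the full Series instead of indexing into it.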
Another useful method in pandas is describe().
df = pd.read_csv('file.csv')
important_metrics = df.describe()
This method provides an overview of various metrics for the numeric columns of the dataset, including:
- Number of non-missing entries (count);
- Mean or average value;
- Standard deviation;
- The minimum and maximum values;
- The 25th, 50th (median), and 75th percentiles.
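The metrics listed above appear as row labels in the DataFrame that describe() returns. The sketch below uses a hypothetical single-column dataset to show those labels and how to read one value from the summary.

```python
import pandas as pd

# Hypothetical numeric column.
df = pd.DataFrame({"score": [1.0, 2.0, 3.0, 4.0, 5.0]})

important_metrics = df.describe()

# Row labels of the summary table.
metric_names = important_metrics.index.tolist()
print(metric_names)

# Individual metrics can be read with .loc[row_label, column_name].
median_score = important_metrics.loc["50%", "score"]
mean_score = important_metrics.loc["mean", "score"]
print(median_score, mean_score)
```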
Task
You are given a DataFrame named wine_data.
- Calculate the mean of the 'residual sugar' column and store the result in the residual_sugar_mean variable.
- Calculate the mode of the 'fixed acidity' column and store the result in the fixed_acidity_mode variable.
- Retrieve an overview of various statistics from wine_data and store the result in the described_data variable.
Solution
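One possible way to approach the task is sketched below. The real wine_data is supplied by the exercise, so the tiny DataFrame here is a hypothetical stand-in with invented values, used only to make the example runnable. Taking the first mode with [0] follows the single-column example from earlier in the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the wine_data provided by the exercise.
wine_data = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 7.8, 11.2],
    "residual sugar": [1.9, 2.6, 2.3, 1.9],
})

# Mean of the 'residual sugar' column.
residual_sugar_mean = wine_data["residual sugar"].mean()

# Mode of the 'fixed acidity' column (first value if several are tied).
fixed_acidity_mode = wine_data["fixed acidity"].mode()[0]

# Summary statistics for the whole DataFrame.
described_data = wine_data.describe()

print(residual_sugar_mean, fixed_acidity_mode)
print(described_data)
```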