Statistical Operations | Math with NumPy
Ultimate NumPy

Statistical Operations

Performing various statistical operations on arrays is essential for data analysis and machine learning. Therefore, we will discuss how to perform some of these operations effectively with NumPy.

Measures of Central Tendency

Measures of central tendency represent a central or representative value within a probability distribution. In practice, however, you will usually calculate them for a sample of data.

Here are the three main measures:

  • Mean: The sum of all values divided by the total number of values;
  • Median: The middle value in a sorted sample;
  • Mode: The most frequent value in the sample.

NumPy has no dedicated function for calculating the mode. You can use another library for this purpose (for example, SciPy provides scipy.stats.mode), or you can write the function yourself.
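As a minimal sketch of writing the mode calculation yourself, the counts returned by np.unique() can be used to pick the most frequent value. The variable name sample_mode is chosen here for illustration, and ties are resolved in favor of the smallest value, which is an assumption of this sketch rather than a universal convention:

```python
import numpy as np

sample = np.array([10, 25, 15, 30, 20, 10, 2])

# np.unique returns the sorted distinct values and, with
# return_counts=True, how many times each of them occurs.
values, counts = np.unique(sample, return_counts=True)

# argmax returns the position of the first maximum, so on a tie
# the smallest of the most frequent values is selected.
sample_mode = values[np.argmax(counts)]
print(f'Mode: {sample_mode}')
```

Here 10 is the only value that occurs twice, so it is the mode.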

Nevertheless, NumPy provides mean() and median() functions for calculating the mean and median, respectively:

    import numpy as np

    sample = np.array([10, 25, 15, 30, 20, 10, 2])
    # Calculating the mean
    sample_mean = np.mean(sample)
    print(f'Sorted sample: {np.sort(sample)}')
    # Calculating the median
    sample_median = np.median(sample)
    print(f'Mean: {sample_mean}, median: {sample_median}')

We also displayed the sorted sample so you can clearly see the median. Our sample has an odd number of elements (7), so the median is simply the element at zero-based index (n - 1) / 2 = 3 in the sorted sample (the fourth element, 15), where n is the size of the sample.

Note

When the sample has an even number of elements, the median is the average of the elements at zero-based indices n / 2 - 1 and n / 2 in the sorted sample.

Here is an example with a sample having an even number of elements:

    import numpy as np

    sample = np.array([1, 2, 8, 10, 15, 20, 25, 30])
    sample_median = np.median(sample)
    print(f'Median: {sample_median}')

To make this easier to follow, the sample is already written in sorted order. It has 8 elements, so n / 2 - 1 = 3 and sample[3] is 10, while n / 2 = 4 and sample[4] is 15. Therefore, the median is (10 + 15) / 2 = 12.5.
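The odd and even cases can be checked by hand. The sketch below uses a hypothetical helper, manual_median (not part of NumPy), that sorts the sample and applies zero-based indexing, then compares the result with np.median():

```python
import numpy as np

def manual_median(sample):
    # Hypothetical helper for illustration; not a NumPy function.
    sorted_sample = np.sort(sample)
    n = sorted_sample.size
    if n % 2 == 1:
        # Odd size: the single middle element (zero-based index).
        return float(sorted_sample[(n - 1) // 2])
    # Even size: the average of the two middle elements.
    return float((sorted_sample[n // 2 - 1] + sorted_sample[n // 2]) / 2)

odd_sample = np.array([10, 25, 15, 30, 20, 10, 2])
even_sample = np.array([1, 2, 8, 10, 15, 20, 25, 30])

print(manual_median(odd_sample), np.median(odd_sample))
print(manual_median(even_sample), np.median(even_sample))
```

Both pairs should agree: 15.0 for the sample of 7 elements and 12.5 for the sample of 8 elements.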

Measures of Spread

Two common measures of spread are variance and standard deviation. Variance is the average of the squared differences between each value and the mean. The standard deviation is the square root of the variance, so it expresses the spread in the same units as the data.
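To make the definitions concrete, here is a small sketch that computes both measures step by step with basic array arithmetic. The variable names are chosen for illustration, and the division is by the number of values, as in the definition above:

```python
import numpy as np

sample = np.array([10, 25, 15, 30, 20, 10, 2])

# Mean: the sum of all values divided by the number of values.
mean = sample.sum() / sample.size

# Squared difference of each value from the mean.
squared_diffs = (sample - mean) ** 2

# Variance: the average of the squared differences.
variance = squared_diffs.sum() / sample.size

# Standard deviation: the square root of the variance.
std_dev = np.sqrt(variance)

print(f'Variance: {variance}, standard deviation: {std_dev}')
```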

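NumPy provides the var() function to calculate the variance and the std() function to calculate the standard deviation. By default, both divide by n (ddof=0), which treats the data as a whole population; pass ddof=1 to divide by n - 1 for the usual sample estimate. The example below uses the defaults: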

    import numpy as np

    sample = np.array([10, 25, 15, 30, 20, 10, 2])
    # Calculating the variance
    sample_variance = np.var(sample)
    # Calculating the standard deviation
    sample_std = np.std(sample)
    print(f'Variance: {sample_variance}, standard deviation: {sample_std}')

Both functions take the array as their first argument and, for a 1D array, return a single value.
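One detail worth a quick illustration is the ddof parameter of var() and std(). With the default ddof=0 the sum of squared differences is divided by n; with ddof=1 it is divided by n - 1. The sketch below compares the two on the same sample (the variable names are illustrative):

```python
import numpy as np

sample = np.array([10, 25, 15, 30, 20, 10, 2])

# Default ddof=0: divide by n (population variance).
population_variance = np.var(sample)

# ddof=1: divide by n - 1 (sample variance).
sample_variance = np.var(sample, ddof=1)
sample_std = np.std(sample, ddof=1)

print(f'ddof=0 variance: {population_variance}')
print(f'ddof=1 variance: {sample_variance}, standard deviation: {sample_std}')
```

Because the divisor is smaller, the ddof=1 result is always at least as large as the default one.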

Calculations in Higher Dimensional Arrays

All of these functions accept axis as their second parameter. Its default value is None, which means that the measure is calculated over the flattened array (even if the original array is 2D or higher-dimensional). You can also specify the axis along which to calculate the measure. For a 2D array, axis=0 gives one result per column and axis=1 gives one result per row:

    import numpy as np

    array_2d = np.array([[1, 2, 3], [4, 5, 6]])
    # Calculating the mean in a flattened array
    print(np.mean(array_2d))
    print('-' * 13)
    # Calculating the mean along axis 0
    print(np.mean(array_2d, axis=0))
    print('-' * 13)
    # Calculating the mean along axis 1
    print(np.mean(array_2d, axis=1))
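The same axis argument works for the other functions in this chapter. As a short sketch on the same 2x3 array (the variable names are illustrative), the median, variance, and standard deviation can be calculated per column or per row:

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 collapses the rows: one result per column (3 values).
col_medians = np.median(array_2d, axis=0)
col_stds = np.std(array_2d, axis=0)

# axis=1 collapses the columns: one result per row (2 values).
row_medians = np.median(array_2d, axis=1)
row_variances = np.var(array_2d, axis=1)

print(f'Column medians: {col_medians}')
print(f'Column standard deviations: {col_stds}')
print(f'Row medians: {row_medians}')
print(f'Row variances: {row_variances}')
```

The length of each result matches the size of the axis that was kept: 3 for axis=0 and 2 for axis=1.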

Task

Let's say exam_scores is a 2D array of simulated test scores for 5 students (5 columns) for 2 different exams (2 rows). Here are the tasks:

  1. Calculate the mean score for each exam by specifying the axis keyword argument.

  2. Calculate the median of all scores.

  3. Calculate the variance of all scores.

  4. Calculate the standard deviation of all scores.

Section 4. Chapter 3