Summary  
This chapter demonstrates how to use the pandas `.nsmallest()` method to retrieve the rows with the smallest values in one or more columns of a DataFrame, including multi-column ordering and the `keep` option for handling duplicate values.  

General domain of usage  
Data analysis

Watch this video for a step-by-step walkthrough on using the pandas `.nsmallest()` method to quickly find the smallest values in a DataFrame column. You'll see how to use this function to extract rows with the lowest values, sort by multiple columns, and handle duplicate values with the `keep='all'` argument. This visual guide will reinforce the concepts from the chapter and help you confidently apply `.nsmallest()` in your own data analysis tasks.


You will learn another useful method, `.nsmallest()`, which returns the rows with the smallest values in a column (its counterpart, `.nlargest()`, returns the largest). You already know that you can sort values and then extract a specific number of rows. Unsurprisingly, **pandas** can do both in a single line of code. The example below retrieves the fifteen oldest cars:

import pandas as pd
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/cars.csv', index_col = 0)
data_smallest = data.nsmallest(15, 'Year')
print(data_smallest.head(15))
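
To make the connection with sorting concrete, here is a minimal sketch on a small, made-up DataFrame (the `cars` table and its `Model` values are illustrative assumptions, not part of the course dataset). It suggests that, when the values are unique, `.nsmallest()` returns the same rows as sorting and then taking the first rows:

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
cars = pd.DataFrame(
    {
        "Model": ["A", "B", "C", "D", "E"],
        "Year": [2012, 1998, 2005, 2019, 2001],
    }
)

# One call: the three rows with the smallest 'Year'
oldest = cars.nsmallest(3, "Year")

# Two steps: sort ascending, then keep the first three rows
sorted_head = cars.sort_values("Year").head(3)

print(oldest)                      # models B, E, C (1998, 2001, 2005)
print(oldest.equals(sorted_head))  # True for this data
```

Note that the original row labels are preserved in both results, so the index is not renumbered.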

To rank by one column and then by another, pass a list of column names in the required order. The example below ranks first by `'Year'` and then by `'Engine_volume'`: it extracts the `5` oldest cars, and when several cars share the same year, the car with the smaller `'Engine_volume'` value takes priority:

import pandas as pd
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/cars.csv', index_col = 0)
data_smallest = data.nsmallest(5, ['Year', 'Engine_volume'])
print(data_smallest.head())
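
The tie-breaking role of the second column is easier to see on a small table. The sketch below uses made-up data (the `cars` frame and its values are assumptions for illustration) in which three cars share the oldest year, so the second column decides which two are returned:

```python
import pandas as pd

# Hypothetical toy data: three cars share the oldest year, 1995
cars = pd.DataFrame(
    {
        "Model": ["A", "B", "C", "D", "E"],
        "Year": [2000, 1995, 1995, 2010, 1995],
        "Engine_volume": [2.0, 3.0, 1.6, 1.4, 2.5],
    }
)

# Single column: ties on 'Year' are resolved by order of appearance
by_year = cars.nsmallest(2, "Year")

# Two columns: ties on 'Year' are resolved by the smaller 'Engine_volume'
by_year_then_engine = cars.nsmallest(2, ["Year", "Engine_volume"])

print(by_year["Model"].tolist())              # ['B', 'C']
print(by_year_then_engine["Model"].tolist())  # ['C', 'E']
```

With only `'Year'`, car B is kept because it appears first; with the extra column, B is dropped because it has the largest engine among the 1995 cars.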

Compare the two examples below. Now let's extend the method a little and return to the example with the `'Year'` column. `'Year'` values can repeat, so if we request the six oldest cars with the previous syntax, the method returns exactly six rows. It does not check whether the 7th or 8th value is the same as the 6th. To keep such ties, pass the argument `keep = 'all'` to the `.nsmallest()` method; the result may then contain more than six rows. Run both examples to see the difference:

import pandas as pd
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/cars.csv', index_col = 0)
# Case without using `keep = 'all'` argument
data_smallest = data.nsmallest(6, 'Year')
print(data_smallest)

import pandas as pd
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/cars.csv', index_col = 0)
# Case with the `keep = 'all'` argument
data_smallest = data.nsmallest(6, 'Year',
                             keep = 'all')
print(data_smallest)
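
The effect of `keep = 'all'` can also be checked on a tiny table where the tie is easy to spot. This is a sketch on made-up data (the `cars` frame is an assumption for illustration): the third-smallest year, 2000, appears twice, so the two calls return a different number of rows:

```python
import pandas as pd

# Hypothetical toy data: the third-smallest year (2000) appears twice
cars = pd.DataFrame(
    {
        "Model": ["A", "B", "C", "D", "E"],
        "Year": [1995, 2000, 1995, 2000, 2010],
    }
)

# Default (keep='first'): exactly three rows, the tied car D is cut off
default = cars.nsmallest(3, "Year")

# keep='all': every row tied with the last selected value is kept
with_ties = cars.nsmallest(3, "Year", keep="all")

print(default["Model"].tolist())    # ['A', 'C', 'B']
print(with_ties["Model"].tolist())  # ['A', 'C', 'B', 'D']
```

So with `keep = 'all'` the first argument acts as a minimum number of rows rather than an exact count.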

import unittest
import pandas as pd
import io
import sys


def _dynamic_test(test_case, condition, success_msg, failure_msg):
    if condition:
        test_case._testMethodName = success_msg
        test_case.assertTrue(True, success_msg)
    else:
        test_case._testMethodName = failure_msg
        test_case.fail(failure_msg)


def frames_equal_ignore_col_order(df1, df2):
    """Check whether two DataFrames are equal regardless of column order."""
    if set(df1.columns) != set(df2.columns):
        return False
    common_cols = sorted(df1.columns)
    df1_sorted = df1[common_cols].reset_index(drop=True)
    df2_sorted = df2[common_cols].reset_index(drop=True)
    return df1_sorted.equals(df2_sorted)


class TestRetrieveYear(unittest.TestCase):
    def test_year_filter(self):
        """
        1. Retrieve data on cars where 'Year' > 2010.
        """
        import user_code
        df = pd.read_csv(
            "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/cars.csv",
            index_col=0,
        )
        ref = df.loc[df["Year"] > 2010]

        condition = (
            hasattr(user_code, "data_extracted")
            and isinstance(user_code.data_extracted, pd.DataFrame)
            and frames_equal_ignore_col_order(user_code.data_extracted, ref)
        )

        _dynamic_test(
            self,
            condition,
            "The 'data_extracted' DataFrame correctly includes cars with Year > 2010.",
            "The filtering by Year > 2010 is incorrect. Check your .loc[] condition."
        )


class TestCheapestCars(unittest.TestCase):
    def test_cheapest_15(self):
        """
        2. Extract the 15 cheapest cars (including duplicates) using .nsmallest().
        """
        import user_code
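
        # Check that both variables exist before touching them, so a missing
        # variable fails the test with a clear message instead of raising.
        condition = (
            hasattr(user_code, "data_extracted")
            and hasattr(user_code, "data_cheapest")
            and isinstance(user_code.data_cheapest, pd.DataFrame)
            and frames_equal_ignore_col_order(
                user_code.data_cheapest,
                user_code.data_extracted.nsmallest(15, "Price", keep="all"),
            )
        )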

        _dynamic_test(
            self,
            condition,
            "The 'data_cheapest' DataFrame correctly includes the 15 cheapest cars (duplicates included).",
            "The extraction of the 15 cheapest cars is incorrect. Use .nsmallest(15, 'Price', keep='all')."
        )


class TestOutput(unittest.TestCase):
    def test_output_print(self):
        """
        3. Output all values of the 'data_cheapest' DataFrame.
        """
        import user_code

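        captured_output = io.StringIO()
        previous_stdout = sys.stdout
        sys.stdout = captured_output
        try:
            print(user_code.data_cheapest)
        finally:
            # Restore whichever stdout was active, even if printing fails
            sys.stdout = previous_stdout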

        output_text = captured_output.getvalue().strip()
        condition = len(output_text) > 0
        _dynamic_test(
            self,
            condition,
            "The output displays all rows of 'data_cheapest' correctly.",
            "The 'data_cheapest' DataFrame is not printed. Use print(data_cheapest)."
        )


if __name__ == "__main__":
    unittest.main()


test_code.py

This course covers many functions that are useful for a future data analyst. You will learn different ways of extracting data and how to set conditions on it. After that, you will become familiar with methods for grouping data and learn how to preprocess it. Each section uses its own dataset to keep the course engaging.

This section will teach you how to output specific columns by their titles or indices. You will also get acquainted with the ways you can select rows by indices.

Here, you will learn how to extract data that has specific conditions. Also, you will learn how to combine them and even create your own.

In this section, you will expand your knowledge on setting different data conditions. You will learn to check if your data is in a defined list of values or between two values. You will also learn how to find the largest and smallest values.

This section is one of the most fascinating of the course. Here, you will learn how to group data in different ways. It will help you work as a data analyst to find out information on specific data groups.

This section is one of the most significant for a data analyst: if the data contains missing values or values in an incorrect format, it is very hard to work with. Here you will learn how to deal with such values.

Finding the Smallest Values of a Column
