Advanced Techniques in pandas

Finding the Smallest Values of a Column

Next, we will look at another useful method, which returns the rows with the smallest values in a column; its counterpart, .nlargest(), does the same for the largest values. You already know that you can sort the values and then extract a specific number of rows. pandas can do both steps in a single call with .nsmallest(). The example below retrieves the fifteen oldest cars, that is, the 15 rows with the smallest 'Year' values:

import pandas as pd
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/cars.csv', index_col = 0)
data_smallest = data.nsmallest(15, 'Year')
print(data_smallest.head(15))
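As a quick check of the idea that one call replaces sorting followed by row extraction, here is a minimal sketch on a small made-up table. The `cars` frame, its model names, and its values are invented for illustration and are not taken from cars.csv. It compares .nsmallest() with sort_values() followed by head(), and also shows .nlargest():

```python
import pandas as pd

# Hypothetical toy data, not the real cars.csv
cars = pd.DataFrame({
    "Model": ["A", "B", "C", "D", "E"],
    "Year": [2012, 1998, 2005, 2019, 2001],
})

# Three oldest cars in a single call
oldest = cars.nsmallest(3, "Year")

# The same rows in two steps: sort, then take the first rows
oldest_sorted = cars.sort_values("Year").head(3)

# Two newest cars with the counterpart method
newest = cars.nlargest(2, "Year")

print(oldest)
print(oldest.equals(oldest_sorted))
print(newest)
```

The years in this toy table are all different on purpose, so there are no ties to resolve and both approaches pick the same rows in the same order.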

To rank by one column and then by another, pass a list of column names in order of priority. The example below ranks by 'Year' first and then by 'Engine_volume': it extracts the 5 oldest cars, and when two cars have the same year, the one with the smaller 'Engine_volume' value takes priority:

import pandas as pd
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/cars.csv', index_col = 0)
data_smallest = data.nsmallest(5, ['Year', 'Engine_volume'])
print(data_smallest.head())
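The tie-breaking behaviour is easier to see on a tiny table where the years repeat. The sketch below uses invented data (the `cars` frame and its values are assumptions for illustration, not rows from cars.csv) and asks for the three smallest rows by 'Year', using 'Engine_volume' to decide between cars of the same year:

```python
import pandas as pd

# Hypothetical toy data with repeated years
cars = pd.DataFrame({
    "Model": ["A", "B", "C", "D", "E"],
    "Year": [2000, 1995, 2000, 1995, 2010],
    "Engine_volume": [2.0, 3.0, 1.6, 1.8, 2.5],
})

# Rank by 'Year' first; equal years are ranked by 'Engine_volume'
smallest = cars.nsmallest(3, ["Year", "Engine_volume"])

print(smallest)
```

Both 1995 cars are selected, with D (1.8) listed before B (3.0). Only one place is left for the two cars from 2000, and it goes to C because its engine volume (1.6) is smaller than that of A (2.0).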

Now let's extend the method a little and return to the example with the 'Year' column. Values in 'Year' can repeat, so if we ask for the six oldest cars with the previous syntax, the method returns exactly six rows. It does not matter whether the 7th or 8th row has the same year as the 6th. To keep such ties, pass the argument keep = 'all' to the .nsmallest() method; the result may then contain more than six rows. Run the example and compare the two outputs:

import pandas as pd
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/cars.csv', index_col = 0)
# Case without the `keep = 'all'` argument: exactly 6 rows
data_smallest = data.nsmallest(6, 'Year')
print(data_smallest)

# Case with the `keep = 'all'` argument: rows tied with the 6th value are also included
data_smallest = data.nsmallest(6, 'Year', keep = 'all')
print(data_smallest)
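The effect of keep = 'all' can also be checked without downloading the file. This is a minimal sketch on invented data (the `cars` frame and its values are assumptions for illustration): three cars share the year 1995, and we ask for the two oldest rows.

```python
import pandas as pd

# Hypothetical toy data: three cars share the year 1995
cars = pd.DataFrame({
    "Model": ["A", "B", "C", "D", "E"],
    "Year": [1990, 1995, 1995, 1995, 2000],
})

# Default behaviour (keep = 'first'): exactly 2 rows
two_oldest = cars.nsmallest(2, "Year")

# keep = 'all': every row tied with the 2nd smallest year is kept
two_oldest_with_ties = cars.nsmallest(2, "Year", keep="all")

print(len(two_oldest))
print(len(two_oldest_with_ties))
```

The first call returns two rows (A and B), while the second returns four, because C and D have the same year as B.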
Task


Finally, it's time to practice! Follow these steps:

  1. Retrieve the cars whose 'Year' value is greater than 2010.
  2. Extract the 15 cheapest cars (the 15 smallest values of the 'Price' column). Include all rows whose 'Price' ties with the last selected value.
  3. Output the whole data_cheapest data set.

Solution

import pandas as pd

data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/cars.csv', index_col = 0)

# Keep only the cars made after 2010
data_extracted = data.loc[data['Year'] > 2010]
# Extract the 15 cheapest cars, keeping all rows tied with the 15th price
data_cheapest = data_extracted.nsmallest(15, 'Price', keep = 'all')

# Print data
print(data_cheapest)


Section 3. Chapter 5
import pandas as pd

data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/cars.csv', index_col = 0)

# To retrieve specific values of the column 'Year'
data_extracted = data___
# Extract the `15` cheapest cars
data_cheapest = data_extracted.___(___, '___',
keep = '___')

# Print data
___
