Pandas First Steps
Unique Values
Values are often repeated in a DataFrame. For instance, in the countries DataFrame below, the 'continent' column contains repeated entries. The unique() method returns an array of the distinct values in a specific DataFrame column.
import pandas as pd

country_data = {
    'country': ['Thailand', 'Philippines', 'Monaco', 'Malta', 'Sweden', 'Paraguay', 'Latvia'],
    'continent': ['Asia', 'Asia', 'Europe', 'Europe', 'Europe', 'South America', 'Europe'],
    'capital': ['Bangkok', 'Manila', 'Monaco', 'Valletta', 'Stockholm', 'Asuncion', 'Riga'],
}
countries = pd.DataFrame(country_data)
print(countries)
Now, we'll apply the unique() method to the 'continent' and 'country' columns:
import pandas as pd

country_data = {
    'country': ['Thailand', 'Philippines', 'Monaco', 'Malta', 'Sweden', 'Paraguay', 'Latvia'],
    'continent': ['Asia', 'Asia', 'Europe', 'Europe', 'Europe', 'South America', 'Europe'],
    'capital': ['Bangkok', 'Manila', 'Monaco', 'Valletta', 'Stockholm', 'Asuncion', 'Riga'],
}
countries = pd.DataFrame(country_data)

unique_countries = countries['country'].unique()
unique_continents = countries['continent'].unique()
print(unique_countries)
print(unique_continents)
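One detail worth seeing in isolation: unique() keeps values in the order they first appear rather than sorting them. The short sketch below reuses the 'continent' values from the lesson; the variable names are only illustrative, and sorting with Python's built-in sorted() is one option among several.

```python
import pandas as pd

# Same 'continent' values as in the lesson's countries DataFrame
continents = pd.Series(
    ['Asia', 'Asia', 'Europe', 'Europe', 'Europe', 'South America', 'Europe'],
    name='continent',
)

# Distinct values, in order of first appearance (not sorted)
unique_continents = continents.unique()

# Convert to a plain list for easy comparison or further processing
unique_list = list(unique_continents)
print(unique_list)

# Sort explicitly if alphabetical order is needed
sorted_list = sorted(unique_list)
print(sorted_list)
```

Here the two lists happen to match, because 'Asia' appears before 'Europe' and 'South America' in the data; with a different row order, only the sorted version would be alphabetical.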
To count the number of distinct values in a specific column, you can use the nunique() method:
import pandas as pd

country_data = {
    'country': ['Thailand', 'Philippines', 'Monaco', 'Malta', 'Sweden', 'Paraguay', 'Latvia'],
    'continent': ['Asia', 'Asia', 'Europe', 'Europe', 'Europe', 'South America', 'Europe'],
    'capital': ['Bangkok', 'Manila', 'Monaco', 'Valletta', 'Stockholm', 'Asuncion', 'Riga'],
}
countries = pd.DataFrame(country_data)
print(countries['continent'].nunique())
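To connect the two methods, the sketch below checks that nunique() on a column matches the length of the array returned by unique() for this data (which has no missing values), and shows that nunique() can also be called on the whole DataFrame to get one count per column. The variable names are illustrative.

```python
import pandas as pd

country_data = {
    'country': ['Thailand', 'Philippines', 'Monaco', 'Malta', 'Sweden', 'Paraguay', 'Latvia'],
    'continent': ['Asia', 'Asia', 'Europe', 'Europe', 'Europe', 'South America', 'Europe'],
    'capital': ['Bangkok', 'Manila', 'Monaco', 'Valletta', 'Stockholm', 'Asuncion', 'Riga'],
}
countries = pd.DataFrame(country_data)

# Count of distinct continents, computed two ways
count_from_nunique = countries['continent'].nunique()
count_from_unique = len(countries['continent'].unique())
print(count_from_nunique, count_from_unique)

# Calling nunique() on the DataFrame returns a Series of per-column counts
per_column = countries.nunique()
print(per_column)
```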
You are given a DataFrame named audi_cars.
- Identify all distinct values in the 'year' column and store the result in the unique_years variable.
- Identify all distinct values in the 'fueltype' column and store the result in the unique_fueltype variable.
- Determine the number of unique fuel types and store the result in the count_unique_fueltypes variable.
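The steps in this task can be sketched as follows. The real audi_cars data is not shown on this page, so the small DataFrame below is a hypothetical stand-in: its rows and the 'model' column are invented for illustration, and only the 'year' and 'fueltype' column names and the three result variable names come from the task.

```python
import pandas as pd

# Hypothetical stand-in for audi_cars; the actual dataset is not shown here
audi_cars = pd.DataFrame({
    'model': ['A1', 'A3', 'A4', 'Q3', 'A3'],
    'year': [2017, 2019, 2017, 2020, 2019],
    'fueltype': ['Petrol', 'Diesel', 'Diesel', 'Petrol', 'Hybrid'],
})

# Distinct values in the 'year' column
unique_years = audi_cars['year'].unique()

# Distinct values in the 'fueltype' column
unique_fueltype = audi_cars['fueltype'].unique()

# Number of distinct fuel types
count_unique_fueltypes = audi_cars['fueltype'].nunique()

print(unique_years)
print(unique_fueltype)
print(count_unique_fueltypes)
```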