Grouping by Several Columns

Is it possible to group by pairs of values? For instance, can we group by country and then by region within each country? Yes, pandas supports this. To group by several columns, use the same .groupby() method, passing a list of the columns that determine the groups. How does such a grouping work? Look at the picture below.
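The same idea can be sketched on a small, made-up DataFrame. The column names 'Group' and 'Subgroup' follow the picture; the values and the 'Value' column are invented purely for illustration.

```python
import pandas as pd

# Made-up data for illustration only; these values are not from the course dataset
toy = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'B', 'B'],
    'Subgroup': ['x', 'x', 'y', 'x', 'y'],
    'Value': [1, 2, 3, 4, 5],
})

# Passing a list of columns forms one group per distinct (Group, Subgroup) pair
pair_counts = toy.groupby(['Group', 'Subgroup']).size()
print(pair_counts)
```

Each row of the result corresponds to one pair that actually occurs in the data, with the number of rows that fall into it.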

As you can see, the values were first grouped by 'Group' and then by 'Subgroup' within each group. For instance, let's find the number of households for each pair of values in the 'roomh' and 'hhsize' columns (the number of rooms and the number of people in a dwelling, respectively).


# Importing the library
import pandas as pd

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data4.csv')

# Grouping and aggregating data
print(df.groupby(['roomh', 'hhsize']).size())

The output is quite long because there are many possible combinations of the two columns. For instance, you can see that there are 59 dwellings with 10 or more rooms and 4 people living in them.
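Because the result is a Series indexed by pairs of values, a long output can be easier to read if you look up one combination or reshape it into a table. The sketch below uses a tiny invented DataFrame that only borrows the column names 'roomh' and 'hhsize'; the real file may code these values differently (for example, as text labels), so treat the lookup keys as assumptions.

```python
import pandas as pd

# Invented stand-in for the course data: only the column names match the lesson
sample = pd.DataFrame({
    'roomh': [2, 2, 3, 3, 3, 4],
    'hhsize': [1, 1, 2, 2, 4, 4],
})

counts = sample.groupby(['roomh', 'hhsize']).size()

# The result is indexed by (roomh, hhsize) pairs,
# so a single combination can be looked up with a tuple
three_rooms_two_people = counts.loc[(3, 2)]
print(three_rooms_two_people)

# unstack() moves hhsize into columns; fill_value=0 marks pairs that never occur
table = counts.unstack(fill_value=0)
print(table)
```

In the reshaped table, rows are 'roomh' values and columns are 'hhsize' values, which makes it easier to scan many combinations at once.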


