What is Data Preprocessing? | Preprocessing Data: Part I
Data Manipulation using pandas

1. Preprocessing Data: Part I
2. Preprocessing Data: Part II
3. Grouping Data
4. Aggregating and Visualizing Data
5. Joining Data

What is Data Preprocessing?

As a data analyst, you will most likely deal with 'dirty' data. What issues can arise when working with collected data?

  • Missing values

  • Wrong data types

  • Outliers

  • Other inconsistencies
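To make the four issue types concrete, here is a minimal sketch that detects each of them on a tiny, made-up DataFrame. The columns `house_id`, `rooms`, `price`, and `heating` and their values are hypothetical, not taken from the Statvillage dataset. The outlier check uses the common 1.5 * IQR rule, which is one convention among several rather than the course's prescribed method.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from Statvillage), one issue per column
df = pd.DataFrame({
    "house_id": [1, 2, 3, 4, 5, 6],
    "rooms": [3, np.nan, 4, 2, 3, 4],                  # a missing value
    "price": ["250000", "310000", "275000",
              "290000", "265000", "9900000"],          # numbers stored as text, one extreme value
    "heating": ["Gas", "gas ", "Electric",
                "GAS", "electric", "Gas"],             # inconsistent labels
})

# 1. Missing values: count NaN entries per column
missing = df.isna().sum()
print(missing)

# 2. Wrong data types: 'price' holds strings, so convert it to numbers
print(df["price"].dtype)
df["price"] = pd.to_numeric(df["price"], errors="coerce")  # unparsable text would become NaN

# 3. Outliers: flag prices outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]
print(outliers)

# 4. Inconsistency: the same category spelled in different ways
labels_before = df["heating"].nunique()
df["heating"] = df["heating"].str.strip().str.lower()
labels_after = df["heating"].nunique()
print(labels_before, labels_after)  # 5 distinct spellings collapse to 2
```

The focus here is on detection; the fixes shown (type conversion and label normalization) are simple examples, and whether an outlier should be dropped, capped, or kept depends on the analysis.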

Throughout this course, you will learn how to detect and fix these common issues. Let's start with the dataset you will use throughout the course: the Statvillage dataset, which contains data on a hypothetical village in Canada. The dataset has more than 40 columns, but we will focus on only about 10 of them.
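Since only about 10 of the 40+ columns matter, one option is to load just those columns with the `usecols` parameter of `pd.read_csv`. The sketch below uses a small inline CSV and hypothetical column names; the actual Statvillage column names are not listed here.

```python
import io
import pandas as pd

# Hypothetical CSV excerpt; these column names are made up for illustration
csv_text = """house_id,block,rooms,price,heating,garage,pool
1,1,3,250000,gas,yes,no
2,1,4,310000,electric,no,no
"""

# The subset of columns we actually want to work with (hypothetical)
wanted = ["house_id", "block", "price"]

# usecols makes read_csv keep only the listed columns
subset = pd.read_csv(io.StringIO(csv_text), usecols=wanted)
print(subset.columns.tolist())
```

Note that `usecols` keeps columns in the order they appear in the file, not the order of the list, so select `subset[wanted]` afterwards if order matters. Alternatively, read the whole file and select the columns with `df[wanted]`.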

The village consists of 128 blocks of 8 houses each. First, let's read the data and see what it looks like.
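The block structure implies 128 * 8 = 1,024 houses. Assuming (not verified here) that the file stores one row per house, the row count gives a quick sanity check after loading. The helper `matches_expected_rows` is hypothetical, and the demo uses a synthetic frame instead of the real file.

```python
import pandas as pd

BLOCKS = 128          # from the course description
HOUSES_PER_BLOCK = 8  # from the course description
expected_houses = BLOCKS * HOUSES_PER_BLOCK  # 128 * 8 = 1024

def matches_expected_rows(df: pd.DataFrame, expected: int = expected_houses) -> bool:
    """Return True if df has exactly `expected` rows.

    Assumes (unverified) one row per house in the dataset.
    """
    return len(df) == expected

# Demo on a synthetic frame; with the course data, pass the DataFrame read from the CSV
fake = pd.DataFrame({"house_id": range(1, expected_houses + 1)})
print(expected_houses, matches_expected_rows(fake))  # 1024 True
```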

# Importing the library
import pandas as pd

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data.csv')
print(df)
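`print(df)` shows only the first and last rows of a large DataFrame, so a few other calls are handy for a first look. This sketch uses a small inline CSV with hypothetical columns so that it runs offline; with the course data you would apply the same calls to the `df` loaded from the URL above.

```python
import io
import pandas as pd

# Small stand-in for the course CSV; the columns are hypothetical
csv_text = """house_id,rooms,price
1,3,250000
2,,310000
3,4,not available
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)         # (rows, columns)
print(df.head())        # first 5 rows
print(df.dtypes)        # 'price' is not numeric because of the text entry
print(df.isna().sum())  # missing values per column
```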


Section 1. Chapter 1