What is Data Preprocessing?
As a data analyst, you will most likely have to deal with 'dirty' data. What issues can arise when working with gathered data?
- Missing values
- Wrong data types
- Outliers
- Other inconsistencies
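Each of the issues listed above can usually be spotted with a few pandas calls. The following is a minimal sketch on a small made-up table: the column names and values are hypothetical and are not taken from the course dataset. The 1.5 × IQR rule used for outliers is one common convention, not necessarily the method used later in the course.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame({
    "income": [52000, 48000, None, 51000, 950000],  # one missing value, one extreme value
    "household_size": ["3", "2", "4", "1", "2"],    # numbers stored as text
})

# Missing values: count the missing entries in each column
missing_per_column = toy.isna().sum()

# Wrong data types: household_size holds text, so convert it to numbers
dtypes_before = toy.dtypes
toy["household_size"] = pd.to_numeric(toy["household_size"])

# Outliers: flag values more than 1.5 * IQR beyond the quartiles
q1 = toy["income"].quantile(0.25)
q3 = toy["income"].quantile(0.75)
iqr = q3 - q1
is_outlier = (toy["income"] < q1 - 1.5 * iqr) | (toy["income"] > q3 + 1.5 * iqr)

print(missing_per_column)
print(dtypes_before)
print(toy[is_outlier])
```

Here the missing-value count, the non-numeric type of `household_size`, and the single flagged `income` row correspond to the first three bullets. "Other inconsistencies" depend on the data and usually need case-by-case checks.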
In this course, you will learn how to detect and fix these common issues. Let's start with the dataset you will use throughout the course: the Statvillage dataset, which contains data on a hypothetical village in Canada. The dataset has more than 40 columns, but we will focus on only about 10 of them.
The village consists of 128 blocks of 8 houses each (1,024 houses in total). First, let's read the data and see what it looks like.
```python
# Importing the library
import pandas as pd

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data.csv')
print(df)
```
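Printing a large DataFrame shows only a truncated view, so a more compact first look can help. The sketch below is one possible way to get it. `first_look` is a hypothetical helper name, and the stand-in frame is made up so that the example runs without downloading the file.

```python
import pandas as pd

def first_look(df: pd.DataFrame, n_rows: int = 5) -> dict:
    """Collect a compact overview of a DataFrame (hypothetical helper)."""
    return {
        "shape": df.shape,                          # (number of rows, number of columns)
        "dtypes": df.dtypes.astype(str).to_dict(),  # column name -> data type name
        "missing": df.isna().sum().to_dict(),       # column name -> count of missing values
        "head": df.head(n_rows),                    # first few rows
    }

# Made-up stand-in data; with the course file, pass the DataFrame read from the CSV instead
demo = pd.DataFrame({"block": [1, 1, 2], "value": [10.5, None, 7.0]})
overview = first_look(demo)
print(overview["shape"])
print(overview["missing"])
print(overview["head"])
```

Calling `first_look(df)` on the course DataFrame would report its real shape, column types, and missing-value counts in the same way.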