As a data analyst, you will most likely deal with 'dirty' data. What issues can arise when working with collected data?

- Missing values
- Wrong data types
- Outliers
- Other inconsistencies

In this course, you will learn how to detect and fix these common issues. Let's start with the dataset that you will use throughout the course: the [Statvillage](http://jse.amstat.org/v5n2/schwarz.supp/index.html) dataset, which contains data on a hypothetical village in Canada. The dataset has more than 40 columns, but we will focus on only about 10 of them.

The village consists of 128 blocks of 8 houses each. First, let's read the data and see what it looks like.

```python
# Importing the library
import pandas as pd

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data.csv')
print(df)
```
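Once the data is loaded, a first check for the issues listed above might look like the sketch below. It uses a tiny hypothetical table instead of the real dataset so that it runs without a network connection; the column names `block`, `income`, and `age` are assumptions made for illustration and are not taken from the Statvillage file.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature table; the column names are illustrative only
sample = pd.DataFrame({
    "block": [1, 1, 2, 2],
    "income": ["52000", "61000", None, "48000"],  # numbers stored as text
    "age": [34, np.nan, 45, 29],
})

# Count missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Inspect data types: 'income' is not a numeric dtype yet
print(sample.dtypes)

# Convert the text column to numbers; unparseable entries become NaN
sample["income"] = pd.to_numeric(sample["income"], errors="coerce")
print(sample.dtypes)
```

The same calls (`isna().sum()`, `dtypes`, `pd.to_numeric`) can be applied to `df` once you know which columns you want to inspect.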

This course covers intermediate topics in pandas, a must-have tool for every data analyst. During the course, you will learn how to prepare data for further analysis and how to group it using different techniques. You will also learn basic data visualization and get acquainted with joining data.

Data received from different sources can be messy, and before you can use it, you must make sure it is in a consistent, convenient form. In the first section, you will learn what data preprocessing is and deal with some logical inconsistencies.

Missing (NA) values, outliers, and inconsistencies are other kinds of problematic data. In the second section of the course, you will learn how to deal with such issues.
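As a rough preview of what handling such issues can look like, the sketch below fills a missing value with the median and flags outliers with the 1.5 × IQR rule. Both choices are assumptions made for illustration, not necessarily the approach taken later in the course, and the `incomes` values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical income values with one missing entry and one extreme value
incomes = pd.Series([48000, 52000, np.nan, 61000, 55000, 950000], name="income")

# Replace the missing value with the median of the known values
filled = incomes.fillna(incomes.median())

# Flag values lying more than 1.5 * IQR outside the quartiles
q1, q3 = filled.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (filled < q1 - 1.5 * iqr) | (filled > q3 + 1.5 * iqr)

print(filled[is_outlier])
```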

As a data analyst, you will need to draw compact conclusions from large amounts of data. To do that, you need to understand the idea of data grouping and how to apply it in practice.
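The grouping idea can be sketched with `groupby`: rows that share a key are collected together and summarized with a single number. The `houses` table and its columns below are hypothetical and only echo the block-and-house structure of the village.

```python
import pandas as pd

# Hypothetical houses with the block they belong to and a household income
houses = pd.DataFrame({
    "block": [1, 1, 2, 2, 2],
    "income": [52000, 61000, 48000, 55000, 47000],
})

# One summary value per block: the mean income of its houses
mean_income = houses.groupby("block")["income"].mean()
print(mean_income)
```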

Sometimes one built-in function is not enough to draw a complete conclusion, so you need something more complex. This section will teach you how to apply multiple functions while grouping. You will also learn to visualize data using only the pandas library and get acquainted with the main plot types.
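Applying several functions while grouping could look like the sketch below, which passes a list of function names to `agg`. The `houses` table is again hypothetical.

```python
import pandas as pd

# Hypothetical houses with the block they belong to and a household income
houses = pd.DataFrame({
    "block": [1, 1, 2, 2, 2],
    "income": [52000, 61000, 48000, 55000, 47000],
})

# Several summaries per block in one call; each function becomes a column
summary = houses.groupby("block")["income"].agg(["mean", "min", "max"])
print(summary)
```

If matplotlib is installed, calling `summary.plot(kind="bar")` on this result draws a simple bar chart directly from pandas.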

As mentioned before, you may sometimes need to work with data received from multiple sources. This section will teach you how to join two DataFrames using different techniques.
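A minimal sketch of joining two DataFrames with `merge` is shown below. The two tables and the shared `house_id` key are assumptions made for illustration; the `how` argument controls which rows are kept.

```python
import pandas as pd

# Two hypothetical sources that share a 'house_id' key
houses = pd.DataFrame({"house_id": [1, 2, 3], "block": [1, 1, 2]})
incomes = pd.DataFrame({"house_id": [1, 2, 4], "income": [52000, 61000, 48000]})

# Inner join: keep only houses present in both tables
inner = houses.merge(incomes, on="house_id", how="inner")

# Left join: keep every house; a missing income becomes NaN
left = houses.merge(incomes, on="house_id", how="left")

print(inner)
print(left)
```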

What is Data Preprocessing?