# Analyzing and Visualizing Real-World Data

2. Preprocessing Data: Part II

## Wrong Conversion?

We have found that the initial data contains dates in the *dd-mm-yyyy* format. The `datetime` module reads dates in the *yyyy-mm-dd* format, which is fine. However, how will `datetime` convert the date `05-02-2010`? Since we are familiar with the dataset, we know that it represents *February 5, 2010*. But `datetime` doesn't know this and may interpret the date as *May 2, 2010*. Let's check whether this has happened in our data.

Unfortunately, it has. Let's compare the first three dates in their initial and converted formats:

| Initial date | Converted date |
|---|---|
| 05-02-2010 | 2010-05-02 |
| 12-02-2010 | 2010-12-02 |
| 19-02-2010 | 2010-02-19 |
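One way to confirm the mismatch programmatically is to parse each initial string with an explicit day-first format and compare it with the converted value. The sketch below uses only the standard-library `datetime` module and the three pairs from the table; the helper name `is_converted_correctly` is hypothetical and introduced only for illustration.

```python
from datetime import datetime

# (initial string, converted string) pairs copied from the table above
pairs = [
    ("05-02-2010", "2010-05-02"),
    ("12-02-2010", "2010-12-02"),
    ("19-02-2010", "2010-02-19"),
]

def is_converted_correctly(initial, converted):
    """Return True if the converted date matches the day-first reading of the initial string."""
    expected = datetime.strptime(initial, "%d-%m-%Y").date()
    actual = datetime.strptime(converted, "%Y-%m-%d").date()
    return expected == actual

results = [is_converted_correctly(initial, converted) for initial, converted in pairs]

for (initial, converted), ok in zip(pairs, results):
    status = "correct" if ok else "day and month swapped"
    print(f"{initial} -> {converted}: {status}")
```

Only the third pair passes the check, which matches what the table shows.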

For the first two dates, the first number was interpreted as the month, while for the third date it was interpreted as the day (the correct reading), because 19 cannot be a month. Therefore, we need to tell `pandas` explicitly which format our dates use.
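A minimal sketch of one way to do this is to pass an explicit `format` string to `pd.to_datetime`. The sample values are the three dates discussed above; the column name `Date` is an assumption and may differ in the actual dataset.

```python
import pandas as pd

# Sample values taken from the lesson; the column name "Date" is an assumption.
df = pd.DataFrame({"Date": ["05-02-2010", "12-02-2010", "19-02-2010"]})

# An explicit format tells pandas that the day comes first, then the month, then the year.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Show the parsed dates in yyyy-mm-dd form.
print(df["Date"].dt.strftime("%Y-%m-%d").tolist())
```

With the format spelled out, all three values are read as February dates, so no guessing about day and month order is involved.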
