Wrong Conversion? | Preprocessing Data: Part II
Analyzing and Visualizing Real-World Data

Wrong Conversion?

We have found that the initial data contains dates in the dd-mm-yyyy format. After conversion, pandas displays dates in the yyyy-mm-dd format, which is fine. However, how will the conversion interpret a date such as 05-02-2010? Since we are familiar with the dataset, we know that it represents February 5, 2010. But pd.to_datetime doesn't know this and, by default, may read the first number as the month, giving May 2, 2010. Let's check whether this has happened in our data.
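To make the ambiguity concrete before running that check, here is a minimal sketch that parses the same string under both readings. It uses the standard-library datetime.strptime rather than pandas, and the variable names are illustrative only.

```python
from datetime import datetime

raw = "05-02-2010"  # a date string in the dataset's dd-mm-yyyy layout

# Reading the first number as the day (what the dataset intends)
day_first = datetime.strptime(raw, "%d-%m-%Y")

# Reading the first number as the month (the mistaken interpretation)
month_first = datetime.strptime(raw, "%m-%d-%Y")

print(day_first.date())    # 2010-02-05
print(month_first.date())  # 2010-05-02
```

Both calls succeed without any error, which is why a wrong reading can go unnoticed until the values are inspected.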

    # Loading the library
    import pandas as pd

    # Reading the data
    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/72be5dde-f3e6-4c40-8881-e1d97ae31287/shops_data3.csv')

    # Displaying the first four dates before converting
    print(df['Date'].head(4))

    # Changing the column type
    df['Date'] = pd.to_datetime(df['Date'])

    # Displaying the first four dates and the dtypes of the dataframe
    print(df['Date'].head(4))
    print(df.dtypes)

Unfortunately, it has. Let's compare the first three dates in their initial and converted formats:

Initial date    Converted date
05-02-2010      2010-05-02
12-02-2010      2010-12-02
19-02-2010      2010-02-19

For the first two dates, the leading number was read as the month, while for the third date it was read as the day (the correct interpretation), since 19 cannot be a month. Therefore, we need to tell pandas which format our dates use.
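One way to do this, sketched here under the assumption that every value in the column follows dd-mm-yyyy, is to pass an explicit format string to pd.to_datetime. The three sample strings are copied from the comparison above and stand in for the full dataset; the variable names are illustrative.

```python
import pandas as pd

# Sample values taken from the comparison above (dd-mm-yyyy)
dates = pd.Series(["05-02-2010", "12-02-2010", "19-02-2010"])

# An explicit format removes the guesswork: day first, then month, then year
converted = pd.to_datetime(dates, format="%d-%m-%Y")

# All three dates should now fall in February 2010
print(converted)
```

pd.to_datetime also accepts dayfirst=True, which hints at day-first order without fixing the whole layout. An explicit format is the stricter choice, because by default a value that does not match it raises an error instead of being silently reinterpreted.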

Section 2. Chapter 4
