Wrong Conversion?
We have found that the initial data contains dates in the dd-mm-yyyy format. The `datetime` type represents dates in the yyyy-mm-dd format, which is fine. However, how will the conversion to `datetime` handle the date 05-02-2010? Since we are familiar with the dataset, we know that it represents February 5, 2010. But `datetime` doesn't know this and may read the date as May 2, 2010, producing 2010-05-02. Let's check if this has happened in our data.
```python
# Loading the library
import pandas as pd

# Reading the data
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/72be5dde-f3e6-4c40-8881-e1d97ae31287/shops_data3.csv')

# Displaying first four dates before converting
print(df['Date'].head(4))

# Change column type
df['Date'] = pd.to_datetime(df['Date'])

# Displaying first four dates and dtypes of dataframe
print(df['Date'].head(4))
print(df.dtypes)
```
Unfortunately, it has. Let's compare the first three dates in their initial and converted formats:
Initial date | Converted date |
---|---|
05-02-2010 | 2010-05-02 |
12-02-2010 | 2010-12-02 |
19-02-2010 | 2010-02-19 |
For the first two dates, the leftmost number was taken as the month, while for the third date it was taken as the day, which is the correct reading: 19 cannot be a month, so only the day-first interpretation is possible there. Therefore, we need to tell `pandas` explicitly which date format we are working with.
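One possible way to do this is sketched below. It assumes the `format` parameter of `pd.to_datetime` with the pattern `%d-%m-%Y` (day, month, four-digit year). The small `raw_dates` series is a hypothetical stand-in for the `Date` column, built from the three dates in the table, so the example runs without downloading the dataset.

```python
import pandas as pd

# Hypothetical stand-in for df['Date']: the three dates from the table,
# written in the dataset's dd-mm-yyyy order
raw_dates = pd.Series(['05-02-2010', '12-02-2010', '19-02-2010'])

# State the layout explicitly: day, then month, then four-digit year
converted = pd.to_datetime(raw_dates, format='%d-%m-%Y')

# Every date should now fall in February
print(converted)
print(converted.dt.month.tolist())
```

With an explicit format, a value that does not match the pattern raises an error instead of being silently reinterpreted. `pd.to_datetime` also accepts `dayfirst=True`, but that option is a preference rather than a strict rule, so the explicit format is the safer choice when the layout is known.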