Learn Poor Data Presentation | Preprocessing Data: Part I
Data Manipulation using pandas

Poor Data Presentation

One cause of inconsistent types is poor data presentation. For instance, the values in a weight column may also include the measurement unit (such as 25kg or 14lb). In that case, pandas reads these values as strings rather than numbers.
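To make the weight example concrete, here is a minimal sketch. The `weights` DataFrame and its values are made up for illustration and are not part of the lesson's dataset.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical data: weights stored together with their measurement units
weights = pd.DataFrame({"weight": ["25kg", "14lb", "30kg"]})

# The column is not numeric, because "25kg" cannot be parsed as a number
column_is_numeric = is_numeric_dtype(weights["weight"])
print(column_is_numeric)

# Every individual value is an ordinary Python string
all_strings = weights["weight"].map(lambda value: isinstance(value, str)).all()
print(all_strings)
```

Checking a column with `is_numeric_dtype` is one quick way to spot values that were read as text.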

Let's look at what is wrong with the values in the columns we identified as having the wrong type.

# Importing the library
import pandas as pd

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data.csv')

# Output values of 'problematic' columns
print(df.loc[:, ['totinch', 'morgh', 'valueh', 'grosrth', 'omphtotinch']])

We found the root of the problem. All of the columns except 'totinch' use a dot (.) to indicate missing values, while the 'totinch' column uses a comma (,) as the decimal separator. This can happen because of where the data originated, for instance. The 'totinch' problem can be solved by replacing the commas with dots and then converting the column to the float type.
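The comma fix described above can be sketched as follows. The `sample` DataFrame is a small hypothetical stand-in for the 'totinch' column, so the values are assumptions and not the real data.

```python
import pandas as pd

# Hypothetical sample imitating 'totinch': commas used as decimal separators
sample = pd.DataFrame({"totinch": ["1,5", "20,25", "3"]})

# Replace each comma with a dot (plain text replacement, no regex),
# then convert the cleaned strings to floats
sample["totinch"] = (
    sample["totinch"]
    .str.replace(",", ".", regex=False)
    .astype(float)
)

print(sample["totinch"].tolist())  # [1.5, 20.25, 3.0]
```

The same two steps should apply to the real column, assuming every value in it is either a plain number or a number written with a decimal comma.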

Note that if you try to convert the existing values to a numeric type as they are, a ValueError will be raised.
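A short sketch of that failure, using two hypothetical Series that imitate the two problems found above (a decimal comma and a dot used as a missing-value marker). The names and values are illustrative only.

```python
import pandas as pd

# Hypothetical values imitating the two presentation problems
examples = {
    "decimal comma": pd.Series(["1,5", "20,25"]),
    "dot as missing value": pd.Series(["100", ".", "250"]),
}

failed = []
for label, series in examples.items():
    try:
        # Direct conversion fails: neither "1,5" nor "." is a valid float literal
        series.astype(float)
    except ValueError as error:
        failed.append(label)
        print(f"{label}: {error}")
```

Both conversions fail, which is why the values have to be cleaned before the type is changed.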


Section 1. Chapter 3
