Learn Data Types | Data Exploration
Preprocessing Data
Course content

1. Data Exploration
2. Data Cleaning
3. Data Validation
4. Normalization & Standardization
5. Data Encoding

Data Types

Let's talk about the types of data that a DataFrame may contain.

Numerical

Numerical data is represented by int or float values. In a DataFrame, it should be stored with the int64 or float64 data type. Use data.info() to check the data type of each column.
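
As a minimal sketch of this check, the example below builds a tiny hypothetical DataFrame (the column names only mimic a Titanic-style table and are not taken from the course dataset) and inspects its column types:

```python
import pandas as pd

# Hypothetical miniature dataset; the column names are an assumption for illustration.
data = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0],
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
})

# Print a summary: column names, non-null counts, and data types.
data.info()

# The dtypes attribute exposes the same type information as a Series.
print(data.dtypes)
```

Here data.info() prints directly to the console, while data.dtypes returns a value that can be used in further code.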

Note that some columns in the DataFrame may contain numerical values but be stored with another data type (such as object or str). Such columns have to be converted to int64 or float64, and we'll explore how to do this later.
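
Until then, here is a rough sketch of one common way to do such a conversion, using pd.to_numeric. The Fare column and its values are hypothetical, and the approach covered later in the course may differ:

```python
import pandas as pd

# Hypothetical column: numbers that were read in as text.
data = pd.DataFrame({"Fare": ["7.25", "71.28", "8.05"]})

# pd.to_numeric parses the strings and returns a numeric Series.
data["Fare"] = pd.to_numeric(data["Fare"])

# The column now has a numeric data type instead of a text one.
print(data["Fare"].dtype)
```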

Categorical

Categorical data has no inherent numerical representation; each value is one item from a fixed set of groups or categories. For example, the column Sex has the values Male or Female, and the column Season has the values Spring, Summer, Fall, and Winter. Categorical data requires special conversion and preprocessing. Such columns typically have the object, bool, or str data type.

Fortunately, the numerical columns of the titanic dataset are already stored as int64 and float64.

Task

Let's divide the columns into numerical and categorical ones. Create num_cols as a NumPy array containing the names of the columns with int or float data types. Let cat_cols contain all the remaining columns.

Solution
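
The page does not include the reference solution, so the following is only one possible approach. It assumes the DataFrame is named data, and it uses a small hypothetical stand-in instead of the actual titanic dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the titanic dataset used in the course.
data = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0],
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "S"],
})

# Names of the columns stored as int64 or float64, as a NumPy array.
num_cols = data.select_dtypes(include=["int64", "float64"]).columns.to_numpy()

# Every remaining column is treated as categorical.
cat_cols = data.columns.drop(num_cols).to_numpy()

print(num_cols)
print(cat_cols)
```

Naming the int64 and float64 types explicitly in select_dtypes mirrors the data types mentioned in the lesson; passing include="number" would be a broader alternative that selects every numeric type.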

Section 1. Chapter 3