Data Types
Let's talk about the types of data that a dataframe may contain.
Numerical
Numerical data is represented by integer or floating-point values. In a dataframe, it is usually stored with the int64 or float64 data type. Use data.info() to check the data type of each column.
Note that some columns may contain numerical values but be stored with another data type (object or str). Such columns have to be converted to int64 or float64; we'll explore how to do that later.
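As a rough illustration of the point above, the sketch below builds a small made-up dataframe (the column names and values are assumptions, not the real titanic data), inspects it with data.info(), and converts a numeric column that was stored as text. pd.to_numeric is one common way to do the conversion; the course may use a different approach later.

```python
import pandas as pd

# Hypothetical toy data: "Fare" holds numbers, but they are stored as strings.
data = pd.DataFrame({
    "Age": [22, 38, 26],
    "Fare": ["7.25", "71.28", "7.92"],
})

# Print the column names, non-null counts, and data types.
data.info()

# Record whether "Fare" is recognised as numeric before the conversion.
fare_numeric_before = pd.api.types.is_numeric_dtype(data["Fare"])

# Convert the text column to a numeric data type.
data["Fare"] = pd.to_numeric(data["Fare"])
fare_numeric_after = pd.api.types.is_numeric_dtype(data["Fare"])
```

After the conversion, calling data.info() again should list Fare with a floating-point data type.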
Categorical
Categorical data has no inherent numerical representation: each value is one item from a fixed list of groups or categories. For example, the column Sex has the values Male or Female, and the column Season has the values Spring, Summer, Fall, and Winter. Such data requires special conversion and preprocessing. It is typically stored with the object, bool, or str data type.
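To make the idea of categories concrete, here is a minimal sketch on made-up data that reuses the Sex and Season examples from the text. It only inspects the categories; the conversion itself is not shown.

```python
import pandas as pd

# Hypothetical toy data reusing the examples from the text.
data = pd.DataFrame({
    "Sex": ["Male", "Female", "Female", "Male"],
    "Season": ["Spring", "Summer", "Summer", "Winter"],
})

# The distinct categories that appear in a column.
sex_values = sorted(data["Sex"].unique())

# How many rows fall into each category.
season_counts = data["Season"].value_counts()
```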
Fortunately, the titanic dataset already stores its numerical data as int64 and float64.
Let's divide the columns into numerical and categorical. Create num_cols as a numpy array that includes the columns of types int and float. Let cat_cols be all the other features, that is, everything except num_cols.
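One possible way to approach this task is sketched below. It is not necessarily the course's intended solution, and it runs on a small made-up dataframe (the column names and values are assumptions standing in for the titanic data). It relies on select_dtypes to pick the int64 and float64 columns and on Index.drop to get the remaining ones.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the titanic dataframe.
data = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0],
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
    "Survived": [False, True, True],
})

# Names of the columns stored as int64 or float64, as a numpy array.
num_cols = data.select_dtypes(include=["int64", "float64"]).columns.to_numpy()

# Every other column is treated as categorical (column order is preserved).
cat_cols = data.columns.drop(num_cols).to_numpy()
```

Note that bool columns such as Survived are not selected by the int64/float64 filter, so they land in cat_cols, which matches the data types listed for categorical data above.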