Introduction to Neural Networks
Data Preprocessing
Wine Dataset
Now we will train our model on a more realistic task. The scikit-learn library includes a wine dataset, which we will use to predict the wine class from 3 input features.
Here is what the dataset looks like:
from IPython.display import display  # display() is built into Jupyter; import it explicitly when running elsewhere
import pandas as pd  # Import pandas to create a DataFrame from the loaded dataset
from sklearn.datasets import load_wine  # Import the dataset loading function

wine_ds = load_wine()  # Load the dataset

# Extract the input values (the three selected features) from the dataset
X = pd.DataFrame(wine_ds.data, columns=wine_ds.feature_names)[['flavanoids', 'proline', 'total_phenols']]
# Extract the output values (the class labels) from the dataset
y = pd.DataFrame(wine_ds.target, columns=['target'])

# Display the datasets
display(X.head())  # `X` holds the input values used to predict the target value
display(pd.DataFrame(y.value_counts()))  # `y` is the target value we want to predict; it has 3 classes
To train our model, we'll use three input features: flavanoids, proline, and total_phenols. We chose them because they are among the features most strongly correlated with the target class. Using only a few informative features reduces the size of the neural network needed for successful training and shortens the training time.
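One way to see which features are most strongly related to the wine class is to rank all 13 features by the absolute value of their correlation with the target. The sketch below is an assumption about how such a ranking could be computed, not necessarily how these three features were selected; note that treating the class labels 0, 1 and 2 as numbers is only a rough heuristic for a categorical target.

```python
import pandas as pd
from sklearn.datasets import load_wine

# Load the full wine dataset with all 13 features
wine_ds = load_wine()
features = pd.DataFrame(wine_ds.data, columns=wine_ds.feature_names)
target = pd.Series(wine_ds.target, name='target')

# Absolute Pearson correlation of every feature with the class label.
# Assumption: the labels 0, 1, 2 are treated as plain numbers (a rough heuristic).
correlations = features.corrwith(target).abs().sort_values(ascending=False)

print(correlations)
```

The printed ranking lets you compare flavanoids, proline and total_phenols with the remaining features before deciding which ones to keep.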
Data Preprocessing
Here's how we'll prepare the data for training:
- Data Scaling: unlike decision trees or random forests, neural networks are sensitive to the scale of the input features, so they need scaled data to perform well. Scaling improves numerical stability, speeds up convergence, and makes the model independent of the units the features are measured in. Always scale your data before passing it through a neural network;
- One-Hot Encoding: our target values comprise three classes, represented in a single column by the numbers 0, 1, and 2. For better neural network performance, it's more effective to encode these classes into three separate columns, one per class, with a 1 in the column of the sample's class and 0 elsewhere;
- Train-Test Data Split: using the same data for both training and testing won't give us a realistic measure of the model's performance on new, unseen data, so part of the data is held out for testing.
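The first two steps above, scaling and one-hot encoding, can be sketched on the three selected features. This is a minimal illustration, assuming scikit-learn's StandardScaler for scaling (zero mean, unit variance per column) and a NumPy identity-matrix lookup for one-hot encoding; the course may use different tools. Proline values are in the hundreds to thousands while flavanoids and total_phenols are single-digit numbers, which is exactly the kind of mismatch scaling removes.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

wine_ds = load_wine()
X = pd.DataFrame(wine_ds.data, columns=wine_ds.feature_names)[['flavanoids', 'proline', 'total_phenols']]
y = wine_ds.target  # 1-D array of class labels 0, 1, 2

# Scaling: StandardScaler rescales every column to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X.std().round(2))               # very different spreads before scaling
print(X_scaled.std(axis=0).round(2))  # 1.0 for each column after scaling

# One-hot encoding: row i of the 3x3 identity matrix encodes class i,
# so indexing it with the label array turns each label into a one-hot row
y_onehot = np.eye(3)[y]
print(y[:3])
print(y_onehot[:3])
```

The identity-matrix trick is used here only for brevity; scikit-learn's OneHotEncoder or pandas' get_dummies produce the same kind of encoding.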
Prepare the wine dataset to work with our neural network:
- Extract input values from the dataset.
- Scale input values.
- Split the data into train and test sets (40% of the data will be used as test data).
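The three steps of the exercise can be chained as in the sketch below. It is one possible approach, not the official solution: it assumes StandardScaler for scaling and scikit-learn's train_test_split with test_size=0.4, and the random_state value is an arbitrary assumption added only to make the split reproducible.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

wine_ds = load_wine()

# 1. Extract the three input features and the target labels
X = pd.DataFrame(wine_ds.data, columns=wine_ds.feature_names)[['flavanoids', 'proline', 'total_phenols']].values
y = wine_ds.target

# 2. Scale the inputs to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)

# 3. Hold out 40% of the samples as test data
# random_state=42 is an arbitrary choice for reproducibility (assumption)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.4, random_state=42)

print(X_train.shape, X_test.shape)
```

This follows the order of the exercise (scale, then split). A stricter variant splits first and fits the scaler on the training set only, so that statistics from the test data do not leak into preprocessing.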