Data Preprocessing | Neural Network from Scratch
Introduction to Neural Networks
Data Preprocessing

Wine Dataset

Now we will train our model on a more realistic task. The scikit-learn library includes a wine dataset, which we will use to predict the class of a wine from three input features.

Here you can see what this dataset looks like:

```python
import pandas as pd  # Import pandas to create a DataFrame from the loaded dataset
from sklearn.datasets import load_wine  # Import the dataset loading function
from IPython.display import display  # Built into Jupyter; needs an explicit import elsewhere

wine_ds = load_wine()  # Load the dataset
# Extract the three input features from the dataset
X = pd.DataFrame(wine_ds.data, columns=wine_ds.feature_names)[['flavanoids', 'proline', 'total_phenols']]
# Extract the output (target) values from the dataset
y = pd.DataFrame(wine_ds.target, columns=['target'])

# Display the datasets
display(X.head())  # `X` holds the input values used to predict the target
display(pd.DataFrame(y.value_counts()))  # `y` is the target we want to predict; it has 3 classes
```

To train our model, we'll use three input features: flavanoids, proline, and total_phenols. We chose them because they are among the features most strongly correlated with the target class. Using only a few informative features reduces the size of the neural network needed for successful training and shortens the training process.
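One way to check this choice is to compute the Pearson correlation of every feature with the class label and rank the features by its absolute value. This is a minimal sketch that assumes "correlation" means linear correlation with the integer label 0, 1, or 2; that is only a rough heuristic, because the class numbers have no real order.

```python
import pandas as pd
from sklearn.datasets import load_wine

wine_ds = load_wine()
features = pd.DataFrame(wine_ds.data, columns=wine_ds.feature_names)
target = pd.Series(wine_ds.target, name="target")

# Pearson correlation of each feature column with the integer class label
correlations = features.corrwith(target)

# Rank by absolute value: a strong negative correlation is as useful as a strong positive one
ranking = correlations.abs().sort_values(ascending=False)
print(ranking.head(5))
```

Inspecting `ranking` shows where flavanoids, proline, and total_phenols fall among all 13 features.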

Data Preprocessing

Here's how we'll prepare the data for training:

  1. Data Scaling: Unlike decision trees or random forests, neural networks are sensitive to the scale of their inputs and need scaled data to perform well. Scaling keeps the computations numerically stable, helps training converge faster, and makes the model independent of the features' units. In this dataset, for example, proline values are in the hundreds or thousands, while flavanoids and total_phenols stay in single digits. Always scale your data before passing it through a neural network;

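  2. One-Hot Encoding: Our target values comprise three classes, represented in a single column by the numbers 0, 1, and 2. A neural network works better when these classes are encoded into three separate columns, one per class, where the column of the true class holds 1 and the others hold 0 (class 2, for example, becomes [0, 0, 1]). This lets each class correspond to its own output neuron and avoids suggesting that class 2 is somehow "larger" than class 0;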

  3. Train-Test Data Split: Using the same data for both training and testing won't give a realistic measure of the model's performance on new, unseen data, so we set part of the dataset aside for testing only.
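Since the network in this course is built from scratch, the three steps above can also be done by hand with NumPy. This is a minimal sketch, not the course's own solution: it assumes standardization (zero mean, unit variance) as the scaling method, and a shuffled split with 40% of the rows held out for testing and a fixed seed chosen only for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_wine

wine_ds = load_wine()
feature_names = list(wine_ds.feature_names)
cols = [feature_names.index(name) for name in ('flavanoids', 'proline', 'total_phenols')]
X = wine_ds.data[:, cols]  # input features, shape (178, 3)
y = wine_ds.target  # integer class labels 0, 1, 2

# 1. Data scaling (standardization): subtract each column's mean, divide by its standard deviation
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. One-hot encoding: row i of the 3x3 identity matrix is the one-hot vector for class i
y_one_hot = np.eye(3)[y]  # shape (178, 3)

# 3. Train-test split: shuffle the row indices, use the first 40% for testing, the rest for training
rng = np.random.default_rng(seed=0)
indices = rng.permutation(len(X_scaled))
n_test = int(0.4 * len(X_scaled))
test_idx, train_idx = indices[:n_test], indices[n_test:]
X_train, X_test = X_scaled[train_idx], X_scaled[test_idx]
y_train, y_test = y_one_hot[train_idx], y_one_hot[test_idx]

print(X_train.shape, X_test.shape)
```

After scaling, proline no longer dwarfs the other two features, and every row of `y_one_hot` contains a single 1 in the column of its class.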

Task

Prepare the wine dataset to work with our neural network:

  1. Extract input values from the dataset.
  2. Scale input values.
  3. Split the data into training and test sets, using 40% of the data as the test set.
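Using scikit-learn's helpers, the task could be sketched as follows. The choice of `StandardScaler` and the `random_state` value are assumptions; the original solution may use a different scaler or seed.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

wine_ds = load_wine()

# 1. Extract the three input features and the target values
X = pd.DataFrame(wine_ds.data, columns=wine_ds.feature_names)[['flavanoids', 'proline', 'total_phenols']]
y = wine_ds.target

# 2. Scale the inputs to zero mean and unit variance (StandardScaler is an assumed choice)
X_scaled = StandardScaler().fit_transform(X)

# 3. Split the data, keeping 40% as the test set; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.4, random_state=42)

print(X_train.shape, X_test.shape)
```

This follows the task's order (scale, then split). A common alternative is to split first, fit the scaler on the training set only, and then apply it to the test set, so that statistics from the test data do not leak into preprocessing.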

Section 2. Chapter 5