Python for Data Science: Fake News Classification

Data Preprocessing

Before any analysis, we have to preprocess our data. Data preprocessing is the process of cleaning, transforming, and organizing data so that it is better suited to analysis and modeling. It typically involves steps such as removing missing or duplicate values, correcting inconsistencies, and converting the data into a format that is easier to work with. For this dataset, the steps are:


  1. Remove the columns our analysis does not need: "title", "subject", "date";
  2. Check for duplicates;
  3. Check for null values;
  4. Shuffle the DataFrame and reset the index.
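
The four steps above can be sketched with pandas as follows. This is a minimal illustration, not the exact code used in the analysis: the toy DataFrame, the "text" and "label" column names, and the fixed random_state are assumptions made for the example. Only "title", "subject", and "date" come from the list above. The list only says to check for duplicates and nulls, so dropping them here is also an assumption.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
# "text" and "label" are assumed column names used only for illustration.
df = pd.DataFrame({
    "title": ["A", "B", "B", "C"],
    "text": ["first article", "second article", "second article", None],
    "subject": ["news", "politics", "politics", "news"],
    "date": ["2017-01-01", "2017-01-02", "2017-01-02", "2017-01-03"],
    "label": [0, 1, 1, 0],
})

# 1. Remove the columns the analysis does not need.
df = df.drop(columns=["title", "subject", "date"])

# 2. Check for duplicates (removing them is an assumption of this sketch).
n_duplicates = int(df.duplicated().sum())
df = df.drop_duplicates()

# 3. Check for null values per column (removing them is also an assumption).
null_counts = df.isnull().sum()
df = df.dropna()

# 4. Shuffle the rows and reset the index.
#    random_state is fixed only to make the shuffle reproducible.
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

print(f"duplicates found: {n_duplicates}")
print(null_counts)
print(df)
```

Passing drop=True to reset_index discards the old index instead of keeping it as a new column, so the shuffled DataFrame ends up with a clean 0..n-1 index.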

