Decoding the Truth: Fake News Classification Project
Clean and Convert
This chapter is devoted entirely to text cleaning and preprocessing. As you may imagine, you cannot feed raw text directly into an ML model, so we first need to convert it into a form the model can work with. The first step is to remove punctuation from our text column to reduce noise in the data. We will do that with regex (regular expression matching operations).
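As a rough illustration of the punctuation step, here is a minimal sketch using Python's standard `re` module. The function name `remove_punctuation` and the sample sentence are hypothetical, and replacing each punctuation character with a space (rather than deleting it) is an assumption, since the text does not say what the replacement should be.

```python
import re
import string

# Character class matching any single ASCII punctuation character.
PUNCT_RE = re.compile(f"[{re.escape(string.punctuation)}]")


def remove_punctuation(text: str) -> str:
    """Replace each punctuation character with a space, then collapse whitespace.

    Hypothetical helper: substituting a space is an assumption, not
    necessarily what the project does.
    """
    no_punct = PUNCT_RE.sub(" ", text)
    # split() with no arguments drops runs of whitespace and trims the ends.
    return " ".join(no_punct.split())


# Made-up sample text, not taken from the project's dataset.
sample = "Breaking: scientists say... it's true!"
cleaned = remove_punctuation(sample)
print(cleaned)
```

Substituting a space instead of an empty string keeps words such as "say...it" from being glued together, at the cost of splitting contractions like "it's" into two tokens.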
Then we will vectorize our text, that is, represent words, sentences, or even larger units of text as numeric vectors. See the picture below for more detail.
- Replace punctuation with regex;
- Vectorize the texts of the articles (it may take a few seconds).
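To make the idea of vectorization concrete, here is a small sketch of one simple scheme, a bag-of-words count vector, written in plain Python. The text does not say which vectorizer the project uses, so this choice is an assumption, and the names `build_vocabulary` and `vectorize` as well as the two sample documents are hypothetical.

```python
from collections import Counter


def build_vocabulary(texts):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return a list of counts, one entry per vocabulary word."""
    counts = Counter(text.lower().split())
    # Dicts preserve insertion order, so columns follow the sorted vocabulary.
    return [counts.get(tok, 0) for tok in vocabulary]


# Made-up documents for illustration only.
docs = ["fake news spreads fast", "real news spreads slowly"]
vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]

for doc, vec in zip(docs, vectors):
    print(vec, doc)
```

Each article becomes a fixed-length list of numbers that a model can consume. In practice a library vectorizer, for example scikit-learn's `CountVectorizer` or `TfidfVectorizer`, is a common choice and handles tokenization and sparse storage for you.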
Is everything clear?