Clean and Convert

This chapter is devoted to text cleaning and preprocessing. Raw text cannot be fed directly into an ML model, so we first apply a few preprocessing techniques.

The first step is to remove punctuation from our text column to reduce noise in the data. We will do this with regular expressions (regex).
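As a rough illustration of the punctuation-removal step, here is a minimal sketch using Python's built-in `re` module. The sample strings, the pattern `[^\w\s]`, and the helper name `remove_punctuation` are assumptions made for this example; the exercise may use a different pattern or operate on a DataFrame column instead of a list.

```python
import re

# Hypothetical sample texts standing in for the article column.
texts = ["Breaking: markets fall, again!", "Is this real? Yes..."]

# Match any character that is neither a word character nor whitespace.
# Note: the underscore counts as a word character, so it is kept.
PUNCT_PATTERN = re.compile(r"[^\w\s]")

def remove_punctuation(text: str) -> str:
    """Replace every matched punctuation character with an empty string."""
    return PUNCT_PATTERN.sub("", text)

cleaned = [remove_punctuation(t) for t in texts]
print(cleaned)  # ['Breaking markets fall again', 'Is this real Yes']
```

Compiling the pattern once is a small convenience when the same substitution is applied to many rows.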

Then we will vectorize our text. Refer to the picture below for more information. Essentially, we will represent words, sentences, or even larger units of text as numeric vectors that a model can process.
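To make the idea of vectorization concrete, the sketch below builds a simple bag-of-words count vector for each text using only the standard library. This is one possible vectorization scheme chosen for illustration, not necessarily the one used in the exercise. In practice a library vectorizer (for example scikit-learn's `CountVectorizer` or `TfidfVectorizer`) would usually be preferred. The sample documents and the helper names `build_vocabulary` and `vectorize` are hypothetical.

```python
from collections import Counter

# Hypothetical cleaned texts (punctuation already removed).
docs = ["fake news spreads fast", "real news spreads slowly", "fake fake claims"]

def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}

def vectorize(document, vocabulary):
    """Return a count vector with one entry per vocabulary word."""
    counts = Counter(document.lower().split())
    return [counts.get(tok, 0) for tok in vocabulary]

vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]

print(list(vocab))  # ['claims', 'fake', 'fast', 'news', 'real', 'slowly', 'spreads']
print(vectors[2])   # [1, 2, 0, 0, 0, 0, 0]
```

Each text becomes a fixed-length list of word counts, so texts of different lengths end up with the same shape, which is what a model expects as input.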

Task


Remove punctuation with regex by using the appropriate method to replace the given pattern with an empty string.
Vectorize the texts of the articles.

