Python for Data Science: Text Summarizer
Create Sentence List
Next, we create a list of the sentences in our story using the sent_tokenize function. sent_tokenize is a function in the Natural Language Toolkit (NLTK) library for Python that splits text into sentences. Given a piece of text as input, it returns a list in which each element is one sentence.
The sent_tokenize() function relies on a model built with an unsupervised machine learning algorithm (Punkt), which means it does not need any labeled data to work. It uses punctuation marks, capitalization, and other heuristics to identify the boundaries between sentences.
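To get a feel for what "punctuation and capitalization heuristics" means, here is a deliberately simplified sketch. It is not the algorithm NLTK uses: naive_sentence_split is a hypothetical helper written only for illustration, and it applies a single hand-written rule (split after ., ! or ? when the next word starts with a capital letter).

```python
import re


def naive_sentence_split(text):
    """Hypothetical helper: split text using one hand-written rule.

    A boundary is any whitespace that follows '.', '!' or '?' and is
    followed by an uppercase letter. This is NOT what sent_tokenize does
    internally; it only illustrates the kind of cues involved.
    """
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [part.strip() for part in parts if part.strip()]


sample = "The fox was hungry. It saw some grapes! Could it reach them? It jumped."
for sentence in naive_sentence_split(sample):
    print(sentence)
```

A rule this simple breaks easily: it would split "Dr. Smith arrived." after "Dr." because it has no notion of abbreviations. Handling such cases without labeled data is the kind of problem the model behind sent_tokenize() is designed for.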
- Use the sent_tokenize() function to extract sentences from our story;
- Print the sentences.