# Text Summarization with TF-ISF

## TF Score

Term Frequency (TF) measures how important a word is within a specific sentence or document, relative to that sentence's or document's length. In essence, it captures how often a word appears, normalized by the size of the text so that scores remain comparable across texts of different lengths.

TF is computed on a logarithmic scale to dampen the effect of very high frequencies, so that a word repeated many times does not overwhelm the scores of the other words. The formula used here is `log(1 + (frequency of the word in the sentence) / (total number of words in the sentence))`. This reflects the intuition that a word's significance to a sentence does not grow linearly with its frequency.
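
To make the dampening concrete, here is a minimal sketch of the formula for a single word. It assumes the natural logarithm (`math.log`), since the base is not specified above; the helper name `tf_score` is hypothetical. In a 4-word sentence, a word that appears 4 times scores about 3.1 times as much as a word that appears once, rather than 4 times as much.

```python
import math

def tf_score(word_count: int, sentence_length: int) -> float:
    """Log-scaled TF: log(1 + count / length). Natural log is an assumption."""
    return math.log(1 + word_count / sentence_length)

# Scores for a word appearing 1, 2, and 4 times in a 4-word sentence.
print(round(tf_score(1, 4), 3))  # 0.223
print(round(tf_score(2, 4), 3))  # 0.405
print(round(tf_score(4, 4), 3))  # 0.693
```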

For each sentence in our list of tokenized sentences (`tokenized_sentences`), we calculate the TF score for every unique word. This is achieved by iterating through each word in a sentence, calculating its frequency relative to the sentence length, and applying the logarithmic formula. The result is a dictionary for each sentence, mapping words to their respective TF scores.
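
The per-sentence procedure can be sketched as follows. This assumes `tokenized_sentences` is a list of sentences, each a list of already-normalized string tokens. The function name `compute_tf` is hypothetical, and it uses `collections.Counter` to count each word once instead of re-scanning the sentence for every word. Empty sentences map to an empty dictionary to avoid dividing by zero.

```python
import math
from collections import Counter

def compute_tf(tokenized_sentences: list[list[str]]) -> list[dict[str, float]]:
    """Return one {word: TF score} dictionary per tokenized sentence."""
    tf_scores = []
    for sentence in tokenized_sentences:
        n = len(sentence)
        if n == 0:
            # No words, so no scores (and no division by zero).
            tf_scores.append({})
            continue
        counts = Counter(sentence)  # frequency of each unique word
        tf_scores.append(
            {word: math.log(1 + count / n) for word, count in counts.items()}
        )
    return tf_scores

# Hypothetical example input.
tokenized_sentences = [
    ["the", "cat", "sat", "the", "mat"],
    ["dogs", "bark"],
]
tf = compute_tf(tokenized_sentences)
print(round(tf[0]["the"], 3))   # 0.336  (2 of 5 words -> log(1.4))
print(round(tf[0]["cat"], 3))   # 0.182  (1 of 5 words -> log(1.2))
print(round(tf[1]["dogs"], 3))  # 0.405  (1 of 2 words -> log(1.5))
```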

Calculate the term frequency (TF) of each word in each sentence.
