Learn TF Score | Text Summarization with TF-ISF
Extracting Text Meaning using TF-IDF

TF Score

Term Frequency (TF) is a measure that quantifies the importance of a word within a specific sentence or document, relative to the sentence or document's length. In essence, it's a way to highlight how frequently a word appears, adjusted for the size of the text to ensure fairness across texts of different lengths.

Here, TF is computed on a logarithmic scale, which dampens the effect of very high frequencies so that no single word dominates. The formula is log(1 + f / n), where f is the number of times the word occurs in the sentence, n is the total number of words in the sentence, and log is the natural logarithm. This reflects the intuition that a word's significance to a sentence does not grow linearly with its frequency.
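To make the dampening concrete, here is a minimal sketch that applies the formula to a single made-up sentence. The helper name tf_score and the example sentence are illustrative assumptions, not part of the lesson's code.

```python
import math

def tf_score(word, sentence):
    """Return log(1 + f / n) for one word in a tokenized sentence (hypothetical helper)."""
    f = sentence.count(word)   # occurrences of the word in the sentence
    n = len(sentence)          # total number of words in the sentence
    return math.log(1 + f / n)

# Made-up example sentence, already tokenized
sentence = ["the", "cat", "chased", "the", "mouse"]

once = tf_score("cat", sentence)    # log(1 + 1/5)
twice = tf_score("the", sentence)   # log(1 + 2/5)
print(round(once, 4), round(twice, 4))
```

In this example "the" occurs twice as often as "cat", yet its score (about 0.336) is less than twice the score of "cat" (about 0.182), which is the sub-linear growth the formula is meant to produce.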

For each sentence in our list of tokenized sentences (tokenized_sentences), we calculate the TF score for every unique word. This is achieved by iterating through each word in a sentence, calculating its frequency relative to the sentence length, and applying the logarithmic formula. The result is a dictionary for each sentence, mapping words to their respective TF scores.

Task

Calculate the term frequency (TF) of each word in each sentence.

Solution

# Importing the math module
import math

# Calculate the TF score of each unique word in each sentence
tf_scores = [
    {word: math.log(1 + sentence.count(word) / len(sentence)) for word in set(sentence)}
    for sentence in tokenized_sentences
]

# Display TF for each word for the first two sentences
tf_scores[:2]
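
The same computation can also be written as an explicit loop and tried on a small made-up input. The sketch below is an alternative formulation, not the course's solution: the function name compute_tf and the sample sentences are assumptions, and it uses collections.Counter so each sentence is counted once rather than calling count() for every unique word.

```python
import math
from collections import Counter

def compute_tf(sentences):
    """Return one {word: TF score} dictionary per tokenized sentence (hypothetical helper)."""
    scores = []
    for sentence in sentences:
        counts = Counter(sentence)   # word -> number of occurrences in this sentence
        n = len(sentence)            # total number of words in this sentence
        scores.append({word: math.log(1 + c / n) for word, c in counts.items()})
    return scores

# Made-up stand-in for tokenized_sentences
sample_sentences = [
    ["the", "cat", "chased", "the", "mouse"],
    ["dogs", "bark"],
]

sample_tf = compute_tf(sample_sentences)
for scores in sample_tf:
    print(scores)
```

Each dictionary has one entry per unique word, so the first sentence yields four entries even though it contains five tokens.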
