Learn Motivation Analysis | Exploratory Data Analysis of Nobel Prizes
Conducting Exploratory Data Analysis of Nobel Prizes

Motivation Analysis

In this section, our focus will be on examining the text to identify the most prevalent words in our dataset. Initially, we will eliminate all stopwords from the "motivation" column and modify our data accordingly.

Take, for instance, the sentence: "I like reading, so I read." Removing the stopwords ("I" and "so") leaves: "like reading, read." Following this transformation, we will visualize the remaining words in a word cloud, where the size of each word reflects its frequency in our dataset.
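The transformation above can be sketched in a few lines. This is a minimal illustration using a small hand-picked stopword set rather than NLTK's full English list, which the solution below uses:

```python
# Minimal sketch of stopword removal, using a small hand-picked
# stopword set instead of NLTK's full English list.
stop_words = {"i", "so", "a", "the", "and"}

def remove_stopwords(text: str) -> str:
    # Keep only the words whose lowercase form is not a stopword.
    return " ".join(
        word for word in text.split() if word.lower() not in stop_words
    )

print(remove_stopwords("I like reading, so I read."))  # like reading, read.
```

Note that punctuation stays attached to the words ("reading," survives); the comparison is purely on the lowercased token.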

Task


  1. Apply a lambda function to remove stopwords from the 'motivation' column and store the processed text in the 'Filtered motivation' column.

  2. Concatenate all entries in the "Filtered motivation" column to form a single text string.

  3. Split the concatenated text into individual words and create a pandas DataFrame from the list of words.

  4. Calculate word frequency by counting occurrences of each word.

  5. Plot the 20 most common words using seaborn's barplot.

Solution

import nltk
nltk.download("stopwords")
from nltk.corpus import stopwords
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Build the stopword set once so it is not recomputed for every row
stop_words = set(stopwords.words("english"))

# Apply a lambda function to remove stopwords from the 'motivation' column
# and store the result in 'Filtered motivation'
# (fillna guards against rows with a missing motivation)
nobel["Filtered motivation"] = nobel["motivation"].fillna("").apply(
    lambda x: " ".join(word for word in x.split() if word.lower() not in stop_words)
)

# Concatenate all entries in 'Filtered motivation' to form a single text string
text = " ".join(nobel["Filtered motivation"])

# Split the text into words and create a pandas DataFrame from the list of words
words_df = pd.DataFrame(text.split(), columns=['word'])

# Calculate word frequency
word_freq = words_df['word'].value_counts().reset_index()
word_freq.columns = ['word', 'freq']

# Plotting the 20 most common words using seaborn's barplot
plt.figure(figsize=(10, 8))
sns.barplot(x='freq', y='word', data=word_freq.head(20), hue='word', palette='viridis', legend=False)
plt.title('Top 20 Most Common Words')
plt.xlabel('Frequency')
plt.ylabel('Word')
plt.show()
