Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Learn Attributes and Contents of Element | Working with Element Attributes in Beautiful Soup
Web Scraping with Python

bookAttributes and Contents of Element

The methods covered earlier return specific parts of the HTML code. BeautifulSoup also allows you to access the attributes and contents of particular elements. To get an element’s attributes, use the .attrs attribute. For example, retrieve the attributes of the first <div> element.

123456789101112
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, "html.parser") print(soup.find("div").attrs)
copy

The result of using the .attrs attribute is a dictionary where the keys are attribute names and the values are their corresponding values. To get the content inside a tag, use the .contents attribute. For example, check the contents of the first <div> element.

123456789101112
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, "html.parser") print(soup.find("div").contents)
copy

As observed above, all the newline characters were included in a list of elements, which may not be the most desirable representation of content. If you want to extract only the text within a specific element, utilize the .get_text() method. Compare the results from the example below with the one obtained earlier.

123456789101112
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, "html.parser") print(soup.find("div").get_text())
copy
Everything was clear?

How can we improve it?

Thanks for your feedback!

SectionΒ 3. ChapterΒ 1

Ask AI

expand

Ask AI

ChatGPT

Ask anything or try one of the suggested questions to begin our chat

Suggested prompts:

What is the difference between `.attrs`, `.contents`, and `.get_text()` in BeautifulSoup?

Can you explain why `.get_text()` is preferred for extracting text content?

How can I extract attributes and text from other HTML elements, not just `<div>`?

Awesome!

Completion rate improved to 4.35

bookAttributes and Contents of Element

Swipe to show menu

The methods covered earlier return specific parts of the HTML code. BeautifulSoup also allows you to access the attributes and contents of particular elements. To get an element’s attributes, use the .attrs attribute. For example, retrieve the attributes of the first <div> element.

123456789101112
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, "html.parser") print(soup.find("div").attrs)
copy

The result of using the .attrs attribute is a dictionary where the keys are attribute names and the values are their corresponding values. To get the content inside a tag, use the .contents attribute. For example, check the contents of the first <div> element.

123456789101112
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, "html.parser") print(soup.find("div").contents)
copy

As observed above, all the newline characters were included in a list of elements, which may not be the most desirable representation of content. If you want to extract only the text within a specific element, utilize the .get_text() method. Compare the results from the example below with the one obtained earlier.

123456789101112
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, "html.parser") print(soup.find("div").get_text())
copy
Everything was clear?

How can we improve it?

Thanks for your feedback!

SectionΒ 3. ChapterΒ 1
some-alt