Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Learn Advanced Search | Working with Element Attributes in Beautiful Soup
Web Scraping with Python

bookAdvanced Search

Some HTML tags require mandatory attributes, such as the anchor tag needing the href attribute or the <img> tag requiring the src attribute. To access a specific attribute, use the .get() method after .attrs. For example, retrieve all src attributes from all <img> elements.

12345678910111213
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/page.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, "html.parser") for img in soup.find_all("img"): print(img.attrs.get("src"))
copy

You may also come across the id attribute, which is commonly used to distinguish elements with the same tag. To search for elements with specific attribute values, pass them as a dictionary in the format attr_name: attr_value to the .find_all() method, right after specifying the tag. For example, find all <div> elements with the class attribute set to "box" or the <p> element with the "id" attribute value "id2".

12345678910111213141516
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/page.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, "html.parser") for div in soup.find_all("div", {"class": "box"}): print(div) # Filtering by id attribute value print(soup.find("p", {"id": "id2"}))
copy

The .find() method is used instead of .find_all() to get an element by its id, as an id is a unique identifier and cannot appear more than once. To confirm that only specific <div> elements were retrieved, check the classes assigned to the <div> elements.

12345678910111213
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/page.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, "html.parser") for div in soup.find_all("div"): print(div.attrs.get("class"))
copy
Everything was clear?

How can we improve it?

Thanks for your feedback!

SectionΒ 3. ChapterΒ 5

Ask AI

expand

Ask AI

ChatGPT

Ask anything or try one of the suggested questions to begin our chat

Awesome!

Completion rate improved to 4.35

bookAdvanced Search

Swipe to show menu

Some HTML tags require mandatory attributes, such as the anchor tag needing the href attribute or the <img> tag requiring the src attribute. To access a specific attribute, use the .get() method after .attrs. For example, retrieve all src attributes from all <img> elements.

12345678910111213
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/page.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, "html.parser") for img in soup.find_all("img"): print(img.attrs.get("src"))
copy

You may also come across the id attribute, which is commonly used to distinguish elements with the same tag. To search for elements with specific attribute values, pass them as a dictionary in the format attr_name: attr_value to the .find_all() method, right after specifying the tag. For example, find all <div> elements with the class attribute set to "box" or the <p> element with the "id" attribute value "id2".

12345678910111213141516
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/page.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, "html.parser") for div in soup.find_all("div", {"class": "box"}): print(div) # Filtering by id attribute value print(soup.find("p", {"id": "id2"}))
copy

The .find() method is used instead of .find_all() to get an element by its id, as an id is a unique identifier and cannot appear more than once. To confirm that only specific <div> elements were retrieved, check the classes assigned to the <div> elements.

12345678910111213
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/page.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, "html.parser") for div in soup.find_all("div"): print(div.attrs.get("class"))
copy
Everything was clear?

How can we improve it?

Thanks for your feedback!

SectionΒ 3. ChapterΒ 5
some-alt