Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Вивчайте Advanced Search | Working with Element Attributes in Beautiful Soup
Web Scraping with Python

book
Advanced Search

Certain HTML tags require mandatory attributes, such as the anchor tag necessitating the href attribute or <img> requiring the src attribute. If you are interested in a specific attribute, you can use the .get() method following .attrs. For example, let's retrieve all the src attributes of all <img> elements.

# Importing libraries
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Reading web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/page.html"
page = urlopen(url)
html = page.read().decode("utf-8")

# Reading HTML with BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
for img in soup.find_all("img"):
print(img.attrs.get("src"))
12345678910111213
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/page.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, "html.parser") for img in soup.find_all("img"): print(img.attrs.get("src"))
copy

You may also encounter the id attribute, which is quite common and is used to distinguish elements under the same tag. If you are interested in specific attribute values, you can pass them as a dictionary (in the format attr_name: attr_value) as the parameter for .find_all() (immediately after specifying the tag you are searching for). For example, we are interested in only <div> elements with the class attribute set to "box", or we are searching for the <p> element with an "id" attribute value of "id2".

# Importing libraries
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Reading web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/page.html"
page = urlopen(url)
html = page.read().decode("utf-8")

# Reading HTML with BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
for div in soup.find_all("div", {"class": "box"}):
print(div)

# Filtering by id attribute value
print(soup.find("p", {"id": "id2"}))
12345678910111213141516
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/page.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, "html.parser") for div in soup.find_all("div", {"class": "box"}): print(div) # Filtering by id attribute value print(soup.find("p", {"id": "id2"}))
copy

We utilized the .find() method (instead of .find_all()) to retrieve the element with a specific id since the id is a unique identifier, and there cannot be more than one element with the same value. To ensure that we obtained only specific <div> elements, let's examine the classes that <div> elements have.

# Importing libraries
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Reading web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/page.html"
page = urlopen(url)
html = page.read().decode("utf-8")

# Reading HTML with BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
for div in soup.find_all("div"):
print(div.attrs.get("class"))
12345678910111213
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/page.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, "html.parser") for div in soup.find_all("div"): print(div.attrs.get("class"))
copy

Все було зрозуміло?

Як ми можемо покращити це?

Дякуємо за ваш відгук!

Секція 3. Розділ 5
some-alt