Working with Specific Elements | Beautiful Soup: Part I
Web Scraping with Python
Course content

1. Getting Acquainted with HTML
2. Beautiful Soup: Part I
3. Beautiful Soup: Part II

Working with Specific Elements

Navigating an HTML document through attribute access (for example, soup.body.div) returns only the first occurrence of an element, and it requires you to know the path that leads to it. When you want the first instance of an element but don't know its full path, use the .find() method, passing the tag name (without the < > brackets) as a string. If the document contains no such tag, .find() returns None. For example, let's locate the first <div> element in the HTML document.

    # Importing libraries
    from bs4 import BeautifulSoup
    from urllib.request import urlopen

    # Reading web page
    url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html"
    page = urlopen(url)
    html = page.read().decode("utf-8")

    # Reading HTML with BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')

    print(soup.find('div'))
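
To try .find() without a network connection, here is a minimal sketch that parses a small inline HTML string. The markup, the id values, and the variable names are assumptions made up for illustration; they are not taken from the course page. It also shows what happens when the tag is absent.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML, used here instead of a downloaded page
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <div id="first"><p>First paragraph</p></div>
    <div id="second"><p>Second paragraph</p></div>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# .find() returns the first matching tag, wherever it sits in the tree
first_div = soup.find("div")
print(first_div["id"])

# There is no <table> in this markup, so .find() returns None
missing = soup.find("table")
print(missing)

# Check for None before using the result to avoid an AttributeError
if missing is None:
    print("No <table> found")
```

Because .find() can return None, it is usually worth checking the result before reading attributes or text from it.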

To retrieve every instance of an element rather than just the first, use the .find_all() method. It returns a list of the matching tags, which is empty when nothing matches. For instance, let's locate all the <p> tags in the HTML document.

    # Importing libraries
    from bs4 import BeautifulSoup
    from urllib.request import urlopen

    # Reading web page
    url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html"
    page = urlopen(url)
    html = page.read().decode("utf-8")

    # Reading HTML with BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')

    print(soup.find_all('p'))
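
Since .find_all() returns a list, the usual list operations apply to its result. The following sketch, again on a made-up inline HTML string (an assumption, not the course page), counts the matches, extracts their text, and uses the limit argument of .find_all() to stop after a given number of results.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML; one <p> is nested inside a <div>
html = """
<body>
  <p>One</p>
  <div><p>Two</p></div>
  <p>Three</p>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# .find_all() searches the whole tree, so the nested <p> is included
paragraphs = soup.find_all("p")
print(len(paragraphs))

# Pull the text out of each matching tag
texts = [p.get_text() for p in paragraphs]
print(texts)

# limit caps the number of results that are returned
first_two = soup.find_all("p", limit=2)
print(len(first_two))

# No match gives an empty list rather than None
no_tables = soup.find_all("table")
print(no_tables)
```

Unlike .find(), a search with no matches gives back an empty list, so a for loop over the result simply does nothing.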

The .find_all() method can also match several tags at once: pass a list of tag names instead of a single string, and every element whose tag is in the list is returned. For example, let's gather all the <div> and <title> elements.

    # Importing libraries
    from bs4 import BeautifulSoup
    from urllib.request import urlopen

    # Reading web page
    url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/page.html"
    page = urlopen(url)
    html = page.read().decode("utf-8")

    # Reading HTML with BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')

    for el in soup.find_all(["div", "title"]):
        print(el)
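
When several tag names are searched together, the results come back in the order the elements appear in the document, not grouped by tag. The sketch below illustrates this on a hypothetical inline HTML string (the markup and variable names are assumptions for illustration) and uses each element's .name attribute to tell the tags apart.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML with a <title>, two <div>s, and a <p>
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <div>Block A</div>
    <p>Ignored paragraph</p>
    <div>Block B</div>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# A list of names matches any element whose tag is in the list
matches = soup.find_all(["div", "title"])

# .name holds the tag name of each matched element
names = [el.name for el in matches]
print(names)

# Group the text of the matches by tag name
by_tag = {}
for el in matches:
    by_tag.setdefault(el.name, []).append(el.get_text())
print(by_tag)
```

Grouping by .name, as in the last step, is one simple way to separate the mixed results again if the tags need different handling.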

Section 2. Chapter 5