Web Scraping with Python
What is Beautiful Soup?
BeautifulSoup is a Python library that offers extensive functionality for parsing HTML pages. In the previous section, you worked with HTML as a string, which imposed significant limitations.
To install BeautifulSoup, execute the following command in your terminal or command prompt:
pip install beautifulsoup4
To get started, import BeautifulSoup from bs4:
from bs4 import BeautifulSoup
This library is designed for working with HTML content and does not download pages from links itself. However, you already know how to deal with that using urlopen from urllib.request. To initiate parsing, you need to provide two arguments to the BeautifulSoup constructor: the first is the HTML to parse (a string, bytes, or an open file), and the second is the name of the parser (we will use the built-in html.parser). Passing in the contents of a web page read with urlopen, for instance, creates a BeautifulSoup object that represents that page.
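The steps above can be sketched roughly as follows. The helper name fetch_soup and the sample markup are made up for illustration, and a data: URL stands in for a real web address so that the sketch runs without a network connection. With a real page you would pass its http(s) address instead.

```python
from urllib.parse import quote
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_soup(url):
    """Read the page at `url` and parse it with the built-in html.parser."""
    # fetch_soup is a hypothetical helper, not part of bs4 or urllib.
    with urlopen(url) as response:
        html = response.read()  # raw bytes; BeautifulSoup decodes them
    return BeautifulSoup(html, "html.parser")


# Assumption: a small inline page used in place of a real website.
SAMPLE_HTML = (
    "<html><head><title>Sample page</title></head>"
    "<body><h1>Hello</h1><p>First paragraph.</p></body></html>"
)
sample_url = "data:text/html;charset=utf-8," + quote(SAMPLE_HTML)

soup = fetch_soup(sample_url)
print(type(soup).__name__)  # BeautifulSoup
print(soup.title.string)    # Sample page
```

Reading the response as bytes and handing it straight to BeautifulSoup lets the library work out the encoding, which is usually simpler than decoding the page by hand.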
The first method we will explore is .prettify(), which returns the parsed HTML as a formatted string, with each tag on its own line and indented to show how the elements are nested.
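As a minimal sketch of .prettify(), assuming a short hand-written HTML string rather than a downloaded page:

```python
from bs4 import BeautifulSoup

# Assumption: a tiny inline document used only for illustration.
html = "<html><body><h1>Hello</h1><p>First paragraph.</p></body></html>"

soup = BeautifulSoup(html, "html.parser")

# prettify() returns a str; nothing is printed until we print it ourselves.
pretty = soup.prettify()
print(pretty)
```

The single-line input comes back spread over several lines, with child tags indented under their parents, which makes the structure of a page much easier to read before you start extracting data from it.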