Web Scraping with Python
BeautifulSoup is a Python library that provides extensive functionality for parsing HTML pages. In the previous section, you worked with HTML as a plain string, which significantly limited what you could do with it.
To install BeautifulSoup, run pip install beautifulsoup4 in the terminal or command prompt. To get started, import the BeautifulSoup class from the bs4 package with the line from bs4 import BeautifulSoup.
BeautifulSoup parses HTML content; it cannot download a page from a URL on its own. Fortunately, you already know how to do that using urlopen from urllib.request. To start parsing, pass two arguments to the BeautifulSoup constructor: the first is the HTML content (a string, bytes, or an open file object), and the second is the name of the parser (we will use the built-in html.parser). This creates a BeautifulSoup object. For instance, let's open and read a web page.
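The steps above can be sketched as follows. This is a minimal illustration, not the course's own code: the helper names fetch_html and make_soup, the inline sample page, and the placeholder URL are all assumptions introduced here. The sample page is parsed from a string so that the sketch runs without network access.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its raw HTML as bytes (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read()


def make_soup(html):
    """Parse HTML (str or bytes) with the built-in html.parser (hypothetical helper)."""
    return BeautifulSoup(html, "html.parser")


# A small made-up page, so the sketch works offline.
sample_html = (
    "<html><head><title>Sample page</title></head>"
    "<body><p>Hello</p></body></html>"
)
soup = make_soup(sample_html)

print(type(soup).__name__)  # the class of the object that was created
print(soup.title.string)    # text inside the <title> tag

# With a real page, the same two steps would look like this
# (the URL is only a placeholder):
# soup = make_soup(fetch_html("https://example.com"))
```

Keeping the download and the parsing in separate functions makes it easy to parse a saved file or a test string without touching the network.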
The first method we will consider is .prettify(), which returns the parsed HTML as a formatted string, with each tag on its own line and indented according to its nesting level.
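A short sketch of .prettify() in use, assuming a small made-up HTML string instead of a downloaded page. Note that the method returns a string rather than printing anything itself, so the result is passed to print().

```python
from bs4 import BeautifulSoup

# A compact, single-line document (made up for this illustration).
html = "<html><body><h1>Title</h1><p>Text</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# prettify() returns a str; each tag and each piece of text
# ends up on its own line, indented by nesting depth.
pretty = soup.prettify()
print(pretty)
```

Comparing the printed result with the single-line html string shows how the nesting of html, body, h1, and p becomes visible through indentation.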