Web Scraping with Python

What is Beautiful Soup?

BeautifulSoup is a Python library that provides a wide range of tools for parsing HTML pages. In the previous section, you worked with HTML as a plain string, which significantly limited what you could do with it.

To install BeautifulSoup, run pip install beautifulsoup4 in the terminal or command prompt. To get started, import BeautifulSoup from bs4: from bs4 import BeautifulSoup.

This library works with HTML content and can't fetch pages from URLs on its own. Fortunately, you already know how to do that using urlopen from urllib.request. To start parsing, pass two arguments to the BeautifulSoup constructor: the first is the HTML content, and the second is the name of the parser (we will use the built-in html.parser). This creates a BeautifulSoup object. For instance, let's open and read a web page.
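
A minimal sketch of this step. The helper name make_soup and the sample HTML string are made up for illustration; the helper shows how a downloaded page would be handed to BeautifulSoup, while the inline string lets the example run without network access.

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup


def make_soup(url):
    """Hypothetical helper: download the page at `url` and parse it."""
    html = urlopen(url).read()  # raw bytes of the page
    # First argument: the HTML content; second: the parser name.
    return BeautifulSoup(html, "html.parser")


# The constructor also accepts a plain string, so it can be tried offline.
# This HTML is an invented stand-in for a real page.
sample_html = "<html><head><title>Example page</title></head><body><p>Hello</p></body></html>"
soup = BeautifulSoup(sample_html, "html.parser")
print(type(soup).__name__)  # BeautifulSoup
```

With a real address, make_soup(url) would return the same kind of object for the downloaded page.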

The first method we will consider is .prettify(), which returns the HTML as a string indented to show its nested structure.
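
As a rough illustration, the sketch below calls .prettify() on a small, invented HTML string (not a page from the course) and prints the result. Each tag should appear on its own line, indented according to its depth.

```python
from bs4 import BeautifulSoup

# Invented one-line HTML, used only to show the effect of prettify().
html = "<html><body><p>Hello</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# prettify() returns a string; it does not print anything by itself.
pretty = soup.prettify()
print(pretty)
```

Because the method returns a string, the result can be printed, saved to a file, or compared against the original one-line markup.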
