Let's continue exploring `BeautifulSoup` and learn some important functions! We can extract not only tags but also their parts (for example, their names or attributes):

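```python
# `soup` is the BeautifulSoup object created earlier
print(soup.div.name)   # the tag's name
print(soup.div.attrs)  # the tag's attributes as a dictionary
```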

In the code, we used the `.name` attribute to get the tag's name and the `.attrs` attribute, which returns all of the tag's attributes as a dictionary.
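To see both attributes in action without a live page, here is a minimal sketch that parses a small, made-up HTML string with Python's built-in `html.parser`. The markup and its `class`/`id` values are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for this illustration
html = '<div class="card" id="main"><h1>Title</h1></div>'
soup = BeautifulSoup(html, "html.parser")

# .name holds the tag's name as a string
print(soup.div.name)

# .attrs holds all of the tag's attributes in a dictionary
print(soup.div.attrs)

# A single attribute can be read from that dictionary by key
print(soup.div.attrs["id"])
```

Note that `class` comes back as a list (`['card']`), because BeautifulSoup treats it as a multi-valued attribute, while `id` is a plain string.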

Another useful method is `.get_text()`. Called on the whole `soup` object, it extracts all the text from the page without the HTML tags.

The output will contain a lot of extra blank lines. This is because of the newline characters in the original HTML file.
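The sketch below, built on a hypothetical multi-line HTML string, shows where the blank lines come from and two ways you might remove them: filtering the lines yourself, or passing `separator` and `strip=True` to `.get_text()`.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the newlines and indentation are part of the source
html = """
<html>
  <body>
    <h1>Title</h1>
    <p>First paragraph.</p>
  </body>
</html>
"""
soup = BeautifulSoup(html, "html.parser")

# get_text() keeps the whitespace-only strings between tags
text = soup.get_text()
print(repr(text))

# Option 1: split into lines, strip each one, and keep only non-empty lines
lines = [line.strip() for line in text.splitlines() if line.strip()]
print("\n".join(lines))

# Option 2: let get_text() strip each string, skip empty ones, and join with "\n"
print(soup.get_text(separator="\n", strip=True))
```

Both options print `Title` and `First paragraph.` on separate lines; the second keeps the cleanup inside BeautifulSoup itself.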

In a similar way, you can get only the text inside an extracted HTML tag using the `.get_text()` method or the `.string` attribute:

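```python
# `soup` is the BeautifulSoup object created earlier
print(soup.h1.string)      # the tag's single text child
print(soup.h1.get_text())  # all text inside the tag
```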

If a tag contains more than one child (or nothing at all), it is unclear what `.string` should refer to, so `.string` returns `None`. In such cases, `.get_text()` still returns the combined text.
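A small sketch, again on made-up markup, comparing `.string` and `.get_text()` on a tag with one text child, a tag with several children, and an empty tag:

```python
from bs4 import BeautifulSoup

# Hypothetical markup covering the three cases
html = "<h1>Title</h1><p>Hello, <b>world</b>!</p><span></span>"
soup = BeautifulSoup(html, "html.parser")

# One text child: .string and .get_text() agree
print(soup.h1.string)
print(soup.h1.get_text())

# Several children ("Hello, ", <b>, "!"): .string is ambiguous, so it is None
print(soup.p.string)
print(soup.p.get_text())

# No children: .string is None, .get_text() is an empty string
print(soup.span.string)
print(repr(soup.span.get_text()))
```

When you are not sure how many children a tag has, `.get_text()` is the safer choice.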
