Web Scraping with Python
The methods covered in the previous chapters return fragments of raw HTML. BeautifulSoup also lets us get the attributes and contents of specific elements. To get the attributes of an element, use the .attrs attribute. For instance, we can get the attributes of the first <div> element.
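A minimal sketch of this step follows. The HTML snippet, the variable names, and the choice of the built-in "html.parser" are assumptions made for illustration; they are not taken from the course's own example page.

```python
from bs4 import BeautifulSoup

# Hypothetical page used only for this illustration
html = """
<html>
  <body>
    <div class="box main" id="intro">
      <p>Hello</p>
    </div>
    <div class="box">Second</div>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching element, here the first <div>
first_div = soup.find("div")

# .attrs is a dictionary mapping attribute names to their values.
# class is a multi-valued attribute, so its value comes back as a list.
div_attrs = first_div.attrs
print(div_attrs)
```

Since the result is an ordinary dictionary, a single attribute can be read with the usual lookup, for example div_attrs["id"].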
Note that .attrs returns a dictionary whose keys are the attribute names and whose values are the corresponding attribute values.
If you want to get the content stored in a tag, use the .contents attribute. For instance, let's see what's inside the first <div> element.
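As a rough illustration, the sketch below reads .contents from the first <div> of a small hypothetical snippet. The HTML and the variable names are assumptions chosen for the example, not the course's own page.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet; the line breaks between tags are deliberate
html = "<div>\n<h1>Title</h1>\n<p>Some text</p>\n</div>"

soup = BeautifulSoup(html, "html.parser")

# .contents is a list of the tag's direct children:
# both nested tags and the strings between them
div_contents = soup.find("div").contents
print(div_contents)

# Show each child on its own line to make the newline strings visible
for child in div_contents:
    print(repr(child))
```

With this input, the list holds the <h1> and <p> tags along with a separate "\n" string for each line break between them.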
.contents returns a list of the tag's children, and the newline characters between child tags are included as separate items, which is not the most convenient way to represent the content. If you want only the text within a specific element, apply the .get_text() method, which returns that text as a single string instead of a list.
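The difference can be sketched on a small hypothetical snippet. The HTML and the variable names are assumptions for illustration; the separator and strip arguments shown at the end are optional parameters of get_text() that the text above does not discuss.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet; the line breaks between tags are deliberate
html = "<div>\n<h1>Title</h1>\n<p>Some text</p>\n</div>"

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

# .contents: a list mixing tags and newline strings
print(div.contents)

# .get_text(): one string containing all the text inside the element
div_text = div.get_text()
print(repr(div_text))

# Optionally strip whitespace from each piece and join the pieces with a space
clean_text = div.get_text(separator=" ", strip=True)
print(clean_text)
```

The plain get_text() call still keeps the newline characters, but inside a single string; passing strip=True is one way to drop them.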