Web Scraping with Python
The methods covered in the previous chapters return raw fragments of HTML code.
BeautifulSoup also lets us read the attributes and contents of specific elements. To get the attributes of a tag, use the
.attrs attribute. For instance, we can get the attributes of the first
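As a minimal sketch of reading attributes, the snippet below parses a small HTML string and inspects the first `<a>` and `<div>` tags. The HTML string, the tag names, and the variable names are assumptions made up for illustration; they do not come from the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
html = (
    '<div id="main" class="box wide">'
    '<a href="https://example.com" title="Example">link</a>'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag.
first_link = soup.find("a")
link_attrs = first_link.attrs
print(link_attrs)  # {'href': 'https://example.com', 'title': 'Example'}

# Multi-valued attributes such as class come back as a list.
div_attrs = soup.find("div").attrs
print(div_attrs)  # {'id': 'main', 'class': ['box', 'wide']}
```

A single attribute can also be read directly with dictionary-style access, for example `first_link["href"]`.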
Note that the
.attrs attribute returns a dictionary whose keys are the attribute names and whose values are the corresponding attribute values. If you want to get the content stored in a tag, use the
.contents attribute. For instance, let's see what's inside the first
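A short sketch of `.contents`, assuming a hypothetical multi-line `<ul>` list (the HTML and the variable names are illustrative only). Because the markup is spread over several lines, the whitespace between the tags shows up in the result.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration; note the line breaks.
html = """<ul>
<li>one</li>
<li>two</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")
first_list = soup.find("ul")

# .contents is a list of the tag's direct children:
# both nested tags and the strings between them.
items = first_list.contents
print(items)  # ['\n', <li>one</li>, '\n', <li>two</li>, '\n']
```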
As the output above shows, the newline characters are included in the list as separate elements, which is not the most convenient way to represent the content. If you want only the text within a specific element, call the
.get_text() method. Compare the result of the example below with the one obtained above.
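The sketch below applies `.get_text()` to a hypothetical `<ul>` list (the HTML and the names are assumptions for illustration). It also shows the optional separator and `strip` arguments, which help when the surrounding whitespace is not wanted.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
html = """<ul>
<li>one</li>
<li>two</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")
first_list = soup.find("ul")

# All text inside the tag, joined into one string (newlines are kept).
text = first_list.get_text()
print(repr(text))  # '\none\ntwo\n'

# Strip each piece of text and join the non-empty pieces with a space.
stripped = first_list.get_text(" ", strip=True)
print(stripped)  # one two
```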