Work with Soup
Continue exploring BeautifulSoup
Let’s learn a few important attributes and methods. We can extract not only tags but also their parts (for example, their names or attributes):
print(soup.div.name)
print(soup.div.attrs)
In the code, we used the .name attribute to get the tag’s name and the .attrs attribute, which returns all of the tag’s attributes as a dictionary.
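To see .name and .attrs side by side, here is a minimal sketch on a small hypothetical HTML snippet (not the course page). It assumes the bs4 package is installed and uses Python's built-in "html.parser". Note that class comes back as a list, because Beautiful Soup treats it as a multi-valued attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet, not the course page
html = '<div class="card" id="main"><p>Hello</p></div>'
soup = BeautifulSoup(html, "html.parser")

div_name = soup.div.name    # the tag's name as a string
div_attrs = soup.div.attrs  # every attribute of the tag, as a dictionary
div_id = soup.div["id"]     # a single attribute, read like a dictionary key

print(div_name)   # div
print(div_attrs)  # "class" maps to a list, because class can hold several values
print(div_id)     # main
```

Reading one attribute with soup.div["id"] raises KeyError if it is missing; soup.div.get("id") returns None instead.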
Another useful method is .get_text(), which extracts all the raw text from the page without the HTML tags.
print(soup.get_text())
The output will contain a lot of extra blank lines. This happens because of the newline characters in the original HTML file.
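The blank lines come straight from the whitespace between tags in the source. A minimal sketch on hypothetical HTML shows this, plus two ways to tidy the result: filtering empty lines by hand, or passing get_text()'s separator and strip keyword arguments.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML with line breaks between tags, like a real page source
html = """<html>
<body>
<h1>Title</h1>

<p>First paragraph.</p>
</body>
</html>"""
soup = BeautifulSoup(html, "html.parser")

raw_text = soup.get_text()
print(raw_text)  # the newlines between tags show up as blank lines

# Option 1: keep only the non-empty lines, stripped of surrounding spaces
clean_lines = [line.strip() for line in raw_text.splitlines() if line.strip()]
print(clean_lines)

# Option 2: strip=True drops whitespace-only strings; separator joins the rest
stripped = soup.get_text(separator="\n", strip=True)
print(stripped)
```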
Similarly, you can get only the text inside a specific tag using the .get_text() method or the .string attribute:
print(soup.h1.string)
print(soup.h1.get_text())
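For a tag that holds a single piece of text, both approaches return the same text. A minimal sketch, assuming a hypothetical h1 heading rather than the real page:

```python
from bs4 import BeautifulSoup

html = "<h1>Christ the Redeemer</h1>"  # hypothetical heading
soup = BeautifulSoup(html, "html.parser")

via_string = soup.h1.string        # a NavigableString (a subclass of str)
via_get_text = soup.h1.get_text()  # a plain str

print(via_string)    # Christ the Redeemer
print(via_get_text)  # Christ the Redeemer
```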
If a tag contains more than one child (or nothing at all), it is unclear what .string should refer to, so it returns None.
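The difference shows up when a tag has several children or none. In this sketch on hypothetical HTML, .string returns None for both paragraphs, while .get_text() still collects the text of the first one, including its nested b tag.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: one paragraph with mixed children, one empty paragraph
html = "<p>Hello <b>world</b>!</p><p></p>"
soup = BeautifulSoup(html, "html.parser")
mixed, empty = soup.find_all("p")

print(mixed.string)      # None: the tag has three children ("Hello ", <b>, "!")
print(mixed.get_text())  # Hello world!
print(empty.string)      # None: the tag has no children
```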
Here you will work on the same page about Christ the Redeemer as in the previous task.
- Import the BeautifulSoup library.
- Print the attributes of the p tag.
- Print only the text of the ul tags.
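For reference, here is one possible sketch of the three steps. It is not the official solution: the actual page is not reproduced here, so it runs on a small hypothetical HTML string, and in the exercise soup would be built from the real page instead.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the course page: the real HTML is not shown here,
# so the tags, attributes and list items below are placeholders.
html = """<html><body>
<p class="intro" id="summary">Some text about the statue.</p>
<ul>
<li>First fact</li>
<li>Second fact</li>
</ul>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

# 1) Attributes of the (first) p tag, as a dictionary
p_attrs = soup.p.attrs
print(p_attrs)

# 2) Only the text of the ul tags; find_all() also covers pages with several lists
ul_texts = [ul.get_text() for ul in soup.find_all("ul")]
for text in ul_texts:
    print(text)
```

Using find_all("ul") instead of soup.ul matches the plural "ul tags" in the task; soup.ul alone would return only the first list.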