Learn Navigating HTML Document | Decoding HTML with Beautiful Soup
Web Scraping with Python
Navigating HTML Document

After reading the HTML document, you can navigate it in several ways. To go deeper into the tree, access a tag by its name as if it were an attribute of the parsed object. For example, let's take the <head> element and print it in an indented, structured form using the .prettify() method.

```python
# Importing libraries
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Reading web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html"
page = urlopen(url)
html = page.read().decode("utf-8")

# Reading HTML with BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
print(soup.head.prettify())
```
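Attribute-style access can also be chained to reach nested tags. The sketch below illustrates this on a small inline HTML string, which is a made-up stand-in for the course page so that it runs without network access. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical inline document (not the course page), used so the example runs offline
html = """<html>
<head><title>Sample page</title><meta charset="utf-8"></head>
<body><h1>Heading</h1><p>First paragraph.</p><p>Second paragraph.</p></body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# Chain tag names to move from <head> down to <title>
title_tag = soup.head.title
print(title_tag.name)     # name of the tag
print(title_tag.string)   # text inside <title>

# When several tags share a name, only the first one is returned
first_paragraph = soup.body.p
print(first_paragraph.get_text())

# A tag that is not present gives None instead of raising an error
missing = soup.body.table
print(missing)
```

Because a missing tag gives None, chaining further on it (for example soup.body.table.tr) would raise an AttributeError, so it is worth checking intermediate results when the page structure is uncertain.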

Feel free to experiment, for example by replacing the .head attribute with .body. As the output above shows, the <head> element contains several children. You can iterate over the direct children of an element with a for loop and the .children attribute.

```python
# Importing libraries
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Reading web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html"
page = urlopen(url)
html = page.read().decode("utf-8")

# Reading HTML with BeautifulSoup
soup = BeautifulSoup(html, "html.parser")

# Iterating over all element children
for child in soup.head.children:
    print(child)
```
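One detail worth knowing when iterating with .children: the iterator yields text nodes as well as tags, so the line breaks between tags in the source HTML show up as whitespace-only children. A minimal sketch of filtering them out, assuming a made-up inline document rather than the course page:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical inline document (not the course page), used so the example runs offline
html = """<html>
<head>
<title>Sample page</title>
<meta charset="utf-8">
</head>
<body></body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# .children yields every direct child, including whitespace-only text nodes
all_children = list(soup.head.children)

# Keep only the children that are actual tags
tag_children = [child for child in soup.head.children if isinstance(child, Tag)]
tag_names = [child.name for child in tag_children]
print(tag_names)
```

Filtering with isinstance is one option among several. Whether the whitespace nodes matter depends on what you do with the children afterwards.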


Section 2. Chapter 2