Applying String Methods | Getting Acquainted with HTML
Web Scraping with Python

Applying String Methods

What can you do with the page once it has been read? The decoded page is a string, so every string method is available. For instance, the .find() method returns the index of the first occurrence of a given substring, or -1 if the substring does not occur. You can use it to locate the page title by finding the indexes of the first opening and closing <title> tags. To include the closing tag itself in the result, you also need to account for its length.
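To see .find() in isolation, here is a minimal sketch that runs on a short, hard-coded HTML string instead of a downloaded page (the string is made up for illustration). It also shows the optional start argument and the -1 result for a substring that is not present.

```python
# A small, hard-coded HTML snippet standing in for a downloaded page (illustrative only)
html = "<html><head><title>My Page</title></head><body><p>Hi</p><p>Bye</p></body></html>"

# .find() returns the index of the first occurrence of the substring
first_p = html.find("<p>")
# The optional second argument starts the search at a later index
second_p = html.find("<p>", first_p + 1)
# If the substring does not occur at all, .find() returns -1
missing = html.find("<table>")

print(first_p, second_p, missing)  # → 47 56 -1
```

Note that -1 is a valid index in Python slicing, so a missing tag does not raise an error on its own; it is worth checking for it explicitly.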

```python
# Importing the module
from urllib.request import urlopen

# Opening web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/mother.html"
page = urlopen(url)

# Reading and decoding
web_page = page.read().decode("utf-8")

# Index where the opening <title> tag starts
start = web_page.find('<title')
# Index just past the closing </title> tag
finish = web_page.find('</title>') + len('</title>')

print(web_page[start:finish])
```

In the example above, two variables are created: start and finish. The start variable holds the index of the first character of the opening <title> tag, that is, its "<". Searching for '<title' rather than '<title>' also matches an opening tag that carries attributes, such as <title lang="en">. The finish variable holds the index of the character immediately following the closing </title> tag. The .find() method returns the index where '</title>' begins, so adding the length of the tag moves the index one position past its final character.
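To check this index arithmetic step by step, here is a small worked sketch on a hard-coded string (an assumption standing in for the downloaded page). The variable last is introduced only for illustration: it marks the final ">" of the closing tag, while finish points one position beyond it.

```python
html = "<title>My Page</title>"

close = html.find("</title>")       # index of '<' in '</title>'
last = close + len("</title>") - 1  # index of the final '>'
finish = close + len("</title>")    # one position past the final '>'

print(close, last, finish)  # → 14 21 22
print(html[0:last])         # → <title>My Page</title
print(html[0:finish])       # → <title>My Page</title>
```

Slicing up to last drops the closing ">", because the character at the stop index is not included; slicing up to finish keeps the whole tag.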

Note

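Slicing a string, like slicing a list, excludes the character at the stop index. That is why finish must point to the character after the closing tag; otherwise the final ">" of </title> would be cut off.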
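The code in this chapter assumes that both tags are present. If '</title>' were missing, .find() would return -1 and finish would silently become -1 + 8 = 7, producing a wrong slice rather than an error. A minimal sketch of a guarded version follows, assuming you prefer to get None in that case; extract_title is a hypothetical helper name, not part of any library.

```python
from typing import Optional


def extract_title(html: str) -> Optional[str]:
    """Hypothetical helper: return the full <title>...</title> element, or None."""
    start = html.find("<title")
    if start == -1:
        return None  # no opening <title tag in the page
    # Look for the closing tag only after the opening tag
    close = html.find("</title>", start)
    if close == -1:
        return None  # opening tag present but never closed
    # Move one position past the final '>' of '</title>'
    finish = close + len("</title>")
    return html[start:finish]


print(extract_title("<head><title>My Page</title></head>"))  # → <title>My Page</title>
print(extract_title("<p>No title here</p>"))  # → None
```

Passing start as the second argument to .find() also ensures that the closing tag is searched for only after the opening one.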


Section 1. Chapter 10