Applying String Methods

What can you do with the page once you have read it? It's a string, so you can use any string method on it. For instance, the .find() method returns the index of the first occurrence of a substring, or -1 if the substring is not found. You can use it to locate the page title by finding the indexes of the opening and closing title tags. Because .find() returns the index where the closing tag begins, we also add the length of the closing tag.


# Importing the module
from urllib.request import urlopen

# Opening web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/mother.html"
page = urlopen(url)

# Reading and decoding
web_page = page.read().decode("utf-8")

# Indexes of opening and closing title tags
start = web_page.find("<title")
finish = web_page.find("</title>") + len("</title>")
print(web_page[start:finish])

As the example above shows, two variables are created: start and finish. The start variable holds the index of the first character of the first opening <title tag. The finish variable holds the index of the character immediately after the closing </title> tag. The .find() method returns the index where the closing tag begins, so we add the length of the tag to move past its end. Since a slice excludes its end index, web_page[start:finish] returns the whole title element, tags included.
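One caveat with this approach: .find() returns -1 when the substring is missing, so on a page with no </title> tag, finish would silently become 7 and the slice would return the wrong text. The sketch below is one possible way to guard against that, and it also returns only the text between the tags. The function name extract_title and the sample string are made up for illustration; they are not part of the course code.

```python
def extract_title(html):
    """Return the text between <title> and </title>, or None if a tag is missing."""
    # Index of the first character of the opening tag (-1 if absent)
    open_start = html.find("<title")
    if open_start == -1:
        return None

    # The opening tag may carry attributes, so search for its closing ">"
    open_end = html.find(">", open_start)
    if open_end == -1:
        return None

    # Index where the closing tag begins, searching only after the opening tag
    close_start = html.find("</title>", open_end)
    if close_start == -1:
        return None

    # Slice from just after ">" up to (not including) the closing tag
    return html[open_end + 1:close_start].strip()


# Hypothetical sample page, used here instead of a network request
sample = "<html><head><title>Sample Page</title></head><body></body></html>"
print(extract_title(sample))             # Sample Page
print(extract_title("<p>no title</p>"))  # None
```

The same function can be called on the web_page string from the example above in place of the manual slice.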


Section 1. Chapter 10

