Learn Applying String Methods | Getting Acquainted with HTML
Web Scraping with Python

Applying String Methods

What can you do with the page once you have read it? It's a string, so you can use any string method on it. For instance, the .find() method returns the index of the first occurrence of a given substring, or -1 if the substring is not found. You can use it to locate the page title by finding the indexes of the opening and closing title tags. Because .find() returns the index where the closing tag begins, we also need to take the length of the closing tag into account.

```python
# Importing the module
from urllib.request import urlopen

# Opening web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/mother.html"
page = urlopen(url)

# Reading and decoding
web_page = page.read().decode("utf-8")

# Indexes of opening and closing title tags
start = web_page.find("<title")
finish = web_page.find("</title>") + len("</title>")
print(web_page[start:finish])
```

The example above creates two variables, start and finish. The start variable holds the index of the first character of the first opening <title> tag. The finish variable holds the index of the character immediately following the closing </title> tag. The .find() method only returns the index where the closing tag begins, so we add the length of the tag to move past it. Since a slice excludes its end index, web_page[start:finish] then covers the whole element, closing tag included.
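The same idea can be tried offline, without opening a URL. The snippet below is a minimal sketch rather than part of the lesson's code: the extract_element helper and the sample_page string are hypothetical names introduced here for illustration. It also checks for the -1 that .find() returns when a substring is missing, which the example above does not handle.

```python
def extract_element(html, tag):
    """Return the first <tag ...>...</tag> substring of html, or None if absent.

    Hypothetical helper for illustration; it relies on plain substring search,
    so searching for "p" would also match a "<pre" tag.
    """
    opening = "<" + tag
    closing = "</" + tag + ">"

    start = html.find(opening)
    if start == -1:  # opening tag not found
        return None

    # Look for the closing tag only after the opening tag.
    end = html.find(closing, start)
    if end == -1:  # closing tag not found
        return None

    # Add the closing tag's length so the slice includes the whole tag.
    return html[start:end + len(closing)]


# Assumed sample markup, not the content of the page used in the lesson.
sample_page = "<html><head><title>Sample page</title></head><body><p>Hello</p></body></html>"

print(sample_page.find("<title"))             # 12
print(sample_page.find("<h1"))                # -1, the substring is absent
print(extract_element(sample_page, "title"))  # <title>Sample page</title>
print(extract_element(sample_page, "h1"))     # None
```

Returning None for a missing tag avoids silently slicing from index -1, which would otherwise produce a misleading result instead of an error.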

Section 1. Chapter 10
