Course Content

Web Scraping with Python

Opening HTML File

Now you are familiar with the main aspects of HTML. Let's learn the first way to work with it in Python.

One of the modules you can use to handle HTML files in Python is urllib.request. Import its urlopen function, which opens web pages, and pass the URL of the page you want to open as the function's argument.
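A minimal sketch of the call described above. The page content and the data: URL are assumptions made for illustration so that the example runs without a network connection; with a real http:// or https:// address, urlopen returns an http.client.HTTPResponse object, while the data: URL used here yields a similar file-like response object.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical stand-in page (not from the lesson), packed into a data: URL
# so the sketch works offline. In practice, pass a real web address instead.
html_source = "<html><body><h1>Hello, scraper!</h1></body></html>"
url = "data:text/html;charset=utf-8," + quote(html_source)

# urlopen takes the URL as its argument and returns a response object.
with urlopen(url) as response:
    response_type = type(response)
    is_text = isinstance(response, str)

print(response_type)  # a response class, not str
print(is_text)        # False: the HTML text still has to be read out of it
```

Whatever the URL scheme, the returned object is a wrapper around the page's data rather than the HTML text itself.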

The call returns an http.client.HTTPResponse object, which is not yet the HTML text we wanted. To get the HTML structure, apply the .read() and .decode("utf-8") methods to that object.

After applying the .read() and .decode() methods, you get a string as the result. This string stores the HTML structure in a readable format, and you can apply string methods to it.
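The following sketch shows the two methods chained together, followed by ordinary string methods applied to the result. The sample page and the data: URL are hypothetical stand-ins chosen so the code runs offline; a real page address would be read and decoded in the same way.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical sample page (an assumption for this sketch).
html_source = (
    "<html>\n"
    "<head><title>Demo page</title></head>\n"
    "<body><h1>Hello</h1></body>\n"
    "</html>"
)
url = "data:text/html;charset=utf-8," + quote(html_source)

with urlopen(url) as response:
    # .read() returns bytes; .decode("utf-8") turns them into a str.
    html = response.read().decode("utf-8")

print(type(html))  # <class 'str'>
print(html)

# Because html is a plain string, the usual string methods work on it.
start = html.find("<title>") + len("<title>")
end = html.find("</title>")
page_title = html[start:end]
print(page_title)  # Demo page
```

Slicing with find is only a quick illustration of string methods; it is not a robust way to parse HTML.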

If the .decode() method were not applied, you would receive a bytes object instead. When printed, it shows the whole HTML page on a single line, with escape sequences such as \n in place of line breaks. Feel free to try!
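To make the difference concrete, the sketch below prints the result before and after decoding. The sample page, including the non-ASCII character in it, is an assumption added for illustration, and the data: URL again stands in for a real web address.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical sample page with line breaks and one non-ASCII character.
html_source = "<html>\n<body>\n<p>Café</p>\n</body>\n</html>"
url = "data:text/html;charset=utf-8," + quote(html_source)

with urlopen(url) as response:
    raw = response.read()  # no .decode() yet

print(type(raw))  # <class 'bytes'>
print(raw)        # one line; line breaks and "é" appear as escape sequences

text = raw.decode("utf-8")
print(type(text))  # <class 'str'>
print(text)        # line breaks and "é" are displayed normally
```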

Section 1.

Chapter 8