Web Scraping with Python
Now that you are familiar with the main aspects of HTML, let's learn the first way to work with it in Python.
One of the modules that you can use to handle HTML pages in Python is urllib.request. Import its urlopen function to open web pages, and pass the URL of the page you want to open as the function's argument.
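Here is a minimal sketch of that call. To keep it runnable without an internet connection, it builds a small stand-in page and wraps it in a data: URL; the page content and the variable names are assumptions made for illustration. With a real site you would pass its address instead, and for an http or https address the returned object is an http.client.HTTPResponse.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Stand-in page (an assumption for this sketch) so the example runs offline.
# With a real site you would write something like urlopen("https://example.com/"),
# where the address is only a placeholder.
html = "<html><body><h1>Hello</h1></body></html>"
url = "data:text/html;charset=utf-8," + quote(html)

# Open the page. The result is a response object, not the page text itself.
response = urlopen(url)

print(type(response))
print(isinstance(response, str))  # False

response.close()
```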
The urlopen call returns an http.client.HTTPResponse object, which is not the HTML we wanted. To get the HTML structure, apply the .read() and .decode("utf-8") methods to the object you received.
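The two calls can be chained, as in the sketch below. It again uses an offline stand-in page wrapped in a data: URL, which is an assumption made so the example runs anywhere; a real page address would be read and decoded the same way.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Stand-in page (an assumption for this sketch) so the example runs offline.
html = "<html>\n<head><title>Demo Page</title></head>\n<body>\n<p>Hello</p>\n</body>\n</html>"
url = "data:text/html;charset=utf-8," + quote(html)

# .read() returns the raw bytes of the page,
# .decode("utf-8") turns those bytes into a regular string.
page = urlopen(url).read().decode("utf-8")

print(type(page))  # <class 'str'>
print(page)
```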
After applying the .read() and .decode() methods, you get a string as the result. This string stores the HTML structure in a readable format, with line breaks preserved, and you can apply string methods to it.
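As a small illustration of applying string methods to the decoded page, the sketch below pulls out the page title with .find() and slicing and counts paragraphs with .count(). The stand-in page and its contents are assumptions made for this example.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Stand-in page (an assumption for this sketch) so the example runs offline.
html = "<html>\n<head><title>Demo Page</title></head>\n<body>\n<p>First</p>\n<p>Second</p>\n</body>\n</html>"
url = "data:text/html;charset=utf-8," + quote(html)

page = urlopen(url).read().decode("utf-8")

# Locate the text between <title> and </title> using ordinary string methods.
start = page.find("<title>") + len("<title>")
end = page.find("</title>")
title = page[start:end]
print(title)  # Demo Page

# Count how many paragraphs open on the page.
paragraphs = page.count("<p>")
print(paragraphs)  # 2
```

Searching with string methods works for simple pages like this one, but it depends on the exact spelling of each tag.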
If the .decode() method had not been applied, you would receive a bytes object instead: the whole HTML page shown on a single line, with line breaks and non-ASCII characters written as escape sequences such as \n. Feel free to try!
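If you want to try it, the sketch below prints both versions side by side. The stand-in page, including the accented word used to show a non-ASCII character, is an assumption made so the example runs offline.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Stand-in page (an assumption for this sketch) so the example runs offline.
html = "<html>\n<body>\n<p>café</p>\n</body>\n</html>"
url = "data:text/html;charset=utf-8," + quote(html)

# Without .decode(): a bytes object.
raw = urlopen(url).read()
print(type(raw))  # <class 'bytes'>
print(raw)        # one line; line breaks appear as \n and é as \xc3\xa9

# With .decode("utf-8"): a readable string with real line breaks.
text = raw.decode("utf-8")
print(type(text))  # <class 'str'>
print(text)
```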