The pandas library provides a quick and convenient way to convert HTML tables into `DataFrame` objects. The `read_html()` function is useful for scraping tables from websites because it accepts a URL directly, so you don't have to work out how to retrieve the page's HTML yourself. It works best on tables with a simple structure, such as the tables on Wikipedia pages.

import pandas as pd
tables = pd.read_html('https://en.wikipedia.org/wiki/Florida')

In the code above, `read_html()` retrieves every table from the Wikipedia article on [Florida](https://en.wikipedia.org/wiki/Florida). `tables` is a list of all the tables on the page, each already converted to a `DataFrame`.
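Because the result is a plain Python list, you usually need to look through it before picking a table. Here is a minimal sketch of that inspection step. It parses a small inline HTML string instead of the live Wikipedia page so that it runs offline; the HTML and the values in it are made up for illustration only.

```python
from io import StringIO

import pandas as pd

# Hypothetical page content with two tables (illustrative values, not real data)
html = """
<table>
  <thead><tr><th>City</th><th>Population</th></tr></thead>
  <tbody>
    <tr><td>City A</td><td>1000</td></tr>
    <tr><td>City B</td><td>2000</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>University</th><th>Location</th></tr></thead>
  <tbody>
    <tr><td>University X</td><td>City A</td></tr>
  </tbody>
</table>
"""

# read_html() returns a list with one DataFrame per <table> element
tables = pd.read_html(StringIO(html))

# Print the position, shape, and column names of each table
for index, df in enumerate(tables):
    print(index, df.shape, list(df.columns))

# Pick a table by its position in the list
cities = tables[0]
```

Printing the shape and column names of each entry is usually enough to tell which index holds the table you are after.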

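When a page contains many tables, it can be hard to find the one you need. To narrow the selection, use the `match` parameter: `read_html()` then returns only the tables that contain text matching the given string or regular expression. The result is still a list, even when only one table matches. For example: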

import pandas as pd
tables = pd.read_html('https://en.wikipedia.org/wiki/Florida', match='State University System of Florida')
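The behavior of `match` can be sketched on a small scale as follows. This example again uses a made-up inline HTML string rather than the live page (all names and values are hypothetical), and it also shows that `read_html()` raises a `ValueError` when no table matches.

```python
from io import StringIO

import pandas as pd

# Hypothetical page content with two tables (illustrative values, not real data)
html = """
<table>
  <thead><tr><th>City</th><th>Population</th></tr></thead>
  <tbody>
    <tr><td>City A</td><td>1000</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>Institution</th><th>Location</th></tr></thead>
  <tbody>
    <tr><td>Example State University</td><td>City A</td></tr>
  </tbody>
</table>
"""

# Keep only the tables that contain text matching the given string
matched = pd.read_html(StringIO(html), match="State University")

# The result is still a list, so take the first element to get the DataFrame
universities = matched[0]
print(universities)

# When nothing matches, read_html() raises ValueError instead of returning []
try:
    pd.read_html(StringIO(html), match="No such text")
    found_any = True
except ValueError:
    found_any = False
```

Wrapping the call in `try`/`except ValueError` is one way to handle pages whose layout may change; whether to do so depends on how the scraper is used.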

