Automating Data Collection from Web Sources

Store Scraped Data Into a Pandas DataFrame

Storing scraped data in a pandas DataFrame is a convenient way to organize and manipulate it. pandas is a Python library that provides easy-to-use data structures and data analysis tools.

A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it as a spreadsheet, SQL table, or a dictionary of Series objects. It is generally the most commonly used pandas object.
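
To make the "dictionary of Series" view concrete, here is a minimal sketch that builds a small DataFrame from three Series. The column names and sample values are chosen for illustration only and are not part of the lesson's dataset.

```python
import pandas as pd

# Illustrative sample values (not taken from the scraped page)
country = pd.Series(["Andorra", "Austria"])
capital = pd.Series(["Andorra la Vella", "Vienna"])

# A numeric column derived from the text column, to show mixed column types
name_length = country.str.len()

# Each dictionary key becomes a column label; each Series becomes a column
df = pd.DataFrame({
    "Country": country,
    "Capital City": capital,
    "Name Length": name_length,
})

print(df)
print(df.dtypes)
```

Each column keeps its own type, which is what distinguishes a DataFrame from a plain 2-dimensional array.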

Task

  1. Import pandas and initialize an empty DataFrame;
  2. Scrape the country name (find all instances on the web page);
  3. Scrape the capital city (find all instances on the web page);
  4. Append the scraped values (country_name, capital) to the DataFrame.

Solution

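import pandas as pd

# `soup` is assumed to be a BeautifulSoup object built from the page beforehand
col_names = ["Country", "Capital City"]
countries = pd.DataFrame(columns=col_names)

# Each country sits in its own <div class="col-md-4 country"> block
for item in soup.find_all("div", {"class": "col-md-4 country"}):
    country_name = item.find_all("h3", {"class": "country-name"})[0].text.strip()
    capital = item.find_all("span", {"class": "country-capital"})[0].text
    # Add the pair as a new row at the next integer index
    countries.loc[len(countries)] = country_name, capital

countries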
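
The solution relies on a `soup` object created elsewhere. For experimenting offline, the sketch below is a self-contained variant. The inline HTML is a hypothetical stand-in that only mimics the class names used in the solution; it is not the real page. As a design alternative, it also collects the rows in a list and builds the DataFrame once at the end, which avoids growing the DataFrame row by row. It assumes the `beautifulsoup4` package is installed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure the solution expects
html = """
<div class="col-md-4 country">
  <h3 class="country-name"> Andorra </h3>
  <span class="country-capital">Andorra la Vella</span>
</div>
<div class="col-md-4 country">
  <h3 class="country-name"> Austria </h3>
  <span class="country-capital">Vienna</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Collect one dictionary per country block
rows = []
for item in soup.find_all("div", {"class": "col-md-4 country"}):
    name = item.find("h3", {"class": "country-name"}).text.strip()
    capital = item.find("span", {"class": "country-capital"}).text.strip()
    rows.append({"Country": name, "Capital City": capital})

# Build the DataFrame in a single step from the collected rows
countries = pd.DataFrame(rows, columns=["Country", "Capital City"])
print(countries)
```

Passing `columns=` explicitly keeps the column order fixed even if no rows are found, in which case the result is an empty DataFrame with the same two columns.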

Section 1. Chapter 5