Learn Scraping Search Engine Results Pages (SERPs) | Automating SEO Tasks with Python
Python for SEO Specialists

Scraping Search Engine Results Pages (SERPs)

Understanding how to extract data from search engine results pages (SERPs) is a powerful skill for any SEO specialist. SERPs are the pages displayed by search engines in response to a user's query, and they contain a wealth of information: page titles, URLs, snippets, and more. Scraping SERPs allows you to gather this data in bulk, enabling you to analyze competitors, track keyword rankings, and uncover new optimization opportunities. By automating this process with Python, you can save time and gain deeper insights into the search landscape.

```python
from bs4 import BeautifulSoup

# Hardcoded HTML snippet representing a SERP
serp_html = """
<html>
  <body>
    <div class="result">
      <h3><a href="https://example.com/page1">First Result Title</a></h3>
      <span class="snippet">This is a summary of the first result.</span>
    </div>
    <div class="result">
      <h3><a href="https://example.com/page2">Second Result Title</a></h3>
      <span class="snippet">This is a summary of the second result.</span>
    </div>
  </body>
</html>
"""

# Parse the HTML
soup = BeautifulSoup(serp_html, "html.parser")

# Extract titles and URLs from each result block
for result in soup.find_all("div", class_="result"):
    link = result.find("a")
    title = link.get_text()
    url = link["href"]
    print(f"Title: {title}")
    print(f"URL: {url}")
```

The scraping process involves several clear steps. First, you obtain the HTML content of the page—this can come from a file, a web request, or a hardcoded string. Next, you parse the HTML using a library that understands its structure. Once parsed, you can search for specific elements, such as the div tags with a class of "result" that contain each search result. Within each result, you look for the a tag to find the title and URL. By extracting the text from the a tag, you get the result's title, and by accessing its href attribute, you obtain the link. This method allows you to systematically collect structured data from the unstructured HTML of a SERP.
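In practice, the first step usually means fetching the HTML with a live HTTP request rather than using a hardcoded string. Here is a minimal sketch of that step, assuming the target site permits automated access (real search engines often block or rate-limit scrapers, so check the terms of service); the URL in the usage comment and the `User-Agent` string are placeholders for illustration:

```python
import requests

def fetch_serp_html(url, timeout=10):
    """Download a results page and return its HTML as a string."""
    # Many sites reject requests that carry no User-Agent header.
    headers = {"User-Agent": "Mozilla/5.0 (compatible; seo-script/0.1)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # fail fast on 4xx/5xx status codes
    return response.text

# Usage (not run here, to avoid hitting a live site):
# html = fetch_serp_html("https://example.com/search?q=python+seo")
```

The returned string can then be passed straight to BeautifulSoup, exactly as the hardcoded `serp_html` is above.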

```python
from bs4 import BeautifulSoup

def extract_titles_and_urls(html):
    """Return a list of (title, url) tuples from SERP-style HTML."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for result in soup.find_all("div", class_="result"):
        link = result.find("a")
        title = link.get_text()
        url = link["href"]
        results.append((title, url))
    return results

# Example usage:
serp_html = """
<html>
  <body>
    <div class="result">
      <h3><a href="https://example.com/page1">First Result Title</a></h3>
      <span class="snippet">This is a summary of the first result.</span>
    </div>
    <div class="result">
      <h3><a href="https://example.com/page2">Second Result Title</a></h3>
      <span class="snippet">This is a summary of the second result.</span>
    </div>
  </body>
</html>
"""
print(extract_titles_and_urls(serp_html))
```
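The same pattern extends to the snippet text: each result's summary sits in a `span` with class `snippet`, so one extra `find` call per result captures it. A sketch under that assumption about the markup (the empty-string fallback covers result blocks that happen to lack a snippet):

```python
from bs4 import BeautifulSoup

def extract_results(html):
    """Return a list of (title, url, snippet) tuples from SERP-style HTML."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for result in soup.find_all("div", class_="result"):
        link = result.find("a")
        snippet = result.find("span", class_="snippet")
        results.append((
            link.get_text(),
            link["href"],
            # Not every result block is guaranteed to contain a snippet.
            snippet.get_text() if snippet else "",
        ))
    return results

serp_html = """
<div class="result">
  <h3><a href="https://example.com/page1">First Result Title</a></h3>
  <span class="snippet">This is a summary of the first result.</span>
</div>
"""
print(extract_results(serp_html))
# [('First Result Title', 'https://example.com/page1', 'This is a summary of the first result.')]
```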

1. What Python library is commonly used for parsing HTML?

2. What information can you extract from a SERP for SEO analysis?

3. Fill in the blank: To find all 'a' tags in BeautifulSoup, use ____.



Section 1. Chapter 2

