How to retrieve links from a web page using Python and BeautifulSoup?

Sometimes, we want to retrieve the links from a web page using Python and BeautifulSoup.

In this article, we’ll look at how to retrieve links from a web page using Python and BeautifulSoup.

How to retrieve links from a web page using Python and BeautifulSoup?

To retrieve links from a web page using Python and BeautifulSoup, we can use the SoupStrainer class, which tells BeautifulSoup to parse only the elements we care about.

For instance, we write

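import httplib2
from bs4 import BeautifulSoup, SoupStrainer

# request() returns a (response headers, body) tuple
http = httplib2.Http()
status, response = http.request('http://www.example.com')

# Name the parser explicitly and keep only the <a> tags
soup = BeautifulSoup(response, 'html.parser', parse_only=SoupStrainer('a'))

# find_all('a') yields only tags, so non-tag nodes such as a doctype are skipped
for link in soup.find_all('a'):
    if link.has_attr('href'):
        print(link['href'])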

to make a GET request to example.com with

http = httplib2.Http()
status, response = http.request('http://www.example.com')

Then we parse the response body, which httplib2 returns as the second item of the tuple, by passing it into BeautifulSoup.

And we get only the anchor elements by setting the parse_only argument to SoupStrainer('a'), so the rest of the document is skipped while parsing.
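To see what parse_only does, we can compare a full parse with a strained parse of the same markup. This is only a minimal sketch: SAMPLE_HTML is made-up markup that stands in for a real response, and the variable names are our own.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical sample markup, used instead of a live HTTP response.
SAMPLE_HTML = """
<html><body>
  <h1>Title</h1>
  <p>Intro with a <a href="/docs">docs link</a>.</p>
  <a href="https://www.example.com/about">About</a>
</body></html>
"""

# Full parse: every element ends up in the tree.
full_soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Strained parse: only <a> elements (and their contents) are kept.
links_only = BeautifulSoup(SAMPLE_HTML, 'html.parser',
                           parse_only=SoupStrainer('a'))

# Collect the distinct tag names found in each tree.
full_names = sorted({tag.name for tag in full_soup.find_all(True)})
link_names = sorted({tag.name for tag in links_only.find_all(True)})

print(full_names)
print(link_names)
```

The strained tree should contain nothing but the anchor elements, which is why it is a good fit when we only need the links.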

In the loop, we go through all the links, check whether each one has an href attribute with has_attr, and print its value with link['href'].
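The has_attr check matters because not every anchor element has an href. Here is a small sketch of the difference, assuming some made-up markup with one anchor that has an href and one that does not. It also shows the get method, which returns None instead of raising an error when the attribute is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one anchor with an href, one without.
html = '<a href="/home">Home</a><a id="top">Top</a>'
soup = BeautifulSoup(html, 'html.parser')

# Keep only the anchors that actually carry an href attribute.
hrefs = []
for link in soup.find_all('a'):
    if link.has_attr('href'):
        hrefs.append(link['href'])

# get() is an alternative lookup that yields None for a missing attribute.
missing = soup.find('a', id='top').get('href')

print(hrefs)
print(missing)
```

Without the check, link['href'] would raise a KeyError on the second anchor.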
