
How to parse HTML using Python?

Sometimes, we want to parse HTML using Python.

In this article, we’ll look at how to parse HTML using Python.


To parse HTML using Python, we can use the Beautiful Soup library. It is installed with pip install beautifulsoup4 and imported from the bs4 package.

For instance, we write

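# Beautiful Soup 4; the old BeautifulSoup 3 package only runs on Python 2
from bs4 import BeautifulSoup

# ...
# html holds the HTML document to parse, as a string

# 'html.parser' is Python's built-in parser; naming a parser explicitly
# avoids the warning bs4 emits when none is specified
parsed_html = BeautifulSoup(html, 'html.parser')

# Find the first div with the "container" class inside <body> and print its text
print(parsed_html.body.find('div', attrs={'class': 'container'}).text)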

to parse the HTML string html into a BeautifulSoup object that we can search.

Then we call parsed_html.body.find with 'div' and the attrs dict to find the first div with the container class inside the body element.

And we print its text content, which we get from the text property.
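Since find only gives back the first match, it can help to compare it with find_all. The sketch below assumes a small made-up HTML string (not from the original example). It shows that find_all returns every matching div, that matching on class also hits elements with several classes such as "container main", and that find returns None when nothing matches, so it is worth checking before reading .text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document: two divs with the "container" class, one without
html = """
<html>
  <body>
    <div class="container">First</div>
    <div class="sidebar">Ignored</div>
    <div class="container main">  Second  </div>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# find returns only the first matching tag, or None when nothing matches
first = soup.body.find("div", attrs={"class": "container"})
print(first.text if first is not None else "no match")

# find_all returns a list of every matching tag; "class" is multi-valued,
# so a div with class="container main" matches "container" as well.
# get_text(strip=True) trims the surrounding whitespace.
texts = [div.get_text(strip=True)
         for div in soup.find_all("div", attrs={"class": "container"})]
print(texts)

# A selector that matches nothing gives None, so guard before using .text
missing = soup.find("div", attrs={"class": "footer"})
print(missing is None)
```

Guarding against None matters because calling .text on the result of a find that matched nothing raises an AttributeError.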

Conclusion

To parse HTML using Python, we can use Beautiful Soup.
