Sometimes, we want to parse HTML using Python.
In this article, we’ll look at how to parse HTML using Python.
How to parse HTML using Python?
To parse HTML using Python, we can use Beautiful Soup.
For instance, we write
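try:
    # Legacy Beautiful Soup 3 package (Python 2 only)
    from BeautifulSoup import BeautifulSoup
except ImportError:
    # Beautiful Soup 4, installed with: pip install beautifulsoup4
    from bs4 import BeautifulSoup

# ...
parsed_html = BeautifulSoup(html)
print(parsed_html.body.find('div', attrs={'class': 'container'}).text)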
to create a BeautifulSoup object from the HTML string html, which parses the markup into a tree we can search.
Then we call parsed_html.body.find with 'div' and the attrs dict to find the first div with the container class.
And we get its text content with the text property and print it.
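For a version that runs on its own, here is a minimal sketch assuming Beautiful Soup 4 is installed. The sample markup and the container_text helper are hypothetical names added for illustration, not part of the original snippet. It passes "html.parser" explicitly so bs4 does not have to guess a parser, searches from the root rather than from body (body is None for fragments when using html.parser), and checks for None because find returns None when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the snippet above leaves `html` undefined.
html = """
<html>
  <body>
    <div class="container">Hello, world</div>
    <div class="footer">Bye</div>
  </body>
</html>
"""


def container_text(markup):
    """Return the text of the first div with class "container", or None."""
    # Naming the parser explicitly avoids bs4's "no parser specified" warning.
    soup = BeautifulSoup(markup, "html.parser")
    div = soup.find("div", attrs={"class": "container"})
    # find() returns None when nothing matches, so guard before reading .text
    return div.text if div is not None else None


print(container_text(html))                        # Hello, world
print(container_text("<p>no container here</p>"))  # None
```

Without the None check, a page that lacks the container div would raise an AttributeError on the .text access.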
Conclusion
To parse HTML using Python, we can create a BeautifulSoup object from the HTML string and call find to locate the elements we want.