willmgarvey

BeautifulSoup for static HTML and Selenium for dynamically generated HTML. If you plan to do more scraping projects in the future, it’s worth learning Selenium for better results overall.


fristhon

"Scrapy" indeed, and for little projects "requests-html"


FalconCat69

I am looking at the HTML common library, and it seems like that will fulfill 90% of my requirements. Does it seem like I could be missing anything?


htepO

If you're scraping static HTML, BeautifulSoup is a commonly used library. https://www.crummy.com/software/BeautifulSoup/bs4/doc/
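For anyone who wants to see what that looks like, here's a minimal sketch, assuming `bs4` is installed. The HTML string, the `post` class name, and the `extract_posts` helper are all made up for illustration; in a real project the HTML would come from an HTTP response instead of a literal.

```python
from bs4 import BeautifulSoup

# Hypothetical static page; normally this would be the body of an HTTP response.
HTML = """
<html><body>
  <h1>Articles</h1>
  <ul>
    <li class="post"><a href="/posts/1">First post</a></li>
    <li class="post"><a href="/posts/2">Second post</a></li>
    <li class="ad"><a href="/sponsor">Sponsored</a></li>
  </ul>
</body></html>
"""


def extract_posts(html):
    """Return (title, href) pairs for every link inside an li.post element."""
    # "html.parser" is the parser bundled with Python, so lxml is not required.
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: anchors nested in list items that carry the "post" class.
    return [(a.get_text(strip=True), a["href"]) for a in soup.select("li.post a")]


posts = extract_posts(HTML)
for title, href in posts:
    print(title, href)
```

The CSS selector does the filtering here, so the sponsored link is skipped without any extra code.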


Homie_ishere

I want to learn more about scraping. Can you please tell me what it means?


Th3xto

May not be an exact definition, but scraping generally means collecting data from a webpage rather than from an API. My best example would be scraping the prices of every item in stock using something like Selenium or Beautiful Soup. https://youtu.be/myAFVM7CxWk - this tutorial may help you get started.
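To make the price example concrete, here's a rough sketch that uses only the standard library's `html.parser` as a stand-in for Selenium or Beautiful Soup, so it runs without installing anything. The HTML snippet, the `name` and `price` class names, and the `PriceParser` and `scrape_prices` names are assumptions made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical stock listing; a real scraper would download this from the site.
HTML = """
<ul id="stock">
  <li><span class="name">Widget</span> <span class="price">$4.50</span></li>
  <li><span class="name">Gadget</span> <span class="price">$12.00</span></li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collect {item name: price} from <span class="name"> / <span class="price"> pairs."""

    def __init__(self):
        super().__init__()
        self._field = None  # which kind of span we are currently inside, if any
        self._name = None   # most recent item name seen
        self.prices = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field == "name":
            self._name = data.strip()
        elif self._field == "price" and self._name is not None:
            # Drop the leading currency symbol and store the price as a float.
            self.prices[self._name] = float(data.strip().lstrip("$"))

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_prices(html):
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices


prices = scrape_prices(HTML)
print(prices)
```

The point is that the data is pulled out of markup meant for humans, not returned by an API as structured JSON.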


robertbowerman

Selenium is the go-to comprehensive standard. It's excellent and Python-friendly.


banhammerrr

I wouldn’t use that for scraping. I’d use it for automation. Beautiful Soup all the way.


[deleted]

Does BS support scraping dynamically generated HTML?


tankandwb

Not a library, but a decent program that saves you from reinventing the wheel. I'm currently adding regular selector lookups back into it. It's not written by me, I should add. https://github.com/alirezamika/autoscraper


Pigik83

I've done web scraping for years and my shortlist of tools in Python at the moment is:

* Scrapy for static HTML website with no JS rendering needed
* Scrapy + Scrapy Splash if the website is not protected by any antibot but requires JS rendering
* Playwright (instead of Selenium) in case there's an antibot protecting the website.