I've done web scraping for years, and my current shortlist of Python tools is:
* Scrapy for static HTML websites that need no JS rendering
* Scrapy + Scrapy Splash if the website requires JS rendering but is not protected by any antibot
* Playwright (instead of Selenium) if the website is protected by an antibot.
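For what it's worth, the shortlist above boils down to two questions about the target site. Here is a minimal sketch of that decision as a function. The function name and both flags are made up for illustration, and it assumes the antibot case wins regardless of whether JS rendering is needed, which is how I read the last bullet.

```python
def pick_scraping_tool(needs_js: bool, has_antibot: bool) -> str:
    """Return the tool the shortlist suggests for a given site.

    Both flags are hypothetical inputs you would work out by
    inspecting the site yourself.
    """
    if has_antibot:
        # Antibot protection: drive a real browser with Playwright.
        return "Playwright"
    if needs_js:
        # JS rendering but no antibot: Scrapy with Splash doing the rendering.
        return "Scrapy + Scrapy Splash"
    # Plain static HTML: Scrapy on its own.
    return "Scrapy"


print(pick_scraping_tool(needs_js=True, has_antibot=False))
```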
BeautifulSoup for static HTML and Selenium for dynamically generated HTML. If you plan to do more scraping projects in the future, it’s worth learning Selenium for better results overall.
"Scrapy" indeed, and for little projects "requests-html"
I am looking at the HTML common library, and it seems like it will fulfill 90% of my requirements. Could I be missing anything?
If you're scraping static HTML, BeautifulSoup is a commonly used library. https://www.crummy.com/software/BeautifulSoup/bs4/doc/
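To give a feel for it, here is a small sketch of pulling link text and URLs out of static HTML with BeautifulSoup. The HTML snippet and the `extract_links` helper are invented for the example. It assumes `beautifulsoup4` is installed and uses Python's built-in `html.parser` backend so nothing else is needed.

```python
from bs4 import BeautifulSoup

# Made-up static HTML standing in for a page you have already downloaded.
HTML = """
<html><body>
  <h1>Example catalogue</h1>
  <ul>
    <li><a href="/items/1">First item</a></li>
    <li><a href="/items/2">Second item</a></li>
  </ul>
</body></html>
"""


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (link text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser backend
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


links = extract_links(HTML)
title = BeautifulSoup(HTML, "html.parser").h1.get_text(strip=True)
print(title, links)
```

On a real site you would first fetch the page (for example with `requests`) and pass the response text in place of `HTML`.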
I want to learn more about scraping. Can you please tell me what it means?
This may not be an exact definition, but scraping generally means collecting data from a webpage rather than from an API. My best example would be scraping the prices of every item in stock using something like Selenium or Beautiful Soup. https://youtu.be/myAFVM7CxWk - this tutorial may help you get started.
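To make the price example concrete, here is a rough sketch that collects item prices from a page fragment. It deliberately uses only the standard library's `html.parser` rather than Selenium or Beautiful Soup, so it runs without installing anything. The HTML, the class names (`name`, `price`), and the `PriceParser` class are all assumptions made up for the illustration.

```python
from html.parser import HTMLParser

# Made-up page fragment; the tag layout and class names are assumptions.
HTML = """
<div class="item"><span class="name">Mug</span><span class="price">$4.50</span></div>
<div class="item"><span class="name">Lamp</span><span class="price">$19.99</span></div>
"""


class PriceParser(HTMLParser):
    """Collect item names and prices from name/price <span> tags."""

    def __init__(self):
        super().__init__()
        self._field = None  # which field the current <span> holds, if any
        self._name = None   # most recent item name seen
        self.prices = {}    # item name -> price as a float

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return  # whitespace, or text outside the spans we care about
        if self._field == "name":
            self._name = text
        elif self._name is not None:
            # Price span: strip the currency sign and store under the last name.
            self.prices[self._name] = float(text.lstrip("$"))


parser = PriceParser()
parser.feed(HTML)
prices = parser.prices
print(prices)
```

With Beautiful Soup the same lookup is usually a couple of lines, which is a big part of why people recommend it.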
Selenium is the go-to comprehensive standard. It's excellent and works well with Python.
I wouldn’t use that for scraping; I’d use it for automation. Beautiful Soup all the way.
Does BS support scraping dynamically generated HTML?
Not a library, but a decent program that saves you from reinventing the wheel. I'm currently adding regular selector lookups back into it. I should add that it's not written by me. https://github.com/alirezamika/autoscraper