From the course: Web Scraping with Python
Structuring your scrapers for extensibility/reusability - Python Tutorial
Structuring your scrapers for extensibility/reusability
- [Instructor] So far, we've seen a few of Scrapy's moving parts and how they come together to crawl websites. You have the Scrapy settings, which manage configuration options for the entire project or for an individual spider. You have the spiders, which encompass all of the website-specific volatility and are custom built to turn a messy website into a consistent type of item. Then you have the Scrapy items, the item objects themselves. These items, at least in theory, don't require any website-specific knowledge to deal with anymore, because all of that was handled by the spider. But their data isn't necessarily in the cleanest or most finalized state. We could write a bunch of cleaning code in the spider, of course, so that by the time the data gets to the item object, let's say, it's absolutely pristine. But then you might end up duplicating a lot of cleaning code in each spider if you have multiple spiders for multiple websites.…
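To make the duplication concern concrete, here is a minimal sketch in plain Python, with no Scrapy imports. Every name in it (`ArticleItem`, `clean_article`, `parse_site_a`, `parse_site_b`, the field names, and the example URLs) is hypothetical and not taken from the course. It assumes two spiders that each know only their own site's layout, and it shows one way the site-independent cleaning could be written once and applied to the items both of them produce.

```python
from dataclasses import dataclass


@dataclass
class ArticleItem:
    """Hypothetical item: the consistent shape every spider produces."""
    title: str
    url: str


def clean_article(item: ArticleItem) -> ArticleItem:
    """Site-independent cleanup, written once and shared by all spiders."""
    return ArticleItem(
        title=" ".join(item.title.split()),  # collapse runs of whitespace
        url=item.url.strip(),                # drop leading/trailing spaces
    )


def parse_site_a(raw: dict) -> ArticleItem:
    """Stand-in for one spider: knows site A's field names only."""
    return ArticleItem(title=raw["headline"], url=raw["link"])


def parse_site_b(raw: dict) -> ArticleItem:
    """Stand-in for another spider: knows site B's field names only."""
    return ArticleItem(title=raw["name"], url=raw["href"])


# Each "spider" emits items in the same shape, but not yet in a clean state.
items = [
    parse_site_a({"headline": "  Hello   world ", "link": " https://a.example/1 "}),
    parse_site_b({"name": "Second\n story", "href": "https://b.example/2"}),
]

# The cleaning step needs no knowledge of which site an item came from.
cleaned = [clean_article(item) for item in items]
```

Because `clean_article` sees only the item, adding a third site in this sketch would mean writing one more parse function and no additional cleaning code.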