From the course: Web Scraping with Python


Structuring your scrapers for extensibility/reusability


- [Instructor] So far, we've seen a few of Scrapy's moving parts and how they come together to crawl websites. You have the Scrapy settings, which manage configuration options for the entire project or for an individual spider. You have the spiders, which encompass all of the website-specific volatility and are custom-built to turn a messy website into a consistent type of item. Then you have the Scrapy items, the item objects themselves. These items, at least in theory, don't require any website-specific knowledge to deal with anymore, because all of that was handled by the spider, but their data isn't necessarily in the most clean or finalized state. We could write a bunch of cleaning code in the spider, of course, so that by the time the data gets to the item object it's absolutely pristine, but then you might end up duplicating a lot of cleaning code in each spider if you have multiple spiders for multiple websites.…
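The duplication problem described above can be sketched without Scrapy at all, in plain Python: two hypothetical "spiders" emit messy items, and a single shared cleaning function stands in for the cleanup code that would otherwise be copied into each spider. All names here are illustrative, not Scrapy API:

```python
# Illustrative sketch (plain Python, not Scrapy API): shared cleaning
# logic lives in one place instead of being duplicated per spider.

def clean_item(item):
    """Shared cleaning step: strip whitespace, normalize the price."""
    cleaned = dict(item)
    cleaned["title"] = cleaned["title"].strip()
    cleaned["price"] = float(cleaned["price"].replace("$", "").strip())
    return cleaned

def spider_site_a():
    # Stand-in for parsing site A's messy markup
    yield {"title": "  Widget  ", "price": " $9.99 "}

def spider_site_b():
    # Stand-in for parsing site B's messy markup
    yield {"title": "Gadget\n", "price": "$12.50"}

items = [clean_item(i)
         for spider in (spider_site_a, spider_site_b)
         for i in spider()]
```

In Scrapy itself, this centralized-cleaning role is played by machinery downstream of the spiders (the items and the processing attached to them), which is the direction this lesson is heading.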
