From the course: Web Scraping with Python

Challenge: Scraping news sites

(lively music) - [Instructor] Although I've talked a lot in previous videos about projects with multiple spiders, we've really only seen projects with a single spider so far, either the Wikipedia spider or the IETF scraper. So in this challenge, we're going to put theory to the test and build a full Scrapy project with at least stubs of all the components working together. So you're going to want an item class, one or more pipeline processes, and settings that export JSON, or some other file format, through the settings file. The goal is to scrape some standard type of data from three different websites. I'm going to scrape news articles from the Associated Press, CNN, and Yahoo News. If you don't like news articles, feel free to get a little creative with this: products or profiles would work too. A word of warning, though: don't go too crazy. You want something common, present on a lot of different websites so you can pick…
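As a rough illustration of the stubs described above, here is a minimal sketch of an item class, one pipeline, and the export settings. It is not the instructor's solution: the names `Article`, `CleanArticlePipeline`, the `source` field, the `newsproject.pipelines` module path, and the `articles.json` output file are all assumptions made for the example. It relies on Scrapy accepting dataclass objects as items (Scrapy 2.2 and later) and on the `FEEDS` setting (Scrapy 2.1 and later); in a real project the three parts would live in `items.py`, `pipelines.py`, and `settings.py`.

```python
from dataclasses import dataclass
from typing import Optional


# items.py: one item type shared by all three spiders.
@dataclass
class Article:
    url: str
    title: str
    body: str
    source: Optional[str] = None  # hypothetical field, e.g. "ap", "cnn", "yahoo"


# pipelines.py: a pipeline is a plain class with a process_item method.
class CleanArticlePipeline:
    """Stub pipeline that trims whitespace from the text fields."""

    def process_item(self, item, spider):
        item.title = item.title.strip()
        item.body = item.body.strip()
        return item  # hand the item on to the next pipeline or the exporter


# settings.py: enable the pipeline and export scraped items as JSON.
# The module path and file name below are assumed, not taken from the course.
ITEM_PIPELINES = {"newsproject.pipelines.CleanArticlePipeline": 300}
FEEDS = {"articles.json": {"format": "json"}}
```

With something like this in place, each of the three spiders only has to yield `Article` objects; the cleaning and the JSON export are handled once for the whole project rather than per site.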
