Getting Started with Scrapy in Python

Post on 13-May-2015


Web Scraping with Scrapy

Virendra Rajput

Hacker @Markitty

Agenda

● What is web scraping and why it's fun
● My experiments with web scraping
● Getting started with Scrapy
● How Scrapy works and a quick demo
● Why Scrapy
● Questions

What is Web Scraping?

● Extracting information from websites
● Problem:
  ○ Static websites
  ○ No access to APIs to extract the data you need
  ○ Need to extract data periodically

● Manual solution - go to the website and copy the required data

● Smarter solution: Web Scraping

My Experiments with Scraping

Web Scraping in Python

● Download webpage with urllib2, requests

● Parse the page with BeautifulSoup/lxml

● Select with XPath or CSS selectors
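The parse-and-select steps above can be sketched as follows. This is a minimal illustration, not the speaker's code: it swaps in the standard library's `xml.etree.ElementTree` for lxml/BeautifulSoup so that it runs without extra packages, and ElementTree supports only a subset of XPath and only well-formed markup. The `PAGE` snippet, the `product` and `price` class names, and the `extract_products` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for a downloaded page (hypothetical markup). In practice the text
# would come from requests.get(url).text or, in Python 3,
# urllib.request.urlopen(url).read().
PAGE = """
<html>
  <body>
    <div class="product"><h2>Widget</h2><span class="price">9.99</span></div>
    <div class="product"><h2>Gadget</h2><span class="price">19.50</span></div>
  </body>
</html>
"""


def extract_products(page_text):
    """Parse the page and select each product's name and price with XPath."""
    root = ET.fromstring(page_text.strip())
    products = []
    # ElementTree understands simple XPath predicates such as [@class='...'].
    for node in root.findall(".//div[@class='product']"):
        name = node.findtext("h2")
        price = float(node.findtext("span[@class='price']"))
        products.append({"name": name, "price": price})
    return products


products = extract_products(PAGE)
```

Real-world HTML is often not well-formed, which is why a tolerant parser such as lxml or BeautifulSoup is the usual choice for this step.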

Scrapy: a fast, high-level screen scraping and web crawling framework

● Pick a website
● Define the data you want to scrape
● Write the spider to extract the data
● Run the spider
● Store the data

Demo

Why Scrapy

● Simple
● Fast
● Productive and extensible
● Portable
● Well documented, with a healthy community
● Commercial support

Advanced Features (built in)

● Interactive shell for trying XPaths (useful for debugging)
● Selecting and extracting data from HTML sources
● Cleaning and sanitizing the scraped data
● Generating feed exports (JSON, CSV)
● Media pipeline for downloading stuff
● Middlewares for cookies, HTTP compression, cache, user-agent spoofing, etc.
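As one illustration of the cleaning step, Scrapy passes each scraped item through item pipelines, which are plain classes with a `process_item(self, item, spider)` method. The sketch below is a hypothetical pipeline, not code from the talk: the class name, the `name` and `price` fields, and the price format are all assumptions.

```python
class CleanPricePipeline:
    """Hypothetical item pipeline: trims the name and turns the price into a float."""

    def process_item(self, item, spider):
        # Strip stray whitespace left over from the HTML.
        item["name"] = item["name"].strip()
        # Assumes prices look like "$1,299.00"; other formats would need more care.
        raw = item["price"].replace("$", "").replace(",", "").strip()
        item["price"] = float(raw)
        return item


# Pipelines are ordinary classes, so they can be exercised without a crawl.
cleaned = CleanPricePipeline().process_item(
    {"name": "  Widget \n", "price": "$1,299.00"}, spider=None
)
```

In a Scrapy project, a pipeline like this would be switched on through the `ITEM_PIPELINES` setting, for example `{"myproject.pipelines.CleanPricePipeline": 300}`, where the module path is again hypothetical.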

Questions?
