Frontera: open source large-scale web crawling framework

Description

Alexander Sibiryakov - Frontera: open source large-scale web crawling framework [EuroPython 2015] [20 July 2015] [Bilbao, Euskadi, Spain]

In this talk I'm going to introduce Scrapinghub's new open source framework [Frontera][1]. Frontera lets you build real-time distributed web crawlers as well as crawlers focused on specific websites.

It offers:

  • customizable URL metadata storage (RDBMS or key-value based),
  • crawling strategy management,
  • transport layer abstraction,
  • fetcher abstraction.
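
To make the first two items concrete, here is a minimal sketch of a crawl frontier with a pluggable crawling strategy. It is an illustration only and not Frontera's API: `ToyFrontier`, `breadth_first`, and the example URLs are hypothetical names, and a plain dict stands in for the RDBMS or key-value metadata storage.

```python
import heapq


class ToyFrontier:
    """Hypothetical in-memory crawl frontier (not Frontera's API)."""

    def __init__(self, score):
        self._score = score  # crawling strategy: maps (url, depth) to a priority
        self._queue = []     # min-heap of (-priority, insertion order, url)
        self._seen = {}      # URL metadata storage; a dict stands in for a database
        self._order = 0      # tie-breaker so equal priorities pop in insertion order

    def add(self, url, depth=0):
        """Schedule a URL unless it is already known. Returns True if added."""
        if url in self._seen:
            return False
        self._seen[url] = {"depth": depth, "state": "queued"}
        heapq.heappush(self._queue, (-self._score(url, depth), self._order, url))
        self._order += 1
        return True

    def next_batch(self, n):
        """Hand the fetcher up to n URLs, highest priority first."""
        batch = []
        while self._queue and len(batch) < n:
            _, _, url = heapq.heappop(self._queue)
            self._seen[url]["state"] = "fetching"
            batch.append(url)
        return batch


def breadth_first(url, depth):
    """Example strategy: shallower pages get higher priority."""
    return -depth


frontier = ToyFrontier(breadth_first)
frontier.add("http://example.com/", depth=0)
frontier.add("http://example.com/a/b", depth=2)
frontier.add("http://example.com/a", depth=1)
frontier.add("http://example.com/a", depth=1)  # duplicate, ignored
batch = frontier.next_batch(2)
```

Swapping `breadth_first` for another scoring function changes the crawl order without touching the queue or the storage, which is the kind of separation the feature list describes.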

Along with a description of the framework, I'll demonstrate how to build a distributed crawler using [Scrapy][2], Kafka and HBase, and hopefully present some statistics on the Spanish internet collected with the newly built crawler. Happy EuroPythoning!

[1]: https://github.com/scrapinghub/frontera
[2]: http://scrapy.org/
