q-m/scrapy-webarchive

Scrapy Webarchive

Scrapy Webarchive is a plugin for Scrapy that allows users to capture and export web archives in the WARC and WACZ formats during crawling.

Features

  • Save web crawls in WACZ format, to local or cloud storage.
  • Crawl against existing WACZ archives.
  • Integrate seamlessly with Scrapy’s spider request and response cycle.
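
To make the WACZ format mentioned above more concrete: a WACZ file is a ZIP archive that bundles WARC data with indexes and a `datapackage.json` manifest. The sketch below uses only the Python standard library to look inside one. It is not part of the scrapy-webarchive API; the function name `summarize_wacz` and the file name `crawl.wacz` are hypothetical and chosen for illustration.

```python
import json
import zipfile


def summarize_wacz(source):
    """Return (member names, parsed manifest) for a WACZ archive.

    `source` may be a filesystem path or a binary file-like object.
    The manifest is None when the archive has no datapackage.json.
    """
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        manifest = None
        if "datapackage.json" in names:
            # The manifest describes the resources bundled in the archive.
            with archive.open("datapackage.json") as fh:
                manifest = json.load(fh)
    return names, manifest


# Hypothetical usage, assuming a crawl was exported to "crawl.wacz":
# names, manifest = summarize_wacz("crawl.wacz")
# print(names)
```

Listing the archive this way can be a quick sanity check that an export contains the expected WARC data before crawling against it.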

Compatibility

  • Python 3.7, 3.8, 3.9, 3.10, 3.11 and 3.12

Documentation

Documentation is available online at developers.thequestionmark.org/scrapy-webarchive/