web-poet

Warning

web-poet is in the early stages of development; backwards-incompatible changes are possible.

web-poet implements the Page Object pattern for web scraping. It defines a standard for writing web data extraction code, which makes that code portable and reusable.

The main idea is to separate the extraction logic from all other concerns. web-poet Page Objects don’t do I/O, and they’re not dependent on any particular framework like Scrapy.

This makes code written with web-poet testable and reusable. For example, one can write a web-poet Page Object in an IPython notebook, plug it into a Scrapy spider, write tests for it using unittest or pytest, and then reuse it in a simple script that uses the requests library.
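The separation of extraction logic from I/O can be illustrated with a minimal, framework-free sketch. It deliberately does not use the web-poet API: the class name BookPage, the to_item method, the URL and the sample HTML are all hypothetical, chosen only to show the shape of the pattern. The object receives HTML that has already been downloaded and returns extracted data.

```python
import re


class BookPage:
    """Hypothetical page object: holds already-downloaded HTML and does no I/O."""

    def __init__(self, url, html):
        self.url = url
        self.html = html

    def to_item(self):
        # Extraction logic only. A regex keeps the sketch dependency-free;
        # real code would normally use a proper HTML parser.
        match = re.search(r"<h1[^>]*>(.*?)</h1>", self.html, re.S)
        title = match.group(1).strip() if match else None
        return {"url": self.url, "title": title}


# The caller (a spider, a script, or a test) is responsible for fetching the page.
# Here a hard-coded string stands in for a downloaded response body.
html = "<html><body><h1> A Light in the Attic </h1></body></html>"
page = BookPage(url="http://example.com/book/1", html=html)
item = page.to_item()
print(item)
```

Because the class never touches the network, the same object can be fed HTML from a Scrapy response, from requests, or from a fixture file in a test, without changing the extraction code.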

To install it, run pip install web-poet. It requires Python 3.6+. License is BSD 3-clause.

If you want to quickly learn how to write web-poet Page Objects, see web-poet on a surface. To better understand the web-poet concepts and the motivation behind them, start with web-poet from the ground up.