web-poet
Warning
web-poet is in the early stages of development; backwards-incompatible changes are possible.
web-poet implements the Page Object pattern for web scraping.
It defines a standard for writing web data extraction code, which makes
the code portable and reusable.
The main idea is to separate the extraction logic from all other concerns.
web-poet Page Objects don't perform I/O (for example, they don't download pages themselves),
and they don't depend on any particular framework like Scrapy.
This makes code written with web-poet testable and reusable.
For example, one can write a web-poet Page Object in an IPython notebook,
plug it into a Scrapy spider, write tests for it using unittest or pytest,
and then reuse it in a simple script that uses the requests library.
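To illustrate the idea of an I/O-free Page Object, here is a minimal sketch that uses only the Python standard library. It is not web-poet's API: web-poet ships its own base classes and page input types, while the names below (ResponseInput, TitlePage, test_title_page) are hypothetical. The point is the shape of the code: the page object receives already-downloaded data and only extracts from it, so a test can feed it a saved HTML string instead of hitting the network.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional


@dataclass
class ResponseInput:
    """Hypothetical page input: an already-downloaded page (URL + HTML).

    The page object never fetches anything itself; whoever calls it
    (a Scrapy spider, a requests script, a test) supplies this data.
    """
    url: str
    html: str


class _TitleParser(HTMLParser):
    """Collects the text of the document's <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only start collecting if no title text has been captured yet.
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data


class TitlePage:
    """Hypothetical page object: pure extraction logic, no I/O."""

    def __init__(self, response: ResponseInput) -> None:
        self.response = response

    def to_item(self) -> dict:
        parser = _TitleParser()
        parser.feed(self.response.html)
        parser.close()
        title = parser.title.strip() if parser.title else None
        return {"url": self.response.url, "title": title}


def test_title_page():
    # The test only needs an HTML fixture; no network access is involved.
    page = TitlePage(ResponseInput(
        url="https://example.com",
        html="<html><head><title> Example </title></head></html>",
    ))
    assert page.to_item() == {"url": "https://example.com", "title": "Example"}
```

The same class can be run by pytest (which collects test_title_page automatically) and, in a plain script, fed with a page fetched by requests, e.g. TitlePage(ResponseInput(url=resp.url, html=resp.text)).to_item(), where resp is the result of requests.get(...). Only the code that builds the input changes between environments; the extraction logic stays the same.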
To install it, run pip install web-poet. It requires Python 3.7+.
The license is BSD 3-clause.
If you want to quickly learn how to write web-poet Page Objects,
see web-poet on a surface. To better understand all the web-poet
concepts and the motivation behind web-poet, start with web-poet from the ground up.