Framework specification
Learn how to build a web-poet framework.
Design principles
Page objects should be flexible enough to be used with:
synchronous or asynchronous code, callback-based and async def / await based,
single-node and distributed systems,
different underlying HTTP implementations, or without HTTP support at all, etc.
Minimum requirements
A web-poet framework must support building a page object given a page object class.
It must be able to build the input objects of a page object from two sources: the type hints on the page object class (i.e. dependency injection), and any additional input data that those input objects require, such as a target URL or a dictionary of page parameters.
You can implement dependency injection with the andi library, which handles signature inspection, Optional and Union annotations, as well as indirect dependencies. For practical examples, see the source code of scrapy-poet and of the web_poet.example module.
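To make the idea of building inputs from type hints more concrete, here is a minimal sketch that uses only the Python standard library. It does not use andi and does not handle Optional or Union annotations. The names Url, Response, ProductPage and build are hypothetical and exist only for this illustration.

```python
import inspect
from typing import Any, Dict, get_type_hints


class Url:  # hypothetical leaf input: the target URL
    def __init__(self, value: str):
        self.value = value


class Response:  # hypothetical input that itself depends on Url
    def __init__(self, url: Url, body: str = ""):
        self.url = url
        self.body = body


class ProductPage:  # hypothetical page object class
    def __init__(self, response: Response):
        self.response = response

    def to_item(self) -> dict:
        return {"url": self.response.url.value}


def build(cls: type, provided: Dict[type, Any]) -> Any:
    """Build ``cls``, recursively building its annotated dependencies.

    ``provided`` holds the additional input data (here, the target URL)
    that cannot be derived from type hints alone.
    """
    if cls in provided:
        return provided[cls]
    hints = get_type_hints(cls.__init__)
    kwargs = {}
    for name, param in inspect.signature(cls.__init__).parameters.items():
        # Skip ``self`` and parameters that already have a default value.
        if name == "self" or param.default is not inspect.Parameter.empty:
            continue
        kwargs[name] = build(hints[name], provided)
    return cls(**kwargs)


page = build(ProductPage, {Url: Url("https://example.com/p/1")})
print(page.to_item())
```

In this sketch the indirect dependency (ProductPage needs Response, which needs Url) is resolved by recursion, and the only value supplied by the caller is the target URL. A real framework would also need to decide how each input object is actually produced, for example by downloading the response.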
Additional features
To provide a better experience to your users, consider extending your web-poet framework further to:
Support as many input classes from the web_poet.page_inputs module as possible.
Support returning a page object given a target URL and a desired output item class, determining the right page object class to use based on rules.
Allow users to request an output item directly, instead of requesting a page object just to call its to_item method. If you do, consider supporting both synchronous and asynchronous definitions of the to_item method, e.g. using ensure_awaitable().
Support additional requests.
Support retries.
Let users set their own rules, e.g. to solve conflicts.
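Two of the features above, choosing a page object class by rule and returning an output item directly, can be sketched together as follows. This is only an illustration under stated assumptions: the RULES table, the Product item, the two page object classes and get_item are hypothetical, and as_awaitable is a local stand-in written for this example rather than web-poet's own ensure_awaitable().

```python
import asyncio
import inspect
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Product:  # hypothetical output item class
    name: str


class ShopAPage:  # hypothetical page object with a synchronous to_item
    def __init__(self, url: str):
        self.url = url

    def to_item(self) -> Product:
        return Product(name="from shop A")


class ShopBPage:  # hypothetical page object with an asynchronous to_item
    def __init__(self, url: str):
        self.url = url

    async def to_item(self) -> Product:
        return Product(name="from shop B")


# Hypothetical rule table: (domain, item class) -> page object class.
RULES = {
    ("a.example", Product): ShopAPage,
    ("b.example", Product): ShopBPage,
}


async def as_awaitable(obj):
    """Local stand-in: await ``obj`` if it is awaitable, else return it as is."""
    if inspect.isawaitable(obj):
        return await obj
    return obj


async def get_item(url: str, item_cls: type):
    """Pick a page object class by rule, build it, and return its item."""
    page_cls = RULES[(urlparse(url).hostname, item_cls)]
    page = page_cls(url)
    # Works whether to_item is defined with ``def`` or ``async def``.
    return await as_awaitable(page.to_item())


item_a = asyncio.run(get_item("https://a.example/p/1", Product))
item_b = asyncio.run(get_item("https://b.example/p/1", Product))
print(item_a, item_b)
```

Wrapping the to_item call this way lets the framework treat both kinds of page object uniformly. Letting users add or override entries in a table like RULES is one possible way to support user-defined rules and resolve conflicts.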