Changelog
0.18.0 (2025-01-30)
Removed support for Python 3.8, added support for Python 3.13.
The minimum required version of url-matcher changed from 0.2.0 to 0.4.0.
type(None) is no longer considered injectable.
0.17.1 (2024-10-11)
web_poet.mixins.SelectableMixin.selector is now created with the base_url value set to self.url if this attribute exists.
Added a mention of the form2request library to the HttpRequest documentation.
CI improvements.
0.17.0 (2024-03-04)
Now requires andi >= 0.5.0.
Package requirements that were unversioned now have minimum versions specified.
Added support for Python 3.12.
Added support for typing.Annotated dependencies to the serialization and testing code.
Documentation improvements.
CI improvements.
0.16.0 (2024-01-23)
Added a new AnyResponse class, which holds either a BrowserResponse or an HttpResponse.
Documentation improvements.
0.15.1 (2023-11-21)
HttpRequestHeaders now has a from_bytes_dict class method, like HttpResponseHeaders.
0.15.0 (2023-09-11)
0.14.0 (2023-08-03)
Dropped Python 3.7 support.
Now requires packaging >= 20.0.
Fixed detection of the Returns base class.
Improved docs.
Updated type hints.
Updated CI tools.
0.13.1 (2023-05-30)
Fixed an issue in HttpClient that occurred when a response with a non-standard status code was received.
0.13.0 (2023-05-30)
A new dependency BrowserResponse has been added. It contains a browser-rendered page URL, status code and HTML.
The Rules documentation section has been rewritten.
0.12.0 (2023-05-05)
The testing framework now allows defining a custom item adapter.
We have made a backward-incompatible change on test fixture serialization: the type_name field of exceptions has been renamed to import_path.
Fixed built-in Python types, e.g. int, not working as field processors.
0.11.0 (2023-04-24)
JMESPath support is now available: you can use WebPage.jmespath() and HttpResponse.jmespath() to run queries on JSON responses.
The testing framework now supports page objects that raise exceptions from the to_item method.
0.10.0 (2023-04-19)
New class Extractor can be used for easier extraction of nested fields (see Processors for nested fields).
Exceptions raised while getting a response for an additional request are now saved in test fixtures.
Multiple documentation improvements and fixes.
Add a twine check CI check.
0.9.0 (2023-03-30)
Standardized input validation.
Field processors can now also be defined through a nested Processors class, so that field redefinitions in subclasses can inherit them. See Default processors.
Field processors can now opt in to receive the page object whose field is being read.
web_poet.fields.FieldsMixin now keeps fields from all base classes when using multiple inheritance.
Fixed the documentation build.
0.8.1 (2023-03-03)
Fix the error when calling .to_item(), item_from_fields_sync(), or item_from_fields() on page objects defined as slotted attrs classes, while setting skip_nonitem_fields=True.
0.8.0 (2023-02-23)
This release contains many improvements to the web-poet testing framework, as well as some other improvements and bug fixes.
Backward-incompatible changes:
cached_method() no longer caches exceptions for async def methods. This makes the behavior the same for sync and async methods, and also makes it consistent with Python’s stdlib caching (i.e. functools.lru_cache(), functools.cached_property()).
The testing framework now uses the HttpResponse-info.json file name instead of HttpResponse-other.json to store information about HttpResponse instances. To make tests generated with older web-poet work, rename these files on disk.
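The rename described above can be scripted. A minimal sketch, assuming the fixtures live under a directory called fixtures (a placeholder path) and that an existing HttpResponse-info.json should never be overwritten; the helper name rename_fixture_files is made up for this example and is not part of web-poet:

```python
from pathlib import Path

OLD_NAME = "HttpResponse-other.json"
NEW_NAME = "HttpResponse-info.json"


def rename_fixture_files(root: Path) -> list[Path]:
    """Rename every OLD_NAME file under root to NEW_NAME.

    Returns the new paths. Files whose target already exists are skipped.
    """
    if not root.is_dir():
        return []
    renamed = []
    # Collect the matches first so renaming does not disturb the walk.
    for old_path in sorted(root.rglob(OLD_NAME)):
        new_path = old_path.with_name(NEW_NAME)
        if new_path.exists():
            continue  # do not overwrite an existing file
        old_path.rename(new_path)
        renamed.append(new_path)
    return renamed


if __name__ == "__main__":
    # "fixtures" is a placeholder; point it at your own fixture directory.
    for path in rename_fixture_files(Path("fixtures")):
        print(f"renamed -> {path}")
```

Running it on a copy of the fixtures first is a cheap way to check the result before touching the real files.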
Testing framework improvements:
Improved test reporting: better diffs and error messages.
By default, the pytest plugin now generates a test per item attribute (see Running tests). There is also an option (--web-poet-test-per-item) to run a test per item instead.
Page objects with the HttpClient dependency are now supported (see Additional requests support).
Page objects with the PageParams dependency are now supported.
Added a new python -m web_poet.testing rerun command (see Test-Driven Development).
Fixed support for nested (indirect) dependencies in page objects. Previously they were not handled properly by the testing framework.
Non-ASCII output is now stored without escaping in the test fixtures, for better readability.
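The readability gain from unescaped non-ASCII output can be illustrated with the standard json module. This is only a sketch of the general technique (ensure_ascii=False); it does not show web-poet's own serialization code, and the sample item is made up:

```python
import json

# A made-up item, used only for illustration.
item = {"name": "Café crème", "price": "3,50 €"}

escaped = json.dumps(item)                       # default: ensure_ascii=True
readable = json.dumps(item, ensure_ascii=False)  # keep characters as-is

print(escaped)
print(readable)

# Both forms decode to the same data; only readability differs.
assert json.loads(escaped) == json.loads(readable) == item
```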
Other changes:
Testing and CI fixes.
Fixed a packaging issue: tests and tests_extra packages were installed, not just web_poet.
0.7.2 (2023-02-01)
Restore the minimum version of itemadapter from 0.7.1 to 0.7.0, and prevent a similar issue from happening again in the future.
0.7.1 (2023-02-01)
Updated the tutorial to cover recent features and focus on best practices. Also, a new module was added, web_poet.example, that allows using page objects while following the tutorial.
The Tests for page objects documentation now covers Git LFS and scrapy-poet, and recommends python -m pytest instead of pytest.
Improved the warning message when duplicate ApplyRule objects are found.
HttpResponse-other.json content is now indented for better readability.
Improved test coverage for fields.
0.7.0 (2023-01-18)
Add a framework for creating tests and running them with pytest.
Support implementing fields in mixin classes.
Introduce new methods for web_poet.rules.RulesRegistry:
Improved the performance of web_poet.rules.RulesRegistry.search() where passing a single parameter of either instead_of or to_return results in O(1) look-up time instead of O(N). Additionally, having either instead_of or to_return present in multi-parameter search calls would filter the initial candidate results resulting in a faster search.
Support page object dependency serialization.
Add new dependencies used in testing and serialization code: andi, python-dateutil, and time-machine. Also backports.zoneinfo on non-Windows platforms when the Python version is older than 3.9.
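The search() speed-up mentioned in this release comes down to indexing rules by instead_of and to_return. The toy registry below sketches that idea only; ToyRule and ToyRegistry are hypothetical names, and the code is not web-poet's implementation:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToyRule:
    """Hypothetical stand-in for a rule; not web_poet.ApplyRule."""
    use: Any
    instead_of: Optional[Any] = None
    to_return: Optional[Any] = None


class ToyRegistry:
    def __init__(self, rules):
        self._rules = list(rules)
        # One dict per indexed attribute: value -> rules having that value.
        self._index = {"instead_of": defaultdict(list),
                       "to_return": defaultdict(list)}
        for rule in self._rules:
            for attr, table in self._index.items():
                table[getattr(rule, attr)].append(rule)

    def search(self, **criteria):
        # Start from an indexed bucket when possible (a dict look-up),
        # otherwise fall back to scanning every rule.
        candidates = self._rules
        for attr in ("instead_of", "to_return"):
            if attr in criteria:
                candidates = self._index[attr].get(criteria[attr], [])
                break
        # Filter the (usually much smaller) candidate list on all criteria.
        return [r for r in candidates
                if all(getattr(r, k) == v for k, v in criteria.items())]


# Made-up page object and item classes for the usage example.
class PageA: ...
class PageB: ...
class Item: ...

registry = ToyRegistry([
    ToyRule(use=PageA, instead_of=PageB),
    ToyRule(use=PageB, to_return=Item),
])
print(registry.search(instead_of=PageB))
```

A search on use alone still scans every rule in this sketch, which matches the changelog's note that only instead_of and to_return look-ups are fast.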
0.6.0 (2022-11-08)
In this release, the @handle_urls decorator gets an overhaul; it’s not required anymore to pass another Page Object class to @handle_urls("...", overrides=...).
Also, the @web_poet.field decorator gets support for output processing functions, via the out argument.
Full list of changes:
Backwards incompatible: PageObjectRegistry no longer supports dict-like access.
Official support for Python 3.11.
New @web_poet.field(out=[...]) argument, which allows setting output processing functions for web-poet fields.
The web_poet.overrides module is deprecated and replaced with web_poet.rules.
The @handle_urls decorator now creates ApplyRule instances instead of OverrideRule instances; OverrideRule is deprecated.
ApplyRule is similar to OverrideRule, but has the following differences:
    ApplyRule accepts a to_return parameter, which should be the data container (item) class that the Page Object returns.
    Passing a string to for_patterns auto-converts it into url_matcher.Patterns.
    All arguments are now keyword-only except for for_patterns.
New signature and behavior of handle_urls:
    The overrides parameter is made optional and renamed to instead_of.
    If defined, the item class declared in a subclass of web_poet.ItemPage is used as the to_return parameter of ApplyRule.
    Multiple handle_urls annotations are allowed.
PageObjectRegistry is replaced with RulesRegistry; its API is changed:
    backwards incompatible: the dict-like API is removed;
    backwards incompatible: O(1) lookups using .search(use=PageObject) have become O(N);
    the search_overrides method is renamed to search;
    the get_overrides method is renamed to get_rules;
    the from_override_rules method is deprecated; use RulesRegistry(rules=...) instead.
Typing improvements.
Documentation, test, and warning message improvements.
Deprecations:
The web_poet.overrides module is deprecated. Use web_poet.rules instead.
The overrides parameter from @handle_urls is now deprecated. Use the instead_of parameter instead.
The OverrideRule class is now deprecated. Use ApplyRule instead.
PageObjectRegistry is now deprecated. Use RulesRegistry instead.
The from_override_rules method of PageObjectRegistry is now deprecated. Use RulesRegistry(rules=...) instead.
The PageObjectRegistry.get_overrides method is deprecated. Use PageObjectRegistry.get_rules instead.
The PageObjectRegistry.search_overrides method is deprecated. Use PageObjectRegistry.search instead.
0.5.1 (2022-09-23)
The BOM encoding from the response body is now read before the response headers when deriving the response encoding.
Minor typing improvements.
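The new precedence can be sketched with the standard library: a byte order mark in the body wins over the encoding declared in the headers. The function name pick_encoding and the utf-8 fallback are assumptions made for this example, not web-poet internals:

```python
import codecs
from typing import Optional

# Longer BOMs first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def pick_encoding(body: bytes, header_encoding: Optional[str]) -> str:
    """Return the BOM encoding if the body has one, else the header's."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    return header_encoding or "utf-8"  # the fallback is an assumption


# A body that starts with a UTF-8 BOM, served with a conflicting header.
print(pick_encoding(codecs.BOM_UTF8 + b"hello", "latin-1"))
```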
0.5.0 (2022-09-21)
Web-poet now includes a mini-framework for organizing extraction code as Page Object properties:
    import attrs
    from web_poet import field, ItemPage


    @attrs.define
    class MyItem:
        foo: str
        bar: list[str]


    class MyPage(ItemPage[MyItem]):
        @field
        def foo(self):
            return "..."

        @field
        def bar(self):
            return ["...", "..."]
Backwards incompatible changes:
web_poet.ItemPage is no longer an abstract base class which requires the to_item method to be implemented. Instead, it provides a default async def to_item method implementation which uses fields marked as web_poet.field to create an item. This change shouldn’t affect the user code in a backwards incompatible way, but it might affect typing.
Deprecations:
web_poet.ItemWebPage is deprecated. Use web_poet.WebPage instead.
Other changes:
web-poet is now declared as a PEP 561 package that provides typing information; mypy will use it by default.
Documentation, test, typing and CI improvements.
0.4.0 (2022-07-26)
New HttpResponse.urljoin method, which takes the page’s base URL into account.
New HttpRequest.urljoin method.
Standardized web_poet.exceptions.Retry exception, which allows initiating a retry from the Page Object, e.g. based on page content.
Documentation improvements.
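Why the base URL matters when joining links can be shown with urllib.parse.urljoin. This sketch uses only the standard library and made-up URLs; it mirrors the idea behind HttpResponse.urljoin rather than calling it:

```python
from urllib.parse import urljoin

page_url = "https://example.com/catalog/page1.html"
# Value of a hypothetical <base href="..."> tag found in the page.
base_href = "https://cdn.example.com/assets/"
link = "img/logo.png"

# Resolving against the page URL ignores the <base> tag.
naive = urljoin(page_url, link)

# Resolve the base first (it may itself be relative), then the link.
base_url = urljoin(page_url, base_href)
correct = urljoin(base_url, link)

print(naive)
print(correct)
```

When a page has no base tag, base_url is simply the page URL and both results coincide.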
0.3.0 (2022-06-14)
Backwards Incompatible Change:
web_poet.requests.request_backend_var is renamed to web_poet.requests.request_downloader_var.
Documentation and CI improvements.
0.2.0 (2022-06-10)
Backward Incompatible Change:
ResponseData is replaced with HttpResponse.
HttpResponse exposes methods useful for web scraping (such as XPath and CSS selectors, JSON loading), and handles web page encoding detection. There are also new types like HttpResponseBody and HttpResponseHeaders.
Added support for performing additional requests using web_poet.HttpClient.
Introduced the web_poet.BrowserHtml dependency.
Introduced web_poet.PageParams to pass arbitrary information inside a Page Object.
Added the web_poet.handle_urls decorator, which allows declaring which websites should be handled by the page objects. The lower-level PageObjectRegistry class is also available.
Removed support for Python 3.6.
Added support for Python 3.10.
0.1.1 (2021-06-02)
base_url and urljoin shortcuts
0.1.0 (2020-07-18)
Documentation
WebPage, ItemPage, ItemWebPage, Injectable and ResponseData are available as top-level imports (e.g. web_poet.ItemPage)
0.0.1 (2020-04-27)
Initial release.