Items
The to_item method of a page object class must return an item.
An item is a data container object supported by the itemadapter library, such as a dict, an attrs class, or a dataclass. For example:
import attrs

@attrs.define
class MyItem:
    foo: int
    bar: str
Because itemadapter allows implementing support for arbitrary classes, any kind of Python object can potentially work as an item.
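For comparison, here is a minimal sketch of the same item expressed as a dataclass and as a plain dict, using only the standard library. The name MyDataclassItem and the sample values are illustrative assumptions, not part of the library.

```python
from dataclasses import asdict, dataclass


# Hypothetical dataclass counterpart of the attrs-based MyItem above.
@dataclass
class MyDataclassItem:
    foo: int
    bar: str


# The same data as a dataclass instance and as a plain dict.
dataclass_item = MyDataclassItem(foo=1, bar="a")
dict_item = {"foo": 1, "bar": "a"}

# Both containers hold the same field names and values.
assert asdict(dataclass_item) == dict_item
```

Either form carries the same data; the class-based form additionally declares which fields exist and what types they are expected to hold.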
Defining the item class of a page object class
When inheriting from ItemPage, indicate the item class to return between brackets:
from web_poet import ItemPage

@attrs.define
class MyPage(ItemPage[MyItem]):
    ...
to_item builds an instance of the specified item class based on the page object class fields.
page = MyPage(...)
item = await page.to_item()
assert isinstance(item, MyItem)
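To make the idea concrete, the following is a toy model of building an item from page object fields. It is not web-poet's implementation: ToyPage, its item_cls attribute, and the hard-coded field values are all assumptions made for illustration, and the item is a standard-library dataclass so the sketch runs without extra dependencies.

```python
import asyncio
from dataclasses import dataclass, fields


@dataclass
class MyItem:
    foo: int
    bar: str


class ToyPage:
    """Hypothetical stand-in for a page object; NOT web-poet's ItemPage."""

    # Assumption: the item class is stored as a plain class attribute.
    item_cls = MyItem

    def foo(self) -> int:
        return 42

    def bar(self) -> str:
        return "hello"

    async def to_item(self) -> MyItem:
        # Call the method named after each item field, then pass the
        # collected values to the item class constructor.
        values = {f.name: getattr(self, f.name)() for f in fields(self.item_cls)}
        return self.item_cls(**values)


item = asyncio.run(ToyPage().to_item())
assert isinstance(item, MyItem)
```

The point of the sketch is only that the item's field names drive which page object fields are read; the real library resolves the item class and fields through its own mechanisms.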
You can also define ItemPage subclasses that are meant only to be subclassed, not used directly, and leave ItemPage unannotated in them. You can then annotate those classes when subclassing them:
@attrs.define
class MyBasePage(ItemPage):
    ...

@attrs.define
class MyPage(MyBasePage[MyItem]):
    ...
To change the item class of a subclass that has already defined its item class, use Returns:
from web_poet import Returns

@attrs.define
class MyOtherPage(MyPage, Returns[MyOtherItem]):
    ...
Best practices for item classes
To keep your code maintainable, we recommend that you:
Instead of dict, use proper item classes based on dataclasses or attrs, to make it easier to detect issues like field name typos or missing required fields.
Reuse item classes.
For example, if you want to extract product details data from 2 e-commerce websites, try to use the same item class for both of them. Or at least try to define a base item class with shared fields, and only keep website-specific fields in website-specific items.
Keep item classes as logic-free as possible.
For example, any parsing and field cleanup logic is better handled through page object classes, e.g. using field processors.
Code that makes item field values differ from the corresponding page object field values can subvert the expectations of users of your code, who might need to access page object fields directly, for example for field subset selection.
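The first two recommendations above can be sketched with standard-library dataclasses. The class names (BaseProduct, ShopAProduct) and their fields are hypothetical, chosen only to illustrate a shared base item with a website-specific extension.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical base item with the fields shared by both websites.
@dataclass
class BaseProduct:
    name: str
    price: float


# Hypothetical website-specific item: only the extra field lives here.
@dataclass
class ShopAProduct(BaseProduct):
    loyalty_points: Optional[int] = None


# A dict silently accepts a misspelled field name ("prize").
dict_item = {"name": "Chair", "prize": 9.99}

# An item class rejects the same typo at construction time.
typo_error = None
try:
    BaseProduct(name="Chair", prize=9.99)
except TypeError as error:
    typo_error = error

# A missing required field is also reported immediately.
missing_error = None
try:
    BaseProduct(name="Chair")
except TypeError as error:
    missing_error = error

shop_a_item = ShopAProduct(name="Chair", price=9.99)
```

Note that both item classes stay logic-free: they only declare fields, leaving parsing and cleanup to the page object classes.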
If you are looking for ready-made item classes, check out zyte-common-items.