Items
The to_item method of a page object class must return an item.
An item is a data container object supported by the itemadapter library, such as a dict, an attrs class, or a dataclass() class. For example:
import attrs

@attrs.define
class MyItem:
    foo: int
    bar: str
Because itemadapter allows implementing support for arbitrary classes, any kind of Python object can potentially work as an item.
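For comparison, the same item could also be declared as a standard-library dataclass. This is only a minimal sketch: the field values are made up for illustration, and asdict() is used here just to show that the item is a plain data container.

```python
from dataclasses import asdict, dataclass


@dataclass
class MyItem:
    foo: int
    bar: str


# Illustrative values only; a real item would be built by a page object.
item = MyItem(foo=1, bar="a")

# A dataclass item carries no logic, so it converts cleanly to a dict.
item_as_dict = asdict(item)
```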
Best practices for item classes
To keep your code maintainable, we recommend that you:
Reuse item classes.
For example, if you want to extract product details from two e-commerce websites, try to use the same item class for both. If that is not possible, define a base item class with the shared fields, and keep only website-specific fields in website-specific items.
Keep item classes as logic-free as possible.
For example, any parsing and field cleanup logic is better handled through page object classes, e.g. using field processors.
Code that makes item field values differ from their counterpart page object field values can subvert the expectations of users of your code, who might need to access page object fields directly, for example for field subset selection.
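The two recommendations above could look roughly like the following sketch. It uses only standard-library dataclasses; the class names, field names, and the clean_name helper are hypothetical, and the helper merely stands in for cleanup that would normally live in a page object field or field processor (web-poet's own API is not used here).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    """Fields shared by every website (hypothetical names)."""

    name: str
    price: str


@dataclass
class ExampleShopProduct(Product):
    """Adds a field that only one hypothetical website provides."""

    loyalty_points: Optional[int] = None


def clean_name(raw: str) -> str:
    # Cleanup stays outside the item class; in a real project this
    # logic would belong to the page object, not to the item.
    return " ".join(raw.split())


# The item receives an already-clean value and stores it unchanged.
product = ExampleShopProduct(name=clean_name("  Blue \n Mug "), price="9.99")
```

Because the item classes contain no logic of their own, the value stored in the item is the same value the extraction code produced.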
If you are looking for ready-made item classes, check out zyte-common-items.