Items

The to_item method of a page object class must return an item.

An item is a data container object supported by the itemadapter library, such as a dict, an attrs class, or a dataclass. For example:

import attrs


@attrs.define
class MyItem:
    foo: int
    bar: str

Because itemadapter allows implementing support for arbitrary classes, any kind of Python object can potentially work as an item.

Defining the item class of a page object class

When inheriting from ItemPage, indicate the item class to return in square brackets:

@attrs.define
class MyPage(ItemPage[MyItem]):
    ...

to_item then builds an instance of the specified item class from the fields of the page object class:

page = MyPage(...)
item = await page.to_item()
assert isinstance(item, MyItem)

You can also define ItemPage subclasses that are meant only to be subclassed, not used directly, and leave their item class unspecified. You can then specify the item class when subclassing them:

@attrs.define
class MyBasePage(ItemPage):
    ...

@attrs.define
class MyPage(MyBasePage[MyItem]):
    ...

To change the item class of a subclass that has already defined its item class, use Returns:

@attrs.define
class MyOtherPage(MyPage, Returns[MyOtherItem]):
    ...

Best practices for item classes

To keep your code maintainable, we recommend that you:

  • Instead of dict, use proper item classes based on dataclasses or attrs, to make it easier to detect issues like field name typos or missing required fields.

  • Reuse item classes.

    For example, if you want to extract product details from two e-commerce websites, try to use the same item class for both of them. Or at least try to define a base item class with shared fields, and only keep website-specific fields in website-specific items.

  • Keep item classes as logic-free as possible.

    For example, any parsing and field cleanup logic is better handled through page object classes, e.g. using field processors.

    Having code that makes item field values different from their counterpart page object field values can subvert the expectations of users of your code, who might need to access page object fields directly, for example for field subset selection.
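The first two recommendations can be sketched with the standard library alone. The class names, field names and the build_error helper below are hypothetical, chosen only to show how a dataclass rejects a misspelled or missing field where a dict would accept it, and how website-specific items can extend a shared base item:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseProduct:
    # Fields shared by every website.
    name: str
    price: Optional[float] = None


@dataclass
class ShopAProduct(BaseProduct):
    # Field that only the hypothetical "shop A" website provides.
    loyalty_points: Optional[int] = None


def build_error(**kwargs):
    """Return the TypeError message from building a BaseProduct, or None."""
    try:
        BaseProduct(**kwargs)
    except TypeError as exc:
        return str(exc)
    return None


typo_error = build_error(nmae="Chair")   # misspelled field name is rejected
missing_error = build_error(price=9.99)  # missing required field is rejected
as_dict = {"nmae": "Chair"}              # the same typo in a dict goes unnoticed
```

Both item classes contain no logic beyond their field declarations, in line with the third recommendation.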

If you are looking for ready-made item classes, check out zyte-common-items.