Inputs

The __init__ method of a page object class must declare its input parameters with type hints that point to input classes.

Those input classes may be:

  • Built-in input classes, provided by web-poet.

  • Custom input classes, which you define yourself.

Based on the target URL and parameter type hints, frameworks automatically build the required objects at run time, and pass them to the __init__ method of the corresponding page object class.

For example, if a page object class has an __init__ parameter of type HttpResponse, and the target URL is https://example.com, your framework would send an HTTP request to https://example.com, download the response, build an HttpResponse object with the response data, and pass it to the __init__ method of the page object class being used.
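The mechanism described above can be sketched with a toy resolver. This is only an illustration of the idea, not how any particular framework is implemented: the HttpResponse class below is a stand-in rather than web-poet's real class, and fake_download, BUILDERS and build_page are hypothetical names invented for the sketch. No network request is made.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class HttpResponse:
    """Stand-in input class; NOT web-poet's real HttpResponse."""
    url: str
    body: bytes


class TitlePage:
    """Page object class that declares its input through a type hint."""

    def __init__(self, response: HttpResponse):
        self.response = response

    def to_item(self) -> dict:
        return {"url": self.response.url, "size": len(self.response.body)}


def fake_download(url: str) -> HttpResponse:
    # A real framework would send an HTTP request here. This sketch
    # returns canned data so that it runs without network access.
    return HttpResponse(url=url, body=b"<html><title>Example</title></html>")


# Hypothetical registry mapping each input class to a builder function.
BUILDERS = {HttpResponse: fake_download}


def build_page(page_cls, url: str):
    """Read the __init__ type hints, build each input, instantiate the page."""
    hints = get_type_hints(page_cls.__init__)
    hints.pop("return", None)  # ignore a return annotation, if any
    kwargs = {name: BUILDERS[input_cls](url) for name, input_cls in hints.items()}
    return page_cls(**kwargs)


page = build_page(TitlePage, "https://example.com")
item = page.to_item()
```

The page object class never mentions how the response is obtained. It only names the type it needs, which is what lets different frameworks supply the same input in their own way.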

Built-in input classes

Warning

Not all frameworks support all web-poet built-in input classes.

The web_poet.page_inputs module provides several classes that you can declare as inputs for a page object class, including:

  • HttpResponse, a complete HTTP response, including URL, headers, and body. This is the most common input for a page object class.

  • HttpClient, to send additional requests.

  • RequestUrl, the target URL before following redirects. Useful, for example, to skip the target URL download, and instead use HttpClient to send a custom request based on parts of the target URL.

  • PageParams, to receive data from the crawling code.

  • Stats, to write key-value data pairs during parsing that you can inspect later, e.g. for debugging purposes.

  • BrowserResponse, which includes URL, status code and BrowserHtml of a rendered web page.
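As a rough illustration of the RequestUrl use case above, the following sketch derives a custom request URL from parts of the target URL using only the standard library. The /products/<id> and /api/products/<id> URL layouts and the build_api_url helper are assumptions made up for the example. In a page object, the target URL would come from the RequestUrl input, and the resulting URL would be requested through HttpClient.

```python
from urllib.parse import urlsplit, urlunsplit


def build_api_url(target_url: str) -> str:
    """Map a hypothetical product page URL to a hypothetical API URL."""
    parts = urlsplit(target_url)
    # Take the last path segment as the product ID (assumed URL layout).
    product_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    api_path = f"/api/products/{product_id}"
    # Keep the scheme and host; drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, api_path, "", ""))


api_url = build_api_url("https://example.com/products/123/")
```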

Custom input classes

You may define your own input classes if you are using a framework that supports it.

However, note that custom input classes may make your page object classes less portable across frameworks.
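The portability concern can be sketched with two toy frameworks, each modeled as a plain dictionary that maps input classes to builder functions. Everything here is hypothetical: GeoLocation is an invented custom input class, and build_page, framework_a and framework_b are stand-ins rather than real framework APIs.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class GeoLocation:
    """Hypothetical custom input class; not part of web-poet."""
    country: str


class StorePage:
    """Page object class that depends on the custom input class."""

    def __init__(self, location: GeoLocation):
        self.location = location


def build_page(page_cls, url, builders):
    """Build a page object using only the builders a framework provides."""
    hints = get_type_hints(page_cls.__init__)
    hints.pop("return", None)  # ignore a return annotation, if any
    kwargs = {}
    for name, input_cls in hints.items():
        if input_cls not in builders:
            raise TypeError(f"No builder registered for {input_cls.__name__}")
        kwargs[name] = builders[input_cls](url)
    return page_cls(**kwargs)


# Toy framework A knows how to build GeoLocation; toy framework B does not.
framework_a = {GeoLocation: lambda url: GeoLocation(country="US")}
framework_b = {}

page = build_page(StorePage, "https://example.com", framework_a)

try:
    build_page(StorePage, "https://example.com", framework_b)
    works_in_b = True
except TypeError:
    works_in_b = False
```

In this sketch StorePage can only be built by the framework that registers a builder for GeoLocation. The same page object class fails under the other one, which is the kind of portability loss the note above warns about.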