Using page params
In some cases, page object classes might require or allow parameters from the calling code, e.g. to change their behavior or make optimizations.
To support parameters, add PageParams to your inputs:
import attrs
from web_poet import PageParams, WebPage


@attrs.define
class MyPage(WebPage):
    page_params: PageParams
In your page object class, you can read parameters from a PageParams object as you would from a dict:
foo = self.page_params["foo"]
bar = self.page_params.get("bar", "default")
The way the calling code sets those parameters depends on your web-poet framework.
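The two lookup styles above can be tried without any framework. The sketch below uses a plain dict as a stand-in for a PageParams instance, and read_params is a hypothetical helper that is not part of web-poet. Indexing suits required parameters, since a missing key raises KeyError, while .get() suits optional ones:

```python
def read_params(page_params):
    """Read one required and one optional parameter (hypothetical helper)."""
    foo = page_params["foo"]  # required: raises KeyError when missing
    bar = page_params.get("bar", "default")  # optional: falls back to "default"
    return foo, bar


# A plain dict stands in for a PageParams instance here (an assumption
# made so that the example runs on its own).
print(read_params({"foo": 1}))
print(read_params({"foo": 1, "bar": "custom"}))
```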
Example: Controlling item values
import attrs
import web_poet
from web_poet import validates_input


@attrs.define
class ProductPage(web_poet.WebPage):
    page_params: web_poet.PageParams

    default_tax_rate = 0.10

    @validates_input
    def to_item(self):
        item = {
            "url": self.url,
            "name": self.css("#main h3.name ::text").get(),
            "price": self.css("#main .price ::text").get(),
        }
        self.calculate_price_with_tax(item)
        return item

    # An instance method, not a static one: it needs self.page_params.
    def calculate_price_with_tax(self, item):
        tax_rate = self.page_params.get("tax_rate", self.default_tax_rate)
        # Assumes the scraped price text is a plain number such as "9.99".
        item["price_with_tax"] = float(item["price"]) * (1 + tax_rate)
The example above accepts the tax rate of the product as an optional parameter, which can be useful when tax rates differ by state or territory. Because tax_rate is optional, the page object class also defines default_tax_rate as a fallback for when the calling code does not provide it.
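The fallback logic can be checked in isolation. The following is a minimal sketch, not the page object itself: a plain dict stands in for PageParams, price_with_tax is a hypothetical helper, and the rounding to two decimals is an added assumption to keep floating-point noise out of the result:

```python
DEFAULT_TAX_RATE = 0.10  # mirrors default_tax_rate in the page object above


def price_with_tax(price, page_params):
    """Apply the caller's tax rate, or the default when none is given."""
    tax_rate = page_params.get("tax_rate", DEFAULT_TAX_RATE)
    # Rounding is an assumption of this sketch, not part of the original.
    return round(price * (1 + tax_rate), 2)


# A plain dict stands in for a PageParams instance here.
print(price_with_tax(100.0, {}))                 # default rate applies
print(price_with_tax(100.0, {"tax_rate": 0.2}))  # caller's rate applies
```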
Example: Controlling page object behavior
In this example, PageParams controls how additional requests are used. Specifically, it sets the number of pages visited.
from typing import List

import attrs
import web_poet
from web_poet import validates_input


@attrs.define
class ProductPage(web_poet.WebPage):
    http: web_poet.HttpClient
    page_params: web_poet.PageParams

    default_max_pages = 5

    @validates_input
    async def to_item(self):
        return {"product_urls": await self.get_product_urls()}

    async def get_product_urls(self) -> List[str]:
        # Simulates scrolling to the bottom of the page to load the next
        # set of items in an "Infinite Scrolling" category list page.
        max_pages = self.page_params.get("max_pages", self.default_max_pages)
        requests = [
            self.create_next_page_request(page_num)
            for page_num in range(2, max_pages + 1)
        ]
        # The HTTP client is an input of this page object: use self.http.
        responses = await self.http.batch_execute(*requests)
        # parse_product_urls() returns a flat list of URLs per response.
        return [
            url
            for response in responses
            for url in self.parse_product_urls(response)
        ]

    @staticmethod
    def create_next_page_request(page_num):
        next_page_url = f"https://example.com/category/products?page={page_num}"
        return web_poet.HttpRequest(url=next_page_url)

    @staticmethod
    def parse_product_urls(response: web_poet.HttpResponse):
        return response.css("#main .products a.link ::attr(href)").getall()
The example above shows how PageParams can limit pagination through an optional max_pages parameter. The page object class also defines default_max_pages, which applies when the PageParams instance does not provide max_pages.
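To see which pages a given max_pages value leads to, the request-building step can be sketched on its own. This is an illustration under assumptions: a plain dict stands in for PageParams, next_page_urls is a hypothetical helper, and it returns URL strings instead of request objects so that it runs without web-poet:

```python
DEFAULT_MAX_PAGES = 5  # mirrors default_max_pages in the page object above


def next_page_urls(page_params):
    """List the extra page URLs to fetch, starting from page 2."""
    max_pages = page_params.get("max_pages", DEFAULT_MAX_PAGES)
    return [
        f"https://example.com/category/products?page={page_num}"
        for page_num in range(2, max_pages + 1)
    ]


# A plain dict stands in for a PageParams instance here.
print(len(next_page_urls({})))            # default limit: pages 2 to 5
print(next_page_urls({"max_pages": 2}))   # only page 2
print(next_page_urls({"max_pages": 1}))   # no additional requests
```

Since the range starts at 2, a max_pages of 1 results in no additional requests at all.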