Load More
This example demonstrates how to scrape a website that uses a “load more” button to load additional content.
The page we will scrape is DataServiceTestPage - Load More (https://lucaromagnoli.github.io/ds-mock-spa/#/load-more).
The setup is similar to the previous example, but with a different actions coroutine function.
```python
client = PlaywrightClient(actions=press_button, intercept_url="posts")
```
The actions coroutine function press_button is defined as follows:
```python
async def press_button(page: PlaywrightPage):
    has_posts = True
    while has_posts:
        await page.get_by_role("button").click()
        await page.wait_for_timeout(1000)
        no_more_posts = page.get_by_text("No more posts")
        if await no_more_posts.is_visible():
            has_posts = False
```
The press_button function clicks the “Load More” button, waits one second for the new posts to load, and repeats until the “No more posts” message is visible.
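The termination logic of this loop can be exercised without a browser. The sketch below is an illustrative variant, not part of dataservice: `FakePage` and `FakeLocator` are hypothetical stand-ins that mimic only the Playwright methods press_button calls (`get_by_role`, `get_by_text`, `click`, `is_visible`, `wait_for_timeout`). The `max_clicks` parameter is also an assumed addition: the original loop has no upper bound, so it would run indefinitely if the “No more posts” message never appeared.

```python
import asyncio


class FakeLocator:
    """Hypothetical stand-in for a Playwright locator (not a real API)."""

    def __init__(self, page):
        self.page = page

    async def click(self):
        # Each click "loads" one more batch of posts.
        self.page.clicks += 1
        self.page.batches_left -= 1

    async def is_visible(self):
        # The "No more posts" message shows once every batch is loaded.
        return self.page.batches_left <= 0


class FakePage:
    """Hypothetical page with a fixed number of batches behind the button."""

    def __init__(self, batches):
        self.batches_left = batches
        self.clicks = 0

    def get_by_role(self, role):
        return FakeLocator(self)

    def get_by_text(self, text):
        return FakeLocator(self)

    async def wait_for_timeout(self, ms):
        # Real Playwright sleeps for `ms` milliseconds; here we only yield control.
        await asyncio.sleep(0)


async def press_button(page, max_clicks=50):
    """Click until "No more posts" is visible or max_clicks (assumed cap) is hit."""
    for _ in range(max_clicks):
        await page.get_by_role("button").click()
        await page.wait_for_timeout(1000)
        if await page.get_by_text("No more posts").is_visible():
            return True
    return False


page = FakePage(batches=3)
finished = asyncio.run(press_button(page))
print(finished, page.clicks)  # → True 3
```

Returning a boolean lets the caller distinguish a page that was fully loaded from one where the cap was reached first.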
Finally, the parse_intercepted callback iterates over response.data, which holds the intercepted responses keyed by URL, and yields each item with its source URL added under the “url” key.
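A minimal sketch of what parse_intercepted does to the data, assuming (as its two loops imply) that response.data is a dictionary mapping each intercepted URL to a list of item dictionaries. A `types.SimpleNamespace` stands in for the real Response object, and the URLs and items are hypothetical.

```python
from types import SimpleNamespace


def parse_intercepted(response):
    # Same logic as in the example: flatten {url: [item, ...]} into flat records.
    for url in response.data:
        for item in response.data[url]:
            yield {"url": url, **item}


# Hypothetical intercepted data: two "posts" responses, one per button click.
fake_response = SimpleNamespace(
    data={
        "https://example.com/api/posts?page=1": [{"id": 1, "title": "First"}],
        "https://example.com/api/posts?page=2": [{"id": 2, "title": "Second"}],
    }
)

for record in parse_intercepted(fake_response):
    print(record)
# {'url': 'https://example.com/api/posts?page=1', 'id': 1, 'title': 'First'}
# {'url': 'https://example.com/api/posts?page=2', 'id': 2, 'title': 'Second'}
```

Each yielded record carries the URL it came from, so items from different intercepted requests can still be told apart after flattening.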
Full code for the load more button example:
```python
from logging import getLogger
from pprint import pprint

from dataservice import (
    DataService,
    PlaywrightClient,
    PlaywrightPage,
    Request,
    Response,
    setup_logging,
)

logger = getLogger("interceptor_button")
setup_logging("interceptor_button")


async def press_button(page: PlaywrightPage):
    # Keep clicking the button until the "No more posts" message is visible.
    has_posts = True
    while has_posts:
        await page.get_by_role("button").click()
        await page.wait_for_timeout(1000)
        no_more_posts = page.get_by_text("No more posts")
        if await no_more_posts.is_visible():
            has_posts = False


def parse_intercepted(response: Response):
    # response.data maps each intercepted URL to its list of items.
    for url in response.data:
        for item in response.data[url]:
            yield {"url": url, **item}


def main():
    client = PlaywrightClient(actions=press_button, intercept_url="posts")
    start_requests = [
        Request(
            url="https://lucaromagnoli.github.io/ds-mock-spa/#/load-more",
            callback=parse_intercepted,
            client=client,
        )
    ]
    service = DataService(start_requests)
    data = tuple(service)
    pprint(data)


if __name__ == "__main__":
    main()
```