Playwright content fetcher
dgtlmoon edited this page Jun 11, 2022 · 23 revisions
You can fetch pages using the excellent and very fast Playwright backend provided by browserless: https://docs.browserless.io/docs/docker-quickstart.html
See docker-compose.yml for more examples.
Set the environment variable PLAYWRIGHT_DRIVER_URL to ws://127.0.0.1:3000
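As a rough sketch of the step above, the variable can be exported in the shell that starts changedetection.io, and the endpoint can be probed before relying on it. The helper names `ws_host_port` and `driver_reachable` are hypothetical. The probe assumes that `bash` (for its `/dev/tcp` redirection) and the coreutils `timeout` command are available.

```shell
#!/bin/bash
# Point changedetection.io at a browserless instance on this machine.
export PLAYWRIGHT_DRIVER_URL="ws://127.0.0.1:3000"

# Hypothetical helper: print "host port" for a ws://host:port[/path] URL.
ws_host_port() {
  local rest="${1#ws://}"   # drop the scheme
  rest="${rest%%/*}"        # drop any trailing path
  echo "${rest%%:*} ${rest##*:}"
}

# Hypothetical helper: succeed if a TCP connection to the driver URL opens.
# This only checks that the port accepts connections, not the WebSocket handshake.
driver_reachable() {
  set -- $(ws_host_port "$1")   # now $1 is the host and $2 is the port
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
```

For example, `driver_reachable "$PLAYWRIGHT_DRIVER_URL" || echo "browserless is not reachable"` prints a warning when nothing is listening on the configured port.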
In docker-compose.yml, uncomment these lines:
    environment:
      - PLAYWRIGHT_DRIVER_URL=ws://playwright-chrome:3000/

    playwright-chrome:
      hostname: playwright-chrome
      image: browserless/chrome
      restart: unless-stopped
Or start browserless directly with docker run:
    docker run -d --name browserless \
      -e "DEFAULT_LAUNCH_ARGS=[\"--window-size=1920,1080\"]" \
      --rm -p 3000:3000 \
      --shm-size="2g" \
      browserless/chrome:1.53-chrome-stable
@todo
There seems to be a memory leak in Playwright (https://github.com/microsoft/playwright/issues/6319) and, as yet, there does not seem to be a solution. Memory use can easily grow from 200 MB to several gigabytes. Restarting the service seems to be very fast and is so far the best way to mitigate this.
Run the following script from crontab every x minutes:
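#!/bin/bash
# Kill changedetection.io once its resident memory (RSS) exceeds about 240 MB.
# The Docker restart policy should then bring the container back up.
# pgrep -f matches against the full command line; ps reports RSS in KiB.
pgrep -f 'python ./changedetection.py -d /datastore' | while read -r pid
do
  rss=$(ps -o rss= -p "$pid" | tr -d ' ')
  if [ "${rss:-0}" -gt 240000 ]; then
    kill -9 "$pid"
  fi
done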
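To run the check every few minutes, the script can be registered in crontab. The sketch below only builds the crontab line. The helper name `cron_entry`, the 5-minute interval, and the script path `/usr/local/bin/check-memory.sh` are assumptions chosen for illustration, not values from this page.

```shell
#!/bin/bash
# Hypothetical helper: print a crontab line that runs a script every N minutes.
cron_entry() {
  local minutes="$1" script="$2"
  printf '*/%s * * * * %s\n' "$minutes" "$script"
}

# Preview the entry for an assumed script location.
cron_entry 5 /usr/local/bin/check-memory.sh
```

One way to install the entry is to append it to the current crontab with `(crontab -l 2>/dev/null; cron_entry 5 /usr/local/bin/check-memory.sh) | crontab -`. Pick an interval short enough that memory cannot grow far past the limit between runs.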