Killer domain btw, how did you nab that one?
Any docs on how to run this on multiple machines? (ideally k8s)
Thanks! Honestly that was just pure luck with the domain :)
There's a Docker deployment guide here: https://docs.reader.dev/documentation/guides/deployment
For k8s, you can run multiple Reader instances behind a load balancer, with each instance managing its own browser pool. Main things to watch:
- Memory limits (~500MB-1GB per concurrent browser)
- Headless Chrome needs --no-sandbox or a seccomp profile
- Sticky sessions for crawl jobs (or run the full crawl on a single pod)
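As a rough illustration of the memory point, here is a minimal Python sketch that turns the ~500MB-1GB per-browser figure into a pod memory request and limit. The function name, the 256Mi of non-browser overhead, and the choice to use the low end as the request and the high end as the limit are all assumptions for illustration, not measured or recommended values.

```python
def pod_memory_quantities(concurrent_browsers: int,
                          low_mib: int = 500,        # ~500MB low end quoted above
                          high_mib: int = 1024,      # ~1GB high end quoted above
                          overhead_mib: int = 256):  # assumed non-browser overhead
    """Return (request, limit) as Kubernetes-style quantity strings, e.g. '2256Mi'."""
    if concurrent_browsers < 1:
        raise ValueError("need at least one concurrent browser")
    request = concurrent_browsers * low_mib + overhead_mib
    limit = concurrent_browsers * high_mib + overhead_mib
    return f"{request}Mi", f"{limit}Mi"


# Example: a pod that runs 4 browsers at once
request, limit = pod_memory_quantities(4)
print(request, limit)
```

The two strings would go into the container's resources.requests.memory and resources.limits.memory fields; the real numbers depend on the pages being scraped, so treat them as a starting point to measure against.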
A dedicated k8s guide is on the roadmap...
The main challenge is distributed rate-limiting, something I'd hope the framework handles for me. It would also help to see k8s settings that have worked well for you with respect to scaling.
Distributed rate-limiting is intentionally not in the core library. Reader focuses on the scraping primitives and stays unopinionated about orchestration.
For multi-node rate limiting, you'd layer that on top: Redis + a simple limiter that gates calls to reader.scrape().
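To make "Redis + a simple limiter" concrete, here is a minimal sketch of a per-host fixed-window limiter, written in Python purely for illustration. It is not part of Reader. `FixedWindowLimiter`, `InMemoryStore`, `gated_scrape`, and the key format are hypothetical names, and the `scrape` argument is a placeholder for whatever actually calls reader.scrape(). The store only needs `incr` and `expire`, which a redis-py client provides; the in-memory stand-in is there so the sketch runs without a Redis server.

```python
import time
from typing import Callable, Dict, Optional
from urllib.parse import urlsplit


class InMemoryStore:
    """Single-process stand-in for Redis; swap in a redis-py client to share state."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def incr(self, name: str) -> int:
        self.counts[name] = self.counts.get(name, 0) + 1
        return self.counts[name]

    def expire(self, name: str, seconds: int) -> bool:
        return True  # no-op here; Redis would delete the key after `seconds`


class FixedWindowLimiter:
    """Allow at most `limit` calls per host per window, counted in a shared store."""

    def __init__(self, store, limit: int, window_seconds: int = 1,
                 clock: Callable[[], float] = time.time) -> None:
        self.store = store
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock

    def try_acquire(self, url: str) -> bool:
        host = urlsplit(url).hostname or "unknown"
        window = int(self.clock() // self.window_seconds)
        key = f"ratelimit:{host}:{window}"  # hypothetical key format
        count = self.store.incr(key)
        if count == 1:
            # INCR and EXPIRE are separate commands, so a crash between them
            # would leave the key without a TTL; a Lua script would avoid that.
            self.store.expire(key, self.window_seconds * 2)
        return count <= self.limit


def gated_scrape(limiter: FixedWindowLimiter,
                 scrape: Callable[[str], object], url: str) -> Optional[object]:
    """Call scrape(url) only if the shared budget allows it; otherwise return None."""
    if not limiter.try_acquire(url):
        return None
    return scrape(url)


# Demo with a fake clock: 2 calls per second per host
now = [1000.0]
limiter = FixedWindowLimiter(InMemoryStore(), limit=2, clock=lambda: now[0])
first_window = [limiter.try_acquire("https://example.com/a") for _ in range(3)]
now[0] += 1  # move into the next window
next_window = limiter.try_acquire("https://example.com/b")
print(first_window, next_window)
```

A fixed window is the simplest option, but it can let through up to twice the limit across a window boundary. A sliding window or token bucket would be smoother, at the cost of more Redis logic.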
For k8s resource settings, the Docker guide is a good starting point: https://docs.reader.dev/documentation/guides/deployment
But I will add some reference examples on how to build a rate-limiting and K8s orchestration layer on top of Reader...
Thanks for sharing this :)
There's plenty of what you've built here to go around. It's trivial now to reproduce the basics.
Distributed rate-limiting is a hard problem, and one people may pay for.
I will add some reference examples on how to build a rate-limiting and K8s orchestration layer on top of Reader soon :)
You're missing the point: I don't want to build it myself. The framework I actually use will do it for me. If yours does not, it will not be in consideration.