Astro - Hacker News

7 comments

verdverm 10 hours ago

Killer domain btw, how did you nab that one?
Any docs on how to run this on multiple machines? (ideally k8s)
- nihalwashere 10 hours ago
  
  Thanks! Honestly that was just pure luck with the domain :)
  There's a Docker deployment guide here: https://docs.reader.dev/documentation/guides/deployment
  For k8s, you can run multiple Reader instances behind a load balancer; each manages its own browser pool. Main things to watch:
  - Memory limits (~500MB-1GB per concurrent browser)
  - Headless Chrome needs --no-sandbox or a seccomp profile
  - Sticky sessions for crawl jobs (or run the full crawl on a single pod)
  A dedicated k8s guide is on the roadmap...
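
To make the memory point concrete, here is a rough sizing sketch based on the 500MB-1GB per-browser figure above. The helper name `estimatePodMemoryMb` and the `overheadMb` allowance are assumptions for illustration, not part of Reader or its docs, and 1GB is treated as 1024 MB.

```typescript
// Per-browser memory range quoted in the thread, in MB (1GB taken as 1024 MB).
const MIN_MB_PER_BROWSER = 500;
const MAX_MB_PER_BROWSER = 1024;

// Hypothetical helper: estimate a pod's memory request and limit from the
// number of concurrent browsers it runs. `overheadMb` is an assumed allowance
// for the host process itself, not a figure from the Reader docs.
function estimatePodMemoryMb(concurrentBrowsers: number, overheadMb = 256) {
  return {
    // Lower bound of the range: a reasonable starting point for the request.
    requestMb: concurrentBrowsers * MIN_MB_PER_BROWSER + overheadMb,
    // Upper bound of the range: a starting point for the hard limit.
    limitMb: concurrentBrowsers * MAX_MB_PER_BROWSER + overheadMb,
  };
}

const sizing = estimatePodMemoryMb(4);
console.log(sizing);
```

The two numbers could feed a pod's memory request and limit respectively, then be tuned against real usage.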
  - verdverm 10 hours ago
    
    The main challenge is distributed rate-limiting, something I'd hope the framework handles for me. I'd also like k8s settings that have worked well in your experience w.r.t. scaling.
    
    nihalwashere 9 hours ago
    
    Distributed rate-limiting is intentionally not in the core library. Reader focuses on the scraping primitives and stays unopinionated about orchestration.
    For multi-node rate limiting, you'd layer that on top: Redis + a simple limiter that gates calls to reader.scrape().
    For k8s resource settings, the Docker guide is a good starting point: https://docs.reader.dev/documentation/guides/deployment
    But I will add some reference examples on how to build a rate-limiting and K8s orchestration layer on top of Reader...
    Thanks for sharing this :)
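
A minimal sketch of the "Redis + a simple limiter" idea, assuming a fixed-window counter per key. Everything named here (`CounterStore`, `MemoryStore`, `FixedWindowLimiter`, `gated`) is hypothetical and not part of Reader. The store interface mirrors Redis INCR and EXPIRE, but the in-memory class is only a single-process stand-in, so a real multi-node setup would need an actual Redis-backed implementation of the same interface.

```typescript
// The two operations a fixed-window limiter needs.
// With Redis these would map to INCR and EXPIRE on a shared instance.
interface CounterStore {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<void>;
}

// Single-process stand-in for local testing only (assumption, not Redis).
class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();

  async incr(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }

  async expire(_key: string, _seconds: number): Promise<void> {
    // No-op here: keys from old windows are simply never read again.
  }
}

class FixedWindowLimiter {
  constructor(
    private store: CounterStore,
    private limit: number,
    private windowSeconds: number,
    private now: () => number = () => Date.now(),
  ) {}

  // Resolves to true if one more request for `key` fits in the current window.
  async tryAcquire(key: string): Promise<boolean> {
    const window = Math.floor(this.now() / 1000 / this.windowSeconds);
    const windowKey = `ratelimit:${key}:${window}`;
    const count = await this.store.incr(windowKey);
    if (count === 1) {
      // First hit in this window: let the counter expire with the window.
      await this.store.expire(windowKey, this.windowSeconds);
    }
    return count <= this.limit;
  }
}

// Hypothetical gate: `task` stands in for a call such as reader.scrape(...).
async function gated<T>(
  limiter: FixedWindowLimiter,
  key: string,
  task: () => Promise<T>,
): Promise<T> {
  if (!(await limiter.tryAcquire(key))) {
    throw new Error(`rate limit exceeded for ${key}`);
  }
  return task();
}
```

Keying by target domain would give a per-site limit shared by every node that talks to the same store. A fixed window allows bursts at window boundaries, so a sliding window or token bucket may suit stricter limits.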
    
    verdverm 9 hours ago
    
    There's plenty of what you've built here to go around. It's trivial now to reproduce the basics.
    Distributed rate-limiting is a hard problem, one people may pay for.
    
    nihalwashere 9 hours ago
    
    I will add some reference examples on how to build a rate-limiting and K8s orchestration layer on top of Reader soon :)
    
    verdverm 8 hours ago
    
    You're missing the point, I don't want to build it myself. The framework I will actually use will do it for me. If yours does not, it will not be in consideration.