Search engine for YouTube content that's no longer on YouTube:
deleted, removed, region-blocked, DMCA'd. ~1.5B videos indexed from 2005
onwards by aggregating archive sources: the Internet Archive Wayback Machine
(CDX + HEAD-spread discovery) and Common Crawl.
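For anyone curious what the CDX side of this can look like: below is a minimal Python sketch, not the author's pipeline, that builds a Wayback Machine CDX query for one video's watch page, parses the JSON rows the CDX server returns (the first row is a header of field names), and turns a capture into a replay URL. The helper names, the `filter`/`limit` choices and the sample video ID are illustrative assumptions; the HEAD-spread step and Common Crawl are not covered.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(video_id: str, limit: int = 50) -> str:
    """Build a CDX query for captures of one watch page (hypothetical helper)."""
    params = {
        "url": f"youtube.com/watch?v={video_id}",
        "output": "json",                       # list-of-lists response
        "fl": "timestamp,original,statuscode",  # fields to return per capture
        "filter": "statuscode:200",             # assumption: keep only successful captures
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(rows: list[list[str]]) -> list[dict[str, str]]:
    """Turn CDX JSON output (header row first) into one dict per capture."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def replay_url(timestamp: str, original: str, raw: bool = False) -> str:
    """Reconstruct a Wayback replay URL for a single capture."""
    # "id_" after the timestamp asks Wayback for the archived bytes
    # without its toolbar and link rewriting.
    flag = "id_" if raw else ""
    return f"https://web.archive.org/web/{timestamp}{flag}/{original}"


if __name__ == "__main__":
    print(cdx_query_url("abcdefghijk"))  # hypothetical 11-character video ID
    # Illustrative rows in the shape the CDX server returns, not real captures.
    sample = [
        ["timestamp", "original", "statuscode"],
        ["20100101000000", "http://www.youtube.com/watch?v=abcdefghijk", "200"],
    ]
    for cap in parse_cdx_rows(sample):
        print(replay_url(cap["timestamp"], cap["original"]))
```

Fetching the query URL (with any HTTP client) and feeding the decoded JSON to `parse_cdx_rows` gives one entry per capture; choosing which capture to surface is left open here.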
What you get for any video ID: metadata (title, description, channel,
upload date, duration, view counts, tags), thumbnails, original captions
when the archive captured them, and reconstructed URLs to play the
archived video file when available. Channel discovery reconciles legacy
username/handle eras to a single canonical identity (lots of channels
renamed themselves a dozen times — that part was painful).
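One common way to do the kind of identity reconciliation described above is union-find over identifiers that are observed together, for example a legacy `/user/` name and a `UC` channel ID appearing on the same archived page. The sketch below assumes that approach plus a hypothetical `kind:value` naming scheme (`user:`, `handle:`, `channel:`); the post does not say how the real system does it.

```python
from collections import defaultdict


class AliasUnion:
    """Union-find over channel identifiers (hypothetical sketch)."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path straight at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def canonical_groups(observations: list[tuple[str, str]]) -> dict[str, str]:
    """Map every identifier to one canonical identifier for its channel."""
    uf = AliasUnion()
    for a, b in observations:
        uf.union(a, b)  # each pair was seen referring to the same channel
    groups = defaultdict(set)
    for ident in list(uf.parent):
        groups[uf.find(ident)].add(ident)
    result: dict[str, str] = {}
    for members in groups.values():
        # Assumption: prefer a stable "channel:UC..." ID as the canonical key;
        # otherwise fall back to the alphabetically first alias.
        ids = sorted(m for m in members if m.startswith("channel:"))
        canonical = ids[0] if ids else sorted(members)[0]
        for m in members:
            result[m] = canonical
    return result


if __name__ == "__main__":
    # Illustrative observations, not real channels.
    obs = [
        ("user:oldname", "channel:UCaaaaaaaaaaaaaaaaaaaaaa"),
        ("user:oldname", "user:renamed2009"),
        ("handle:@newname", "channel:UCaaaaaaaaaaaaaaaaaaaaaa"),
    ]
    print(canonical_groups(obs))
```

Union-find keeps each merge nearly constant-time, so chains of a dozen renames collapse into one group no matter what order the observations arrive in.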
Seems pretty cool. So this is a recent project, and you haven’t been working on this since 2005, right?
Have you considered also indexing videos that haven’t been deleted?
Update: So I mustered the courage to try the search engine, since it didn't look much like a scam, and as soon as you use it, it becomes very apparent that non-deleted videos are also indexed.