As someone who has spent an embarrassing amount of time researching Hacker News title trends over the years, I was excited to look at the methodology (https://hn-ph.vercel.app/analysis), but after reading it, I am calling shenanigans.
That's not a methodology paper, and it doesn't explain how the model being advertised works, which is what you'd expect in the spirit of open machine learning research; given that this is an AI startup, I assume the actual model is more sophisticated than what's shown. As Section 8 notes: "This analysis is descriptive and intended to summarize empirical patterns."
It's an exploratory data analysis which not only fails to explain how the model is constructed, but also makes a number of assumptions implying that the people behind it lack proper context on how Hacker News works:
1. The extreme right skew of the score distribution should have raised red flags throughout the statistical methodology and calculations, but it is mostly ignored. The mean values are effectively useless, and the p-values even more so (see the sketch after this list). The analysis also doesn't point out that the negative-performing terms are likely spam.
2. It does not question why there are so few submissions with a title >80 characters (answer: 80 characters is the maximum title length for a HN submission).
3. The analysis treats day of the week and hour as separate features: you can't do that. They're intrinsically linked, and weekend activity behaves very differently from weekday activity (also sketched below).
4. "Title length has a weak relationship with score (Pearson r = -0.017, Spearman r = 0.048, n = 100k)". No statistician would call that a weak correlation; those values are effectively no correlation.
There is also no person tied to this paper, just the "Memvid Research Team", which raises further questions.
Agreed. And titling the paper "Attention Is All You Need" shows the author's hubris.
I think it would have been much more appreciated as a dataset paper (and titled accordingly), rather than a "viral potential predictor".
Here are the results for this username, this title, and this description:
https://hn-ph.vercel.app/results/ZT06GF
It got a 62, a C+, predicting that this won't be very viral. So either you didn't test this submission on your own product, or you did but didn't feel the low score was a handicap? Either way, you don't seem to be dogfooding. If this post does well, it's evidence against its own accuracy; if it fizzles out, congratulations on being correct.
Uncharitable, and presumptuous about the goals. I prefer submissions not to be hyper-optimized for virality.
Currently #3 on the leaderboard: "Show HN: I built a Rust compiler in Rust with Rust"
Could use some more Rust to boost it to #1.
Show HN: I built a Rust compiler in Python with JavaScript using Java on Android
I'm calling it: Some AI controversy in Rust core will be in the top 5 of 2026.
According to your research paper you should have made this post a "Tell HN:" rather than a "Show HN:", lol
The analysis they ran in their research paper found most surface features don’t meaningfully separate viral from non‑viral outcomes. So the tool isn't actually predicting if your launch title will go viral, it's more like checking for heuristics and descriptive patterns.
Cool idea though! And they're on the front page lol
This tool: "Avoid keyword stuffing; make the title read naturally."
Also this tool: "Show HN (AI): I built GPT 6 in Rust Using Claude Gemini Grok OpenAI NVIDIA Google" - #1
(No hate to the creators obviously. Just really funny.)
Well, he made it to the front page so there’s that.
(Replaced my original comment here which was a little unkind.)
Question for OP, who created Memvid (the .mv2 file format that's used to distribute this data). Are you still taking text, chunking it and then storing those chunks as QR codes in a video file? That seems like an inherently inefficient storage mechanism to me compared with something like SQLite or Parquet - do you have concrete numbers or a demo that shows that your file format really is more effective for storing data for "AI agents" than those existing solutions?
As a side note: the dataset is referenced in the paper as being from Hugging Face (https://huggingface.co/datasets/julien040/hacker-news-posts), which does host it as a 426 MB Parquet, while the .mv2 being distributed is 847 MB, for some reason.
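If anyone wants to check that comparison themselves, here's a minimal sketch of the kind of test I have in mind, assuming you've downloaded the Hugging Face Parquet locally (the file name here is mine, not theirs) and the columns are flat scalars (to_sql won't take nested types):

```python
import os
import sqlite3
import pandas as pd

# Assumption: the dataset from the paper, saved locally under a
# hypothetical file name.
df = pd.read_parquet("hacker-news-posts.parquet")

con = sqlite3.connect("hn.sqlite")
df.to_sql("posts", con, index=False, if_exists="replace")
con.close()

for path in ("hacker-news-posts.parquet", "hn.sqlite"):
    print(path, round(os.path.getsize(path) / 1e6), "MB")
# SQLite stores rows uncompressed, so it will usually be bigger than
# the column-compressed Parquet; the interesting comparison is how
# both stack up against the 847 MB .mv2.
```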
Look at memvid's closed issues. The entire thing is a farce.
https://github.com/memvid/memvid/issues?q=is%3Aissue%20state...
Let's see if this goes viral
o7 see you in the 1% someday