Astro - Hacker News

16 comments

minimaxir 30 minutes ago ago

As someone who has spent an embarrassing amount of time researching Hacker News title trends over the years, I was excited to look at the methodology (https://hn-ph.vercel.app/analysis) but after looking at it, I am calling shenanigans afoot.
That's not a methodology paper and it doesn't explain how the model being advertised works in the spirit of open machine learning research; given that the startup is an AI startup, I assume that the actual model is more sophisticated. As Section 8 notes: "This analysis is descriptive and intended to summarize empirical patterns."
It's an exploratory data analysis which not only does not explain the methodology around how the model is constructed, but it also makes a number of assumptions that imply the people making it without proper context of how Hacker News works:
1. The extreme right-skewed nature should have raised a very large number of flags in the statistical methodology and calculations, but it mostly ignores them. The mean values are effectively useless, the p-values even more useless. It doesn't point out that the negative performing terms are likely spam.
2. It does not question why there are so few questions with a title >80 characters (answer: 80 characters is the max for a HN submission)
3. The analysis separates day of the week and hour: you can't do that. They're intrinsically linked and weekend behavior with respect to activity is far different than on weekdays.
4. "Title length has a weak relationship with score (Pearson r = -0.017, Spearman r = 0.048, n = 100k)". No statistician would call that a weak correlation; those values are effectively no correlation.
There is also no person tied to this paper, just the "Memvid Research Team", which raises further questions.
[-]
- leohonexus 21 minutes ago ago
  
  Agreed. And titling the paper "Attention is all you need" shows the author's hubris.
  I think it would have been much more appreciated as a dataset paper (and titled accordingly), rather than a "viral potential predictor".
delichon 2 hours ago ago

Here are the result for this username, this title and this description:
https://hn-ph.vercel.app/results/ZT06GF
It got a 62, a C+, predicting that this won't be very viral. So you either didn't test this submission on your own product, or you did, but didn't feel that the low score was a handicap? You don't seem to be dogfooding. If this post does well it would be evidence against its own accuracy. If it fizzles out, congratulations on being correct.
[-]
- baobun an hour ago ago
  
  Uncharitable and assumptious of the goals. I prefer submissions to not be hyper-optimized for virality.
tverbeure an hour ago ago

Current nr 3 in the leaderboard: "Show HN: I built a Rust compiler in Rust with Rust"
Could use some more Rust to boost it to nr 1.
[-]
- Frotag 10 minutes ago ago
  
  Show HN: I built a Rust compiler in Python with JavaScript using Java on Android
- baobun an hour ago ago
  
  I'm calling it: Some AI controversy in Rust core will be in the top 5 of 2026.
higginsniggins 25 minutes ago ago

According to your research paper you should have made this post a "Tell HN:" rather then a "Show HN:", lol
andr3wV an hour ago ago

The analysis they ran in their research paper found most surface features don’t meaningfully separate viral from non‑viral outcomes. So the tool isn't actually predicting if your launch title will go viral, it's more like checking for heuristics and descriptive patterns.
Cool idea though! And they're on the front page lol
amitav1 an hour ago ago

This tool: "Avoid keyword stuffing; make the title read naturally."
Also this tool: "Show HN (AI): I built GPT 6 in Rust Using Claude Gemini Grok OpenAI NVIDIA Google" - #1
(No hate to the creators obviously. Just really funny.)
codybontecou an hour ago ago

Well, he made it to the front page so there’s that.
simonw an hour ago ago

(Replaced my original comment here which was a little unkind.)
Question for OP, who created Memvid (the .mv2 file format that's used to distribute this data). Are you still taking text, chunking it and then storing those chunks as QR codes in a video file? That seems like an inherently inefficient storage mechanism to me compared with something like SQLite or Parquet - do you have concrete numbers or a demo that shows that your file format really is more effective for storing data for "AI agents" than those existing solutions?
[-]
- minimaxir 6 minutes ago ago
  
  As a side note: the dataset is referenced in the paper as being from Hugging Face (https://huggingface.co/datasets/julien040/hacker-news-posts), which does host it as a 426 MB Parquet, while the .mv2 being distributed is 847 MB, for some reason.
- tossit444 20 minutes ago ago
  
  Look at memvid's closed issues. The entire thing is a farce.
  https://github.com/memvid/memvid/issues?q=is%3Aissue%20state...
mitexleo 2 hours ago ago

Let's see if this goes viral
[-]
- asciii 2 hours ago ago
  
  o7 see you in the 1% someday