The voiceover in the promo video on this page seems to be AI generated, with some weird artifacts. Right at the beginning it sounds like it says "cormbiying structure daya retrieval and lirrachure search".
Is it just me, or do they very carefully avoid reporting performance on GPT-5.4 Pro, showing only the default GPT-5.4? They also very carefully left Anthropic models out of their comparison.
I went back to the BixBench benchmark they mentioned. I couldn't find official results for Anthropic models, but I found a project taking Opus 4.6 from 65.3% to 92.0% (which would be above GPT-Rosalind) with nearly 200 carefully crafted skills [1]. There also appear to be competing models with scores on par with this tuned GPT.
[1] https://github.com/jaechang-hits/SciAgent-Skills
BixBench seems like a really interesting/useful idea, but most of the value for a layperson (like me) is in comparing the results of different models on the benchmark. From what I can find, there is no centralised, up-to-date set of model results. Shame.
I'm all for naming things in honor of Rosalind Franklin, but this seems like incredibly misplaced hubris instead.
> GPT‑Rosalind is now available … for qualified customers …
It’s kind of gross to make money off her name (if that’s what’s happening) posthumously. It’s a complicated story anyway. IIRC her sister referred to it as “the Cult of Rosalind” when people were cashing in on books about her.
I'd rather the AI companies make up names, or name their products things like "Clod" than use my name (if they were to ask) - as no matter how good it looks today eventually it'll be some form of laughingstock.