> happy to run additional documents if people want to share examples
I've got one! The pdf of this out-of-print book is terrible: https://archive.org/details/oneononeconversa0000simo. The text is unreadably faint, and the underlying text layer is full of errors, so copy-paste is almost useless. Can your software extract usable text?
(I'll email you a copy of the pdf for convenience since the internet archive's copy is behind their notorious lending wall)
Results look pretty good (with the exception of one very faint page) - check it out here! https://platform.runpulse.com/dashboard/extractions/public/f...
Thanks!
If anyone is interested in the history of the family therapy movement—that is, the movement that started in the 1950s where psychotherapists started working with entire families rather than individual clients—this is a great book of interviews and incredibly readable.
From the chapter above, Jay Haley on Milton Erickson:
But, you know, the real tragedy with Erickson was he spent so much time over the years teaching hypnosis when he had a whole new school of thera- py to offer. People did not recognize the significance of his work until he was too old to really demon- Strate it
(I left in a couple of text glitches there...at least it's readable now!)
Super interesting stuff. I'm a fan - I've been a Pulse customer for a while. However, I've found it has trouble with things that need some intelligence, like ditto marks meaning "repeat the previous line". Is that something you're working on, or is that not the right use case for Pulse?
Congrats on the launch! You mention that you're SOTA on benchmarks. Can you share your research, or share which benchmark you used?
thanks! we benchmark against all the major players (azure doc intelligence, aws textract, google doc ai, frontier llms, etc). we have some public news coming out soon on this front, but we have a very rigorous dataset using both public and synthetic data focusing on the hardest problems in the space (handwriting, tables, etc).
Hey, congratulations on the launch. Just noticed a discrepancy in the financial 10K example:
There is a section near the start where there are 4 options: Large accelerated filer, Non-accelerated filer, Accelerated filer, or Smaller reporting company.
Of these options, "Large accelerated filer" is checked in the PDF, but "Non-accelerated filer" is checked in the Markdown.
thanks for the flag! i've pointed this out to the team and we'll be pushing an update shortly
can't sign up with gmail or "personal" email addresses? What if I want to evaluate but am not ready to be inundated with sales calls? My 'work' email domain is one that many vendors would love to see in their CRM. I always sign up with disposables first.
I guess I should thank you for saving me time? Plenty of others in this space.
AI models will eventually do this natively. This is one of the ways for models to continue to get better, by doing better OCR and by doing better context extraction.
I am already seeing this trend in the recent releases of the native models (such as Opus 4.5, Gemini 3, and especially Gemini 3 flash).
It's only going to get better from here.
Another thing to note: if I remember correctly, there are over five startups in the YC portfolio right now doing the same thing and going after a similar/overlapping target market.
yeah models are definitely improving, but we've found even the latest ones still hallucinate and infer text rather than doing pure transcription. we carry out very rigorous benchmarks against all of the frontier models. we think the differentiation is in accuracy on truly messy docs (nested tables, degraded scans, handwriting) and being able to deploy on-prem/vpc for regulated industries.
I agree with the second part of the differentiation you mentioned.
That, plus the ability to provide customized solutions that stitch together data extraction and business logic, such as reconciliations for vendor payments or sales.
I think both these reasons are what's keeping all the OCR based companies going.
My only advice would be to figure out more USPs before native models eat your lunch. Nanonets, for example, has its own native OCR model.
Congrats on the launch.
looks really cool, congrats on the launch! are you guys using something similar to docling [https://github.com/docling-project/docling]?
Has docling improved? I had a bit of a nightmare integrating a docling pipeline earlier this year. The docs said it was VLM-ready, which I spent many hours finding out was not true, only to then find a relevant GitHub issue that would've saved me all that time :/ It's allegedly fixed now, but wow, that burned me big time.
our team has tested docling pretty extensively. it works well for simpler text-heavy docs without complex layouts, but the moment you introduce tables or multi-column content it doesn't maintain layout well.
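for anyone who wants to reproduce that kind of test, a minimal docling run looks roughly like this (a sketch based on docling's documented API; the filename is a placeholder, so check the current docs before relying on it):

    # minimal docling smoke test: convert a PDF and inspect the markdown
    # ("scanned_report.pdf" is a placeholder; install with `pip install docling`)
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert("scanned_report.pdf")

    # eyeball whether tables and multi-column layout survived the round trip
    print(result.document.export_to_markdown())

that last step is where we see the failures: single-column text reads fine, but table structure and column order get scrambled.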
Congrats on the launch! We have been using this for a new feature we are building in our SaaS app. Its results were better than Datalab in our tests, especially in the handwriting category.
Thanks for testing! Glad the results work well for you
thanks! appreciate the kind words
Hi, I'm a founder of Datalab. I'm not trying to take away from the launch (congrats), just wanted to respond to the specific feedback.
I'm glad you found a solution that worked for you, but this is pretty surprising to hear - our new model, chandra, saturates handwriting-heavy benchmarks like this one: https://www.datalab.to/blog/saturating-the-olmocr-benchmark - and our production models are more performant than the OSS ones.
Did you test some time ago? We've made a bunch of updates in the last couple of months. Happy to issue some credits if you ever want to try again - vik@datalab.to.
Thanks, Vik. Happy to try the model again. Is a BAA available?
Yes, we can sign a BAA!
Congrats on launching. Seems very interesting.
AI models will do all this natively
we disagree! we've found llms by themselves aren't enough and suffer from pretty big failure modes like hallucination and inferring text rather than pure transcription. we wrote a blog about this [1]. the right approach so far seems to be a hybrid workflow that uses very specific parts of the language model architecture.
[1] https://www.runpulse.com/blog/why-llms-suck-at-ocr
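to make "hybrid" concrete, here's the shape of the idea as a rough sketch (illustrative only, not our production pipeline; `ocr_engine` and `llm_cleanup` are stand-ins):

    # illustrative hybrid sketch (not our production pipeline): a
    # conventional OCR pass anchors the text, the LLM only restructures
    # it, and a guard rejects anything the LLM "transcribed" that the
    # OCR pass never produced
    def extract(page_image, ocr_engine, llm_cleanup):
        raw_text = ocr_engine(page_image)    # deterministic transcription
        structured = llm_cleanup(raw_text)   # layout/markdown cleanup only

        # hallucination guard: every word in the cleaned output must
        # already exist in the raw OCR text, otherwise fall back
        ocr_vocab = set(raw_text.split())
        invented = [w for w in structured.split()
                    if w.strip("|#*->") and w.strip("|#*->") not in ocr_vocab]
        return raw_text if invented else structured

the point is that the language model never gets to originate text, only rearrange it.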
> Why LLMs Suck at OCR
I paste screenshots into claude code every day and it's incredible. As in, I can't believe how good it is. I send a screenshot of console logs, a UI and some HTML elements and it just "gets it".
So saying they "Suck" makes me not take your opinion seriously.
yeah models are definitely improving, but we've found even the latest ones still hallucinate and infer text rather than doing pure transcription. we carry out very rigorous benchmarks against all of the frontier models. we think the differentiation is in accuracy on truly messy docs (nested tables, degraded scans, handwriting) and being able to deploy on-prem/vpc for regulated industries.
they need to convince customers it's what they need
This is a hand-wavy article that dismisses VLMs without acknowledging the real-world performance everyone is seeing. I think it'd be far more useful if you published an eval.
one or two more model releases, and raw documents passed to claude will beat whatever prompt voodoo you guys are cooking
Having worked in the space, I have real doubts about that. Right now Claude and other top models already do a decent job at e.g. "generate OCR from this document". But as mentioned, there are serious failure modes, it's non-deterministic, and it's cost-prohibitive at scale.
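The non-determinism part is easy to demonstrate yourself; a quick sketch (where `transcribe` stands in for whatever vision-model call you use):

    # run the same page through the model twice and diff the outputs;
    # deterministic OCR gives an empty diff, LLM transcription usually doesn't
    import difflib

    def stability_check(image_bytes, transcribe):
        a = transcribe(image_bytes)
        b = transcribe(image_bytes)
        return list(difflib.unified_diff(
            a.splitlines(), b.splitlines(), lineterm=""))

Any non-empty diff on identical input is run-over-run drift, which is exactly what you can't have in an audit trail.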
This is like saying AI models can generate images. But a model or platform hyper-focused on image generation will do better (for now).
How is this different from Extend (also YC)?
we're more focused on the core extraction layer itself rather than workflow tooling. we train our own vision models for layout detection, ocr, and table parsing from scratch. the key thing for us is determinism and auditability, so outputs are reproducible run over run, which matters a lot for regulated enterprises.
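to make "auditability" concrete, the kind of record we mean looks roughly like this (an illustrative sketch, not our actual schema):

    # illustrative audit-record sketch (not our actual schema): hash the
    # input and output together with a pinned model version, so any run
    # can be re-verified later
    import hashlib, json, time

    def audit_record(pdf_bytes, extraction, model_version):
        return {
            "input_sha256": hashlib.sha256(pdf_bytes).hexdigest(),
            "output_sha256": hashlib.sha256(
                json.dumps(extraction, sort_keys=True).encode()).hexdigest(),
            "model_version": model_version,  # pinned, so reruns are comparable
            "timestamp": time.time(),
        }

rerunning the pinned model on the same input_sha256 should reproduce the same output_sha256 - that's the determinism claim in practice.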
Can you increase correctness by giving the model examples, or key terms and nouns it should expect?