We have a Slack channel with them; these are the versions they mentioned:
posthog-node 4.18.1
posthog-js 1.297.3
posthog-react-native 4.11.1
posthog-docusaurus 2.0.6
We're fixing it! For some reason this happens on only _some_ phones in our office, so it was hard to repro. I think it has to do with Safari rendering. We'll tone down our WebGPU usage.
A high-level summary is that while this is an impressive model, it underperforms even current SOTA VLMs on document parsing and has a tendency to hallucinate OCR text and table structure, and to drop content.
... And? We're judging it on the merits of the technology it purports to be, not the pockets of the people who bankroll it. Probably not fair, sure, but when I pick my OCR, I want to pick SOTA. These comparisons and announcements help me find those.
We’ve generally found that Gemini 2.0 is a great model and have tested this (and nearly every VLM) very extensively.
A big part of our research focus is incorporating the best of what new VLMs offer without losing the benefits and reliability of traditional CV models. A simple example: we’ve found bounding-box-based attribution to be non-negotiable for many of our current customers. Citing the specific region in a document where an answer came from becomes (in our opinion) even MORE important when using large vision models in the loop, as there is a continued risk of hallucination.
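To make that concrete, here’s a minimal sketch of what a bounding-box-attributed extraction result could look like (the class and field names are illustrative, not our actual API):

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    page: int    # zero-indexed page number
    x0: float    # normalized coordinates in [0, 1]
    y0: float
    x1: float
    y1: float

@dataclass
class ExtractedField:
    name: str
    value: str
    source: BoundingBox   # the exact region the value was read from
    confidence: float

# Every answer carries a pointer back to the region of the document it came
# from, so a human (or an automated check) can verify it rather than trusting
# the model's output blindly.
field = ExtractedField(
    name="policy_number",
    value="PN-48213",
    source=BoundingBox(page=2, x0=0.12, y0=0.33, x1=0.41, y1=0.36),
    confidence=0.97,
)
```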
Whether that matters in your product is ultimately use-case dependent, but the more important challenge for us has been reliability in outputs. RD-TableBench currently uses a single table image on a page, but when testing with real-world dense pages we find that VLMs deviate more. Sometimes that involves minor edits (summarizing a sentence but preserving meaning), but sometimes it’s a more serious case such as hallucinating large sets of content.
The more extreme case is that internally we fine-tuned a version of Gemini 1.5, along with base Gemini 2.0, specifically for checkbox extraction. We found that even with a broad distribution of checkbox data we couldn’t prevent frequent checkbox hallucination on both the Flash (+17% error rate) and Pro (+8% error rate) models. Our customers in industries like healthcare expect us to get it right, out of the box, deterministically, and our team’s directive is to get as close as we can to that ideal state.
We think that the ideal state involves a combination of the two. The flexibility that VLMs provide, for example with cases like handwriting, is what I think will make it possible to go from 80 or 90 percent accuracy to some number very close to 99%. I should note that the Reducto performance for table extraction is with our pre-VLM table parsing pipeline, and we’ll have more to share in terms of updates there soon.
For now, our focus is entirely on the performance frontier (though we do scale costs down with volume). In the longer term as inference becomes more efficient we want to move the needle on cost as well.
Overall though, I’m very excited about the progress here.
---
One small comment on your footnote: the evaluation script using the Needleman-Wunsch algorithm doesn’t actually consider the headers output by the models; it looks only at the table structure itself.
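For anyone unfamiliar with the approach, here’s a rough sketch of what a Needleman-Wunsch alignment over flattened table cells can look like; this is an illustration of the general idea under my own assumptions, not the actual RD-TableBench evaluation script:

```python
def needleman_wunsch(pred_cells, gold_cells, match=1, mismatch=-1, gap=-1):
    """Global alignment score between two flattened sequences of table cells."""
    n, m = len(pred_cells), len(gold_cells)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (
                match if pred_cells[i - 1] == gold_cells[j - 1] else mismatch
            )
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]

# Toy tables as lists of rows; the first row is the header and is dropped,
# so only the body cells are compared.
pred_table = [["Name", "Qty"], ["Widget", "3"], ["Gadget", "5"]]
gold_table = [["Name", "Qty"], ["Widget", "3"], ["Gadget", "4"]]

pred_body = [cell for row in pred_table[1:] for cell in row]
gold_body = [cell for row in gold_table[1:] for cell in row]
print(needleman_wunsch(pred_body, gold_body))  # higher score = closer alignment
```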
Love the PubTables work! It's a really useful dataset. Their data comes from existing annotations in scientific papers, so in our experience it doesn't include many of the hardest cases that methods still fail on today. The annotations are computer-generated rather than manually labeled, so you don't get things like scanned and rotated images, or much diversity in languages.
I'd encourage you to take a look at some of our data points to compare for yourself! Link: huggingface.co/spaces/reducto/rd_table_bench
In terms of the overall importance of table extraction, we've found it to be a key bottleneck for folks looking to do document parsing. It's up there amongst the hardest problems in the space alongside complex form region parsing. I don't have the exact statistics handy, but I'd estimate that ~25% of the pages we parse have some hairy tables in them!
Valid concern; security and safety are essential for anything that can access a production system. We use k8s RBAC to ensure that the access is read-only, so even if the LLM hallucinates and tries to destroy something, it can't.
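As a rough illustration of the read-only setup (the role name, resource list, and use of the Python kubernetes client are my assumptions here, not necessarily how it's actually deployed):

```python
from kubernetes import client, config

config.load_kube_config()

# A ClusterRole that only allows get/list/watch: no create, update, patch, or
# delete verbs, so a hallucinated destructive call is rejected by the
# Kubernetes API server itself, regardless of what the LLM asks for.
read_only_role = client.V1ClusterRole(
    metadata=client.V1ObjectMeta(name="llm-agent-readonly"),  # hypothetical name
    rules=[
        client.V1PolicyRule(
            api_groups=["", "apps", "batch"],
            resources=["pods", "pods/log", "events", "deployments", "jobs", "nodes"],
            verbs=["get", "list", "watch"],
        )
    ],
)

client.RbacAuthorizationV1Api().create_cluster_role(read_only_role)
```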
As we eventually move towards write access, we're closely following the work on LLM safety. There has been some interesting work on using smaller models to evaluate tool calls/completions against a set of criteria to ensure safety.
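A minimal sketch of that gating pattern, assuming an OpenAI-style chat API; the judge model name and criteria are placeholders, not a specific system we run:

```python
import json
from openai import OpenAI

llm = OpenAI()

SAFETY_CRITERIA = (
    "The tool call must be read-only, must not touch secrets, "
    "and must stay within the namespace the user asked about."
)

def is_tool_call_allowed(tool_name: str, arguments: dict) -> bool:
    """Ask a small judge model whether a proposed tool call meets the criteria."""
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",  # placeholder: any small, cheap judge model
        messages=[
            {
                "role": "system",
                "content": (
                    "Evaluate the tool call against these criteria:\n"
                    f"{SAFETY_CRITERIA}\nReply with only 'allow' or 'deny'."
                ),
            },
            {"role": "user", "content": json.dumps({"tool": tool_name, "arguments": arguments})},
        ],
    )
    return resp.choices[0].message.content.strip().lower() == "allow"

# The agent's proposed call only executes if the judge allows it.
if is_tool_call_allowed("kubectl_get", {"resource": "pods", "namespace": "prod"}):
    pass  # execute the call
```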
The other problem is that you become an extremely big target for bad actors, since you have read/write (or even just read) access to all these k8s clusters. Obviously you can mitigate that to a fairly high degree with on-prem, but for users not on that...