       ___________________________________________________________________
        
       Launch HN: Reducto Studio (YC W24) - Build accurate document
       pipelines, fast
        
       Hi HN! We're Adit and Raunak, co-founders of Reducto (YC W24,
       https://reducto.ai). Reducto turns unstructured documents (e.g.,
       PDFs, scans, spreadsheets) into structured data. This data can then
       be used for retrieval, passed into LLMs, or used elsewhere
       downstream.

       We started Reducto when we realized that so many of today's AI
       applications require good-quality data. Everyone knows that good
       inputs lead to better outputs, but 80% of the world's data is still
       trapped inside things like messy PDFs and spreadsheets. Raunak and
       I launched a very early MVP for parsing and extracting from
       unstructured documents, and were lucky to get a lot of interest
       from technical teams once they saw accuracy they hadn't seen
       before.

       We started by just releasing an API for engineers to build with,
       but over time we realized that an accurate API was only part of
       the puzzle. Our customers wanted to easily set up multi-step
       pipelines, evaluate and iterate on performance for their use case,
       and work with non-engineering teammates who were also involved in
       the real-world document processing flow.

       That's why we're launching Reducto Studio, a web platform that sits
       on top of our APIs for users to build and iterate on end-to-end
       document pipelines. With Studio, you can:

       - Drop in an entire file set and get per-field and per-document
         accuracy scores against your eval data.
       - Auto-generate and continuously optimize extraction schemas to
         hit production-grade quality fast.
       - Save every run, iterate on parse/extract configs, and compare
         results side-by-side.

       You can see some examples here (https://studio.reducto.ai) or you
       can watch this walkthrough:
       https://www.loom.com/share/b243551741c642c6a594c00353fcecb3. If
       you'd like to upload your own document, you can log in and do so
       as well; we don't make you book a demo or put a payment down to
       try it.

       Thanks for reading and checking it out! This is only the first
       step for Studio, so we'd love feedback on anything: UX rough edges
       (we know they're there!), features that would make evaluations
       better for you, hard documents you've had trouble with, or anything
       else about wrangling unstructured data.
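
The post doesn't say how Studio computes its per-field and per-document accuracy scores. As a rough illustration only, here is a minimal sketch assuming each document's extraction and its eval labels are flat dicts of field name to value, scored by exact match after case and whitespace normalization; `normalize` and `score_runs` are hypothetical names, not part of Reducto's API.

```python
from collections import defaultdict


def normalize(value):
    """Hypothetical normalization: compare strings case- and whitespace-insensitively."""
    if isinstance(value, str):
        return " ".join(value.split()).lower()
    return value


def score_runs(predictions, ground_truth):
    """Return (per_field_accuracy, per_document_accuracy).

    predictions / ground_truth: {doc_id: {field_name: value}}.
    A field is correct only if its normalized value matches exactly;
    a field missing from the prediction counts as wrong.
    """
    field_hits = defaultdict(int)
    field_totals = defaultdict(int)
    per_document = {}

    for doc_id, expected in ground_truth.items():
        predicted = predictions.get(doc_id, {})
        hits = 0
        for field_name, truth in expected.items():
            correct = field_name in predicted and normalize(predicted[field_name]) == normalize(truth)
            field_totals[field_name] += 1
            field_hits[field_name] += int(correct)
            hits += int(correct)
        # Fraction of this document's labeled fields that were extracted correctly.
        per_document[doc_id] = hits / len(expected) if expected else 1.0

    per_field = {name: field_hits[name] / field_totals[name] for name in field_totals}
    return per_field, per_document


if __name__ == "__main__":
    truth = {
        "invoice_1": {"total": "120.00", "vendor": "Acme Corp"},
        "invoice_2": {"total": "75.50", "vendor": "Globex"},
    }
    preds = {
        "invoice_1": {"total": "120.00", "vendor": "ACME  corp"},
        "invoice_2": {"total": "57.50", "vendor": "Globex"},
    }
    per_field, per_doc = score_runs(preds, truth)
    print(per_field)  # {'total': 0.5, 'vendor': 1.0}
    print(per_doc)    # {'invoice_1': 1.0, 'invoice_2': 0.5}
```

Per-field scores point at which schema fields need work, while per-document scores show which files fail most; a real eval would likely need fuzzier matching for numbers, dates, and nested arrays.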
        
       Author : adit_a
       Score  : 54 points
       Date   : 2025-06-23 15:30 UTC (7 hours ago)
        
       | omaerkhan wrote:
       | FYI - https://links.reducto.ai/studio doesn't seem to be
       | working... ERR_TOO_MANY_REDIRECTS
        
         | adit_a wrote:
         | Fixed! Sorry about that
        
         | TimMeade wrote:
         | Still not working here
        
           | adit_a wrote:
           | The direct loom link isn't working for you? Are you seeing
           | the same redirects error?
        
       | weego wrote:
       | I'm not a product fit, but I would like to take a moment to
       | praise the detailed beauty of the design work on the site.
       | 
       | From the typography and layout to the line-work, down to how the
       | gradients in the fashionably large logotype at the bottom of the
       | footer are tied in using texture.
       | 
       | Was it in-house, or an agency? I'd love to see more of their
       | work.
        
         | adit_a wrote:
         | Thank you! We worked with Airfoil for the website :)
        
           | esafak wrote:
           | https://www.airfoil.studio/ presumably
        
             | raunakchowdhuri wrote:
             | yep!
        
         | iyn wrote:
         | Agreed -- came here to say exactly that. I like that this is
         | not yet another Tailwind template (nothing wrong with them, I
         | use them all the time) but something with its own identity. I
         | especially love the illustrations/icons. Well done!
        
       | skadamat wrote:
       | Congrats on the launch! How do you guys compare with Datalab with
       | regard to accuracy?
       | 
       | https://www.datalab.to/
        
         | gbertb wrote:
         | I want to know this, too. Lots of these companies are doing the
         | same thing but leave out benchmarks that include Marker.
        
         | adit_a wrote:
         | Thanks! We have a lot of respect for the work VikP and his team
         | did on Surya, but we haven't benchmarked his newer pipeline, so
         | I don't want to make a 1:1 claim.
         | 
         | If you want to do a side by side with your use case we'd be
         | happy to set you up with free trial access.
        
       | jackienotchan wrote:
       | I saw your recent $24M Series A and was kind of surprised to only
       | see you launching now. Congrats!
       | 
       | YC seems to fund quite a few document extraction companies, even
       | within the same batch:
       | 
       | - Pulse (YC W24): https://www.ycombinator.com/companies/pulse-3
       | 
       | - OmniAI (YC W24): https://www.ycombinator.com/companies/omniai
       | 
       | - Extend (YC W23): https://www.ycombinator.com/companies/extend
       | 
       | How do you differentiate from these? And how do you see the space
       | evolving as LLMs commoditize PDF extraction?
        
         | echelon wrote:
         | How do you raise a Series A before launch / PMF?
         | 
         | I assume y'all launched before this to select partners? Or
         | perhaps this is a new product on top of the core product?
         | 
         | Congrats! Keep at it!
        
           | adit_a wrote:
           | Thank you!
           | 
           | To clarify, our API was already fully launched and in prod
           | with customers when we raised our Series A. This launch is
           | specifically for the platform we're building around the API
           | :)
        
         | adit_a wrote:
         | Thanks! To clarify, we launched our document processing APIs a
         | while ago. This launch is specifically for a new platform we're
         | building around our API based on all of the things our
         | customers previously had to build internally to support their
         | use of Reducto (eval tools, monitoring etc).
         | 
         | Generally speaking, my view on the space is that this was
         | crowded well before LLMs. We've met a lot of the folks that
         | worked on things like drivers for printers to print PDFs in the
         | 1990s, IDP players from the last few decades, and more recent
         | cloud offerings.
         | 
         | The context today is clearly very different than it was in the
         | IDP era, though (from humans processing semi-structured content
         | to LLMs reasoning over most human data), and so is the solution
         | space (VLMs are an incredible new tool to help address the
         | problem).
         | 
         | Given that, I don't think it's surprising that companies inside
         | and outside of YC have pivoted into offering document
         | processing APIs over the past year. We don't see
         | differentiation coming from feature set alone, since that will
         | converge over time; instead we focus primarily on accuracy,
         | reliability, and scalability, all three of which benefit
         | substantially from last-mile improvements. The best testament I
         | have to that is that the customers we've onboarded are very
         | technical, and as a result are very thorough when choosing the
         | right solution for them. That includes a company-wide rollout
         | at one of the 4 biggest tech companies, one of the 3 biggest
         | trading firms, and a big set of AI product teams like Harvey,
         | Rogo, ScaleAI, etc.
         | 
         | At the end of the day I don't see VLM improvements as
         | antagonistic to what we're doing. We already use them a lot for
         | things like an agentic OCR (correcting mistakes from our
         | traditional CV pipeline). On some level our customers aren't
         | just choosing us for PDF->markdown, they're onboarding with us
         | because they want to spend more of their time on the things
         | that are downstream from having accurate data, and I expect
         | that there'll be room for us to make that even more true as
         | models improve.
        
         | kbyatnal wrote:
         | Founder of Extend (https://www.extend.ai/) here. It's a great
         | question, and thanks for the tag. There definitely are a lot of
         | document processing companies, but it's a large market and more
         | competition is always better for users.
         | 
         | In this case, the Reducto team seems to have cloned us down to
         | the small details [1][2], which is a bit disappointing to see.
         | But imitation is the best form of flattery I suppose! We
         | thought deeply about how to build an ergonomic configuration
         | experience for recursive type definitions (which is deceptively
         | complex), and concluded that a recursive spreadsheet-like
         | experience would be the best form factor (which we shipped over
         | a year ago).
         | 
         | > "How do you see the space evolving as LLMs commoditize PDF
         | extraction?"
         | 
         | Having worked with a ton of startups & F500s, we've seen that
         | there's still a large gap for businesses in going from raw OCR
         | outputs --> document pipelines deployed in prod for mission-
         | critical use cases. LLMs and VLMs aren't magic, and anyone who
         | goes in expecting 100% automation is in for a surprise.
         | 
         | The prompt engineering / schema definition is only the start.
         | You still need to build and label datasets, orchestrate
         | pipelines (classify -> split -> extract), detect uncertainty
         | and correct with human-in-the-loop, fine-tune, and a lot more.
         | You can certainly get close to full automation over time, but
         | it takes time and effort -- and that's where we come in. Our
         | goal is to give AI teams all of that tooling on day 1, so they
         | hit accuracy quickly and focus on the complex downstream post-
         | processing of that data.
         | 
         | [1] https://dub.sh/ojv9b7p
         | 
         | [2] https://dub.sh/X7GFlDd
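
The "classify -> split -> extract" orchestration with uncertainty detection and human-in-the-loop review described in the comment above can be sketched as follows. This is a toy under stated assumptions, not Extend's or Reducto's implementation: the classifier is a keyword stub standing in for a model, splitting simply groups consecutive pages with the same predicted type, the extractor's confidences are made up, and `REVIEW_THRESHOLD` is a hypothetical cutoff below which a field is queued for a human.

```python
import re
from dataclasses import dataclass, field

# Hypothetical cutoff: fields scored below this go to a human reviewer.
REVIEW_THRESHOLD = 0.8


@dataclass
class SubDocument:
    doc_type: str
    pages: list


@dataclass
class Result:
    doc_type: str
    fields: dict
    needs_review: list = field(default_factory=list)


def classify_page(text):
    """Stub classifier: keyword rules standing in for a real model."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "agreement" in lowered:
        return "contract"
    return "other"


def split(pages):
    """Group consecutive pages with the same predicted type into sub-documents."""
    subdocs = []
    for page in pages:
        doc_type = classify_page(page)
        if subdocs and subdocs[-1].doc_type == doc_type:
            subdocs[-1].pages.append(page)
        else:
            subdocs.append(SubDocument(doc_type, [page]))
    return subdocs


def extract(subdoc):
    """Stub extractor returning {field_name: (value, confidence)}."""
    text = " ".join(subdoc.pages)
    if subdoc.doc_type == "invoice":
        match = re.search(r"total:?\s*(\$?)([\d.]+)", text, re.IGNORECASE)
        if match is None:
            return {"total": (None, 0.0)}
        # Made-up confidence: less sure when no currency symbol precedes the amount.
        confidence = 0.95 if match.group(1) else 0.6
        return {"total": (match.group(2), confidence)}
    return {"page_count": (len(subdoc.pages), 1.0)}


def run_pipeline(pages):
    """classify -> split -> extract, flagging low-confidence fields for review."""
    results = []
    for subdoc in split(pages):
        extracted = extract(subdoc)
        result = Result(subdoc.doc_type, {name: value for name, (value, _) in extracted.items()})
        result.needs_review = [
            name for name, (_, conf) in extracted.items() if conf < REVIEW_THRESHOLD
        ]
        results.append(result)
    return results
```

In a real pipeline each stub would be a model call; the point of the sketch is the shape: every extracted field carries a confidence, and anything under the threshold is routed to a person instead of flowing straight downstream.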
        
           | wilson090 wrote:
           | I've used Instabase before, which has had the same UX for
           | years. What about benchmarks between the two on extraction
           | performance?
        
           | adit_a wrote:
           | Hey, we've never used or even attempted to use your platform.
           | Respectfully I think you know that, and that you also know
           | that your team has tried to get access to ours using personal
           | Gmail accounts dating back to 2024.
           | 
           | A schema builder with nested array fields has been part of
           | our playground (and nearly every structured extraction
           | solution) for a very long time and is just not something that
           | we even view as a defining part of the platform.
        
             | kbyatnal wrote:
             | Thanks for the reply. Not sure what you're referring to,
             | but I don't believe we've ever copied or taken inspo from
             | you guys on anything -- but please do let me know if you
             | feel otherwise.
             | 
             | It's not a big deal at the end of the day, and I'm excited
             | to see what we can both deliver for customers. Congrats on
             | the launch!
        
               | Kiro wrote:
               | Two YC companies openly fighting and accusing each other.
               | Not a good look and I'm surprised that you haven't been
               | reprimanded yet.
        
           | serjester wrote:
           | I'm completely impartial here - seems like there are only so
           | many ways you can design a schema builder?
        
       | bze12 wrote:
       | Nice! I was already considering using the Reducto API. Will give
       | this a try.
        
         | adit_a wrote:
         | Let us know if you have any feedback!
        
       | serjester wrote:
       | Congrats on the launch guys, mobile website seems to be broken
       | though.
        
         | adit_a wrote:
         | Thank you! What's the error you're seeing on mobile?
        
           | serjester wrote:
           | It crashes with "a problem repeatedly occurred". I think
           | there's some sort of infinite loop - fails on both safari and
           | chrome on my iPhone.
        
       | willwjack wrote:
       | This would have saved me so much pain back when I was working on
       | RAG workflows. Great to see.
        
         | adit_a wrote:
         | Would love to help if you end up having any use cases in the
         | future!
        
       | Fraaaank wrote:
       | Why do you only get a data processing agreement when on the
       | enterprise plan? It's a legal requirement for _any_ European
       | company.
        
         | adit_a wrote:
         | We have a default DPA we're willing to sign on all tiers -- the
         | note on the pricing page is meant to refer to custom/redlined
         | DPAs that become complex to manage over time.
         | 
         | We'll edit that to make it clearer.
        
       | b0a04gl wrote:
       | if reducto leans in fully as the layer that remembers every
       | correction, every edge case, every shift in layout or wording
       | across document versions, it starts becoming more than a
       | pipeline. it becomes institutional memory for unstructured data.
       | none of the other players really do that. they extract, maybe
       | evaluate once, then forget.
       | 
       | but the real pain is always in the second and third batch, when
       | formats change subtly. if reducto becomes the system that adapts
       | without you babysitting it, that's where it may win. among the
       | competitors, continuity's the moat imo.
        
         | raunakchowdhuri wrote:
         | this is exactly where we're going with this! glad you see the
         | vision :)
        
         | adit_a wrote:
         | Yeah, we're extremely excited about the potential of building a
         | flywheel for each individual customer's pipeline.
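
The idea of a layer that "remembers every correction" can be made concrete with a small sketch. Everything here is an assumption for illustration, not a description of Reducto's product: layouts are identified by a hypothetical `layout_fingerprint` (a hash of the page's field labels), and a hypothetical `CorrectionMemory` simply replays a human's fix whenever the same wrong value shows up again for the same field on the same layout.

```python
import hashlib
from collections import defaultdict


def layout_fingerprint(labels):
    """Hypothetical layout ID: hash of the page's normalized field labels, in order."""
    joined = "|".join(label.strip().lower() for label in labels)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


class CorrectionMemory:
    """Stores human corrections per (layout, field) and replays them on later batches.

    Assumes extracted values are hashable (e.g. strings).
    """

    def __init__(self):
        # (fingerprint, field_name) -> {wrong_value: corrected_value}
        self._fixes = defaultdict(dict)

    def record(self, fingerprint, field_name, wrong, corrected):
        self._fixes[(fingerprint, field_name)][wrong] = corrected

    def apply(self, fingerprint, extracted):
        """Return a copy of `extracted` with any remembered corrections applied."""
        fixed = dict(extracted)
        for field_name, value in extracted.items():
            # .get() avoids creating empty entries in the defaultdict.
            fixes = self._fixes.get((fingerprint, field_name), {})
            fixed[field_name] = fixes.get(value, value)
        return fixed
```

Exact-value replay only catches repeated mistakes on a known layout; it does nothing for the subtle format shifts in later batches, which is the harder part the comment points at.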
        
       | c_moscardi wrote:
       | We chatted a few months back -- congrats on launch! Looks like a
       | great UX.
        
         | adit_a wrote:
         | Ah yeah I remember! Great to hear from you and thanks :)
        
       | nicodjimenez wrote:
       | For accurate and easy PDF to Markdown / LaTeX / JSON check out:
       | 
       | https://github.com/mathpix/mpxpy
       | 
       | Disclosure: I'm the founder. Reducto does cool stuff on
       | post-processing (and other input formats), but some people have
       | told me Mathpix is better at just getting data out of PDFs
       | accurately.
        
       | bravesoul2 wrote:
       | I've got a nasty doc I'll test on you, ha ha! I tried to OCR/AI it
       | and it drove me nuts.
        
       | techguy06 wrote:
       | Your landing page looks absolutely beautiful. Did you use Framer
       | or another landing page builder, or is it code?
        
         | adit_a wrote:
         | Code!
        
       | rd wrote:
       | Cool product, but also crazy to see someone with my name in the
       | wild. Every Raunak I've ever met has the Ronak spelling.
        
         | raunakchowdhuri wrote:
         | dang no way! we were both in boston too
        
       | jhuguet wrote:
       | Founder of anyformat.ai here, building from Madrid, Spain, with a
       | specific focus on Europe and its unique market and regulation
       | dynamics.
       | 
       | Just want to say how energizing it is to see this space maturing
       | through thoughtful products like Extend and Reducto. Congrats to
       | both on your Series A rounds. I'd also mention GetOmni, as they're
       | doing great work leading the open-source front with their ZeroX
       | project. We've learned a lot by observing your execution, and
       | frankly, anyone serious about document intelligence tracks this
       | ecosystem closely. It's been encouraging to see ideas we were
       | exploring early last year reflected in your recent successes. No
       | shame there; good ideas often converge over time.
       | 
       | When we started fundraising (before GPT-4o), few investors
       | believed LLMs would meaningfully disrupt this space. Finding the
       | right supporters meant enduring a lot of rejection, which delayed
       | us quite a bit. Raising is always hard, especially in Spain,
       | where even a modest EUR500K pre-seed round typically requires
       | proven MRR on the order of EUR10K.
       | 
       | We're earlier-stage, but strongly aligned in product philosophy.
       | Especially in the belief that the challenge isn't just parsing
       | PDFs. It's building a feedback loop so fast and intuitive that
       | deploying new workflows feels like development, not consulting.
       | That's what enables no-code teams to actually own automation.
       | 
       | From our experience in Europe, the market feels slower. Legacy
       | tools like Textract still have surprising inertia, and even
       | EUR0.04/page can trigger pushback, signaling deeper friction tied
       | to organizational change. Curious if US-based teams see the same,
       | or whether pricing and adoption are more elastic. We've also
       | heard "we'll build this internally in 3 weeks" more times than we
       | can count, usually from teams underestimating what it takes to
       | scale AI-based workflows reliably.
       | 
       | One experiment we're excited about is using AI agents to ease the
       | "blank page" problem in workflow design. You type: "Given a
       | document, split it into subdocuments (contract, ID, bank account
       | proof), extract key fields, and export everything into Excel."
       | The agent drafts the initial pipeline automatically. It helps
       | DocOps teams skip the fiddly config and get straight to value.
       | Again, no magic--just about removing friction and surfacing
       | intent.
       | 
       | Some broader observations that align with what others here have
       | said:
       | 
       | - Parsing/extraction isn't a long-term moat. Foundation models
       |   keep improving and are beginning to yield bounding boxes. Not
       |   perfect yet, but close.
       | 
       | - Moats come from orchestration-first strategies and
       |   self-adaptive systems: rapid iteration, versioning,
       |   observability, and agent-assisted configuration using visual
       |   tools like ReactFlow or Langflow. Basically, making life easier
       |   for the pipeline owner.
       | 
       | - Prompt-tuning (via DSPy, human feedback, QA) holds promise for
       |   adaptability but is still hard to expose through intuitive UX,
       |   especially for semi-technical DocOps users without ML
       |   knowledge.
       | 
       | - Extraction confidence remains a challenge. No method fully
       |   prevents hallucinations. We shared our mitigation approach
       |   here: http://bit.ly/3T5nB3h. OCR errors are a major
       |   contributor: we've seen extractions marked high-confidence
       |   despite poor OCR input. The extraction logic was right, but we
       |   failed to penalize for OCR confidence (we're fixing that).
       | 
       | - Excel files are still a nightmare. We're experimenting with
       |   methods like this one (https://arxiv.org/html/2407.09025v1),
       |   but large, messy files (90+ tabs, 100K+ rows) still break most
       |   approaches.
       | 
       | I'd love to connect with other founders in this space.
       | Competition is energizing, and the market is big enough for
       | multiple winners. From what I see, you guys, along with
       | LlamaParse, are spearheading the movement. Incumbents are also
       | moving fast (e.g., the Snowflake + Landing AI partnership), but
       | fragmentation is probably inevitable. It feels like the space will
       | stratify fast: some will vanish, some will thrive quietly, and a
       | few might become the core infrastructure layer.
       | 
       | We're small, building hard, and proud to be part of this wave.
       | Kudos again to @kbyatnal and @adit_a for raising the bar, would
       | be great to chat anytime or even offer some workspace if you ever
       | visit Spain!
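
The comment above mentions extractions being marked high-confidence despite poor OCR input because OCR confidence wasn't penalized. One simple way to fold it in, assuming every extracted value can be traced to the OCR tokens it came from and each token has a confidence in [0, 1], is to scale the extraction confidence by the weakest supporting token. `OcrToken`, `combined_confidence`, and the multiply-by-minimum rule are illustrative assumptions, not anyformat.ai's method (their write-up is at the link in the comment).

```python
from dataclasses import dataclass


@dataclass
class OcrToken:
    text: str
    confidence: float  # OCR engine's per-token confidence in [0, 1]


def combined_confidence(extraction_confidence, source_tokens):
    """Penalize an extraction's confidence by the OCR quality of its source tokens.

    Assumption for illustration: treat the two as independent and multiply by
    the weakest token's confidence, so a single badly read token (e.g. a
    smudged digit in an amount) drags the whole field down.
    """
    if not source_tokens:
        # The value can't be traced back to any OCR text, so don't trust it.
        return 0.0
    weakest = min(token.confidence for token in source_tokens)
    return extraction_confidence * weakest
```

For instance, an amount the extractor scored at 0.97 whose only source token was read at 0.42 OCR confidence drops to about 0.41 instead of staying near 0.97.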
        
         | adit_a wrote:
         | Appreciate the thoughtful note and want to wish you guys the
         | best as well!
        
       ___________________________________________________________________
       (page generated 2025-06-23 23:00 UTC)