[HN Gopher] Show HN: EndType - Extract structured data from imag...
___________________________________________________________________
Show HN: EndType - Extract structured data from images, video and
PDFs
Author : timm37
Score : 11 points
Date : 2024-06-05 10:01 UTC (13 hours ago)
(HTM) web link (endtype.com)
(TXT) w3m dump (endtype.com)
| timm37 wrote:
| Hey everyone. As AI gets better and more multimodal, I believe
| one of the most common use cases will be extracting structured
| data from unstructured files: things like shipping labels, bank
| statements, invoices, patents, etc.
|
| I plan to release workflows soon which will simply take any file
| via email or form and save the structured content to a
| spreadsheet/CSV or a new PDF.
|
| Let me know if you would be interested in trying the workflows
| and if you have a use case to extract/organize different files.
| MilStdJunkie wrote:
| I got one. Say I gave it a corpus of structured[1] files that
| follow Schema X, then I gave it a pile of outputs (PDF, HTML)
| generated from that corpus, where StructuredFileName.xml =
| StructuredFileName.pdf. Could you see this doohickey being able
| to take in _just_ the PDF/HTML/Word output, then output its
| best guess at chucking that into a Schema X file?
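|
| The name-based pairing described here could be collected into
| (source, output) example pairs along these lines. A minimal
| sketch, assuming the XML sources and the rendered outputs sit in
| two flat directories; `pair_by_stem`, its parameters, and the
| directory layout are hypothetical, not anything EndType provides.

```python
from pathlib import Path


def pair_by_stem(source_dir, output_dir, output_exts=(".pdf", ".html", ".docx")):
    """Match each .xml source file to rendered outputs sharing its base name.

    Hypothetical helper: assumes flat directories and that a shared file
    stem is the only link between a source file and its outputs.
    """
    # Index rendered outputs by base name (stem), e.g. "manual" -> [manual.pdf]
    outputs = {}
    for path in Path(output_dir).iterdir():
        if path.suffix.lower() in output_exts:
            outputs.setdefault(path.stem, []).append(path)

    # Emit one (xml, output) pair per matching rendered file
    pairs = []
    for xml_path in sorted(Path(source_dir).glob("*.xml")):
        for out_path in sorted(outputs.get(xml_path.stem, [])):
            pairs.append((xml_path, out_path))
    return pairs
```

| Sources with no rendered counterpart are skipped, so the result
| only holds pairs usable as examples of "output in, Schema X out".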
|
| Pretty much everyone I work with are XML fetishists, and adore
| hard coded ontologies and taxonomies forged with many years of
| blood and sweat. I'm a bit more pragmatic and technology-minded.
| Even before AI I was pretty sure that using Python ML to generate
| a graph of keywords was a _hell_ of a lot more useful than a
| handcrafted ontology - doesn't cost hundreds of thousands of
| dollars in billable hours either. Now, with this
| stuff, we can get around hard coding all that structure itself,
| and maybe have source documents that normal people can read
| without about five zeroes worth of bespoke tools.
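|
| One possible reading of "a graph of keywords" is a co-occurrence
| graph: keywords are nodes, and an edge's weight counts how many
| documents mention both. A minimal sketch under that assumption,
| using only the standard library rather than any ML package; the
| stopword list and function names are made up for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not a real lexicon)
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def keywords(text, min_len=3):
    """Return the set of lowercase words that pass a crude keyword filter."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) >= min_len and w not in STOPWORDS}


def cooccurrence_graph(docs):
    """Count, per keyword pair, the number of documents containing both.

    Edges are keyed by alphabetically ordered (word_a, word_b) tuples.
    """
    edges = Counter()
    for text in docs:
        for a, b in combinations(sorted(keywords(text)), 2):
            edges[(a, b)] += 1
    return edges
```

| Heavier edges then suggest which terms belong together, without
| anyone writing the taxonomy by hand.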
|
| [1] And when I say "structured" I mean "completely frickin
| bananas".
___________________________________________________________________
(page generated 2024-06-05 23:02 UTC)