[HN Gopher] Show HN: EndType - Extract structured data from imag...
       ___________________________________________________________________
        
       Show HN: EndType - Extract structured data from images, video and
       PDFs
        
       Author : timm37
       Score  : 11 points
       Date   : 2024-06-05 10:01 UTC (13 hours ago)
        
 (HTM) web link (endtype.com)
 (TXT) w3m dump (endtype.com)
        
       | timm37 wrote:
       | Hey everyone. As AI gets better and better and multimodal, I
       | believe one of the most common use cases will be extracting
       | structured data from unstructured files. So things like shipping
       | labels, bank statements, invoices, patents, etc.
       | 
       | I plan to release workflows soon which will simply take any file
       | via email or form and save the structured content to a
       | spreadsheet/CSV or a new PDF.
       | 
       | Let me know if you would be interested in trying the workflows
       | and if you have a use case to extract/organize different files.
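
A minimal sketch of the "save the structured content to a spreadsheet/CSV" step described above, in Python. The extraction itself is not shown: the `records` list is hypothetical stand-in data, and `records_to_csv` is an illustrative helper name, not part of EndType. It only shows one way to flatten records with differing fields (an invoice and a shipping label, say) into a single CSV.

```python
import csv
import io

# Hypothetical records standing in for whatever an extraction step returns.
records = [
    {"file": "invoice_001.pdf", "vendor": "Acme Corp", "total": "120.50"},
    {"file": "label_002.png", "carrier": "UPS", "tracking": "1Z999"},
]

def records_to_csv(records):
    """Serialize dicts with differing keys into one CSV string."""
    # Union of all keys in first-seen order, so the columns are stable.
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    # restval fills in fields that a given record does not have.
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

print(records_to_csv(records))
```

Taking the union of keys keeps different document types in one sheet at the cost of sparse rows; writing one CSV per document type would be the obvious alternative.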
        
         | MilStdJunkie wrote:
         | I got one. Say I gave it a corpus of structured[1] files that
         | follow Schema X, then I gave it a pile of outputs (PDF, HTML)
         | generated from that corpus, where StructuredFileName.xml =
         | StructuredFileName.pdf. Could you see this doohickey being able
         | to take in _just_ the PDF/HTML/Word output, then output its
         | best guess at chucking that into a Schema X file?
         | 
         | Pretty much everyone I work with is an XML fetishist who adores
         | hard-coded ontologies and taxonomies forged with many years of
         | blood and sweat. I'm a bit more pragmatic and technology-
         | minded. Even before AI I was pretty sure that using Python ML
         | to generate a graph of keywords was a _hell_ of a lot more
         | useful than a handcrafted ontology - doesn't cost hundreds of
         | thousands of dollars in billable hours either. Now, with this
         | stuff, we can get around hard coding all that structure itself,
         | and maybe have source documents that normal people can read
         | without about five zeroes worth of bespoke tools.
         | 
         | [1] And when I say "structured" I mean "completely frickin
         | bananas".
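
A rough sketch of the "graph of keywords" idea from the comment above, in Python. This uses plain co-occurrence counting rather than any ML model, so it is a simplification of what the commenter describes. The stopword list, the sample sentences, and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}

def keywords(text):
    """Lowercase word tokens minus stopwords, deduplicated per document."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def cooccurrence_graph(docs):
    """Map each (word_a, word_b) pair to the number of docs containing both."""
    edges = Counter()
    for doc in docs:
        # Sorting makes each undirected edge appear under a single key.
        for pair in combinations(sorted(keywords(doc)), 2):
            edges[pair] += 1
    return edges

# Hypothetical mini-corpus.
docs = [
    "Replace the hydraulic pump filter",
    "Inspect the hydraulic pump for leaks",
    "Replace the fuel filter",
]
graph = cooccurrence_graph(docs)

# Edges seen in more than one document hint at related terms.
for pair, weight in graph.items():
    if weight > 1:
        print(pair, weight)
```

Edge weights here are raw document counts; on a real corpus they would likely need normalizing (for example by term frequency) before the graph says anything useful.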
        
       ___________________________________________________________________
       (page generated 2024-06-05 23:02 UTC)