Post AcPTo5cID0tQolvzsG by vmbrasseur@social.vmbrasseur.com
 (DIR) Post #AcNWOdQVnDJ9h93f28 by kfogel@kfogel.org
       2023-12-01T21:43:51.557151Z
       
       0 likes, 0 repeats
       
       So... there are OCR+ML services these days that can ingest a PDF and output a DOCX or ODT file, riiiiiight? Please tell me it's true.  I don't want to convert this built-from-LaTeX PDF by hand.  (Sadly, it's not practical to convert directly from LaTeX because LaTeX doesn't really ever give you access to an AST (abstract syntax tree) of the document to work with.) No reason to hide the ball here: the document is https://code.librehq.com/ots/dosp-research/-/blob/main/dosp-survey.ltx FWIW.
       
 (DIR) Post #AcPTWZwygs9ngRGBLE by markphip@hachyderm.io
       2023-12-01T21:50:22Z
       
       0 likes, 0 repeats
       
       @kfogel I also see this https://github.com/jay-dennis/tex2docx
       
 (DIR) Post #AcPTWaz8qJTwtRHQLw by kfogel@kfogel.org
       2023-12-02T20:21:06.420606Z
       
       0 likes, 0 repeats
       
       @markphip Thanks for this!  Gonna try it.
       
 (DIR) Post #AcPTo5cID0tQolvzsG by vmbrasseur@social.vmbrasseur.com
       2023-12-01T21:53:06Z
       
       0 likes, 0 repeats
       
       @kfogel Have you already tried Pandoc? https://pandoc.org
       
 (DIR) Post #AcPTo85R1dOOTjaIZk by kfogel@kfogel.org
       2023-12-02T20:24:16.871378Z
       
       0 likes, 0 repeats
       
       @vmbrasseur Yes, tried pandoc, thanks!  It's great for certain combinations of input and output, but PDF as input is not its strong suit.  This is not because pandoc isn't good, but because the problem is inherently hard (Adobe has devoted a ton of effort to making their proprietary online converter work, and even it produces only so-so output).  /CC @jarhill0
       
 (DIR) Post #AcPTrzHPi7z4K61jvc by markphip@hachyderm.io
       2023-12-01T21:48:08Z
       
       0 likes, 0 repeats
       
       @kfogel have you tried any of the online LaTeX to docx converters?
       
 (DIR) Post #AcPTsDG9jbIRgFXvua by kfogel@kfogel.org
       2023-12-02T20:25:02.654766Z
       
       0 likes, 0 repeats
       
       @markphip I tried Adobe's and got output that will need non-trivial further massaging (it was still impressive how well the thing did, though, IMHO).
       
 (DIR) Post #AcVMcgiEvaGLfB5l2W by jarhill0@hachyderm.io
       2023-12-02T21:36:43Z
       
       0 likes, 0 repeats
       
       @kfogel I suggested Pandoc because it accepts LaTeX as input. Do you have access to the LaTeX source?
       
 (DIR) Post #AcVMchnEuTr90yRGTI by kfogel@kfogel.org
       2023-12-05T16:31:59.980334Z
       
       0 likes, 0 repeats
       
       @jarhill0 Oh, we do have access to the LaTeX source, yes.  I've done pandoc conversions from LaTeX before, and it's not bad -- that is, it gets you a good part of the way there, but manual fix-up is still necessary (at least in my experience).
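
The LaTeX-to-DOCX route discussed in the thread can be sketched roughly as follows. This is a minimal sketch, not something anyone in the thread posted: it assumes pandoc is installed and on PATH, and the function names build_pandoc_command and convert are hypothetical helpers invented for illustration. Only pandoc's standard -f (input format) and -o (output file) options are used.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(source: Path, target: Path) -> list[str]:
    """Return the argv for converting a LaTeX file with pandoc.

    Hypothetical helper for illustration; not part of pandoc itself.
    """
    # The input format is stated explicitly with -f instead of relying on
    # pandoc inferring it from the source file's extension.
    # The output format is inferred by pandoc from the target's extension
    # (for example .docx or .odt).
    return ["pandoc", "-f", "latex", str(source), "-o", str(target)]


def convert(source: Path, target: Path) -> None:
    """Run pandoc, raising if it is missing or exits with an error."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_pandoc_command(source, target), check=True)
```

Calling convert(Path("dosp-survey.ltx"), Path("dosp-survey.docx")) would attempt the conversion for a local copy of the file named in the thread. As noted above, the result will likely still need manual fix-up, especially where the document relies on custom macros or packages.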