[HN Gopher] Pandoc 3.0
___________________________________________________________________
Pandoc 3.0
Author : zczc
Score : 468 points
Date : 2023-01-19 08:45 UTC (14 hours ago)
(HTM) web link (pandoc.org)
(TXT) w3m dump (pandoc.org)
| asicsp wrote:
| I use pandoc to convert GitHub style markdown to PDF/EPUB ebooks.
| The default output is good and there are plenty of customization
| options too. I didn't know LaTeX/CSS but stitched a few things
| together with help from stackexchange sites to customize the
| output produced. Later came to know there are third-party
| templates that I could've used/started with.
| amai wrote:
| Since Pandoc has Lua inbuilt I wonder if it can also run LuaLaTeX
| in full? Because then it could support really all features of
| LaTeX and become a kind of SuperLaTeX.
| zdw wrote:
| Does it still automatically generate "smart" quotes (which are
| anything but) from traditional ones during conversion?
|
| Love the tool, but this is the most awful default setting I've
| seen in a program in a while, especially if you include any code
| that depends on quotes not being mangled.
| simonmic wrote:
| This I agree with. I don't know the exact current status, but
| having debugged related rendering issues many times over the
| years, I wish it had always been hard to enable conversion to
| so-called smart quotes, rather than hard to prevent it.
| simonmic wrote:
| I should add: the above is about the only quibble I can think
| of, which is impressive. I love love love pandoc! It's a
| highly dependable and capable swiss army docs tool. I use it
| constantly, eg to help generate CLI help text and HTML, man,
| info and plain text manuals from (mostly) markdown sources.
| Huge congrats and thanks to the developers for their hard
| work and for this latest release.
| johnday wrote:
| If you specify that you just want bog-standard markdown then it
| will not generate smart quotes. To wit:
|
| - `pandoc foo.tex -t markdown -o foo.md` will not produce smart
| quotes.
|
| - `pandoc foo.tex -t markdown-smart -o foo.md` will produce smart
| quotes.
| zdw wrote:
| I think the examples you give might be the opposite of what
| happens - in the docs:
|
| https://pandoc.org/MANUAL.html#extension-smart
|
| The meaning of the -smart extension on option names is
| inverted in some cases, and enabled by default on markdown
| output.
| leephillips wrote:
| It's configurable, and, in any case, Pandoc will not alter the
| quotes in your code blocks nor in inline code.
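To make the two directions of the `smart` extension concrete, here is a minimal Python sketch that only builds pandoc argument lists; it does not run pandoc. The function names and file names are hypothetical. The behavior described in the comments is my reading of the manual section linked above: when reading Markdown, `smart` turns straight quotes into typographic ones, and when writing Markdown it has the reverse effect.

```python
from typing import List


def markdown_to_html_args(src: str, dest: str, smart: bool = True) -> List[str]:
    """Build a pandoc command line for Markdown -> HTML.

    Assumption (per the pandoc manual): `smart` is on by default for the
    Markdown reader, and `markdown-smart` switches it off so straight
    quotes in the source stay straight.
    """
    reader = "markdown" if smart else "markdown-smart"
    return ["pandoc", "-f", reader, "-t", "html", "-o", dest, src]


def latex_to_markdown_args(src: str, dest: str, straight_quotes: bool = True) -> List[str]:
    """Build a pandoc command line for LaTeX -> Markdown.

    Assumption (per the pandoc manual): for Markdown *output* the meaning
    is reversed. With `smart` enabled the writer emits straight quotes;
    `markdown-smart` emits curly Unicode quotes instead.
    """
    writer = "markdown" if straight_quotes else "markdown-smart"
    return ["pandoc", "-f", "latex", "-t", writer, "-o", dest, src]
```

Either list can be handed to `subprocess.run(args, check=True)` on a machine where pandoc is installed.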
| kelsolaar wrote:
| Quarto, excellent software for building publications, websites,
| etc..., is leveraging Pandoc: https://quarto.org/
| bronikowski wrote:
| I love Pandoc. I don't often write "documents for office
| consumption" but when I do, I just write a markdown file and spit
| out docx or PDF. I was congratulated more than once on how
| coherent my documents are in their structure.
|
| Plus, having a git history is a great boon.
| yabones wrote:
| It's also not too difficult to hook up a GH actions job to
| generate the documents with pandoc and spit them out directly
| into dropbox/sharepoint for "non-techie consumption". Great for
| semi-technical documentation that biz/sales/support people need
| to be in the loop on.
| qbasic_forever wrote:
| Oh that's a great idea! I wish I had pandoc in my university
| days--I ended up writing a lot of (non-technical) papers in
| latex just because I hated using word for the task.
| snet0 wrote:
| I've been using Pandoc to write Latex-lite for a couple years
| now. Just write .md files with basic Markdown syntax for all the
| major text content, and add some Latex when I need to do
| something more particular. Best of both worlds, really.
| jszymborski wrote:
| Man I'd love an asciidoc(tor) reader for pandoc sooo much. The
| existing toolchain is a big pain.
| jph wrote:
| Pandoc is such a great conversion program, and this new 3.0
| release has so many improvements, especially for figures.
|
| I write in Markdown and export to PDF, using Pygments for code
| syntax coloring, with .tex files to adjust layouts, tables, and
| the like.
|
| https://github.com/SixArm/pandoc-from-markdown-to-pdf
| toastal wrote:
| One day I wish to see the AsciiDoc(tor) Reader. I'd love to be
| freed from Ruby as AsciiDoc is superior to Markdown and most
| other lightweight markup syntax options in features and syntax.
| This lack of features is why we have an incompatible group of
| Markdown syntax forks (aka "flavors" to mask that forks are
| incompatible).
| geokon wrote:
| I've successfully used it from Clojure (I think it's through
| JRuby). With a few lines of code you can configure AsciiDoctor
| to whatever you need. It's way easier than fiddling at the
| command-line (I couldn't immediately understand how to get
| extensions and how it played with whatever version of the
| software I got through `apt`). It'd be good to have
| alternatives just for the sake of it - but I didn't find
| anything particularly lacking
|
| Here is how I made some reveal slides
|
|     (import [org.asciidoctor Asciidoctor OptionsBuilder SafeMode])
|
|     (let [input-file (clojure.java.io/file "path/to/adoc/file")
|           adoctor (org.asciidoctor.Asciidoctor$Factory/create)
|           reveal-option (doto (org.asciidoctor.OptionsBuilder/options)
|                           (.backend "revealjs")
|                           (.safe org.asciidoctor.SafeMode/UNSAFE)
|                           (.attributes
|                            (.attribute (org.asciidoctor.AttributesBuilder/attributes)
|                                        "revealjsdir"
|                                        "../reveal.js")))]
|       (.requireLibrary adoctor (into-array String ["asciidoctor-revealjs"]))
|       (.convertFile adoctor input-file reveal-option))
|
| You get all the codez from Maven so you don't need to install
| anything on your system
|
|     {'org.asciidoctor/asciidoctorj-revealjs {:mvn/version "5.0.0.rc1"}
|      'org.asciidoctor/asciidoctorj-pdf      {:mvn/version "1.6.2"}
|      'org.asciidoctor/asciidoctorj          {:mvn/version "2.5.3"}}
|
| The maintainers seem very responsive and active on Github. It's
| not as nice as a spec and multiple implementations - and I
| guess you're locked in to one library, but at least it's not as
| bad as Orgmode - where you're locked in to an editor as well
| toastal wrote:
| Yes, the maintainer is great. I have at this point just used
| Nix and post-processed Asciidoctor instead of trying any sort
| of other tools, but it gets trickier as you noted if you want
| to use it inside something else. It's not a compiled binary
| nor is it a C lib other languages could get at. Much of that
| could be attributed to the spec being quite.
| matklad wrote:
| Take a look at https://djot.net/, you might like that.
| toastal wrote:
| It does do some things better and I appreciate calling it a
| new name and 'starting over' instead of another fork, but
| what's not covered is metadata. If I want to add author,
| license, tags, keywords, description, etc. there is no in-
| document way to do this. Almost all other media format types
| from images, audio, to other documents like ODF have a way to
| do metadata and this (and Markdown) doesn't cover said
| important use case.
|
| Seems there's a long-standing (for the project) open issue
| where it's still being mulled over.
|
| Imports are also very nice for writing longer texts--
| especially how AsciiDoc lets you +1 all of your headings so
| the heading hierarchy works as a standalone document and a
| part of a larger whole.
| matklad wrote:
| Yeah, it's definitely a wip at this point, though it is
| already general enough to express meta and imports. You
| _could_ write
|
|     # Title
|
|     : key = value
|
|     ```include
|     ./examples/hello.rs
|     ```
|
| today and write a simple filter to extract meta from the
| first definition list and resolve includes.
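The "resolve includes" half of that idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it works on pandoc's JSON AST rather than on djot's own AST, it handles top-level blocks only, and `resolve_includes` and its `read_file` parameter are hypothetical names. The `include` class is taken from the example above. Metadata extraction is not covered.

```python
from pathlib import Path


def resolve_includes(blocks, read_file=None):
    """Return a new block list where each top-level CodeBlock tagged with the
    class "include" has its text replaced by the contents of the named file.

    `blocks` follows pandoc's JSON shape for code blocks:
    {"t": "CodeBlock", "c": [[id, [classes], [[key, value], ...]], text]}
    `read_file` is a hypothetical hook so the file access can be swapped out.
    """
    if read_file is None:
        read_file = lambda path: Path(path).read_text(encoding="utf-8")

    resolved = []
    for block in blocks:
        if block.get("t") == "CodeBlock":
            (ident, classes, attrs), text = block["c"]
            if "include" in classes:
                # Drop the marker class and splice in the file contents.
                remaining = [c for c in classes if c != "include"]
                block = {
                    "t": "CodeBlock",
                    "c": [[ident, remaining, attrs], read_file(text.strip())],
                }
        resolved.append(block)
    return resolved
```

The input list is left untouched; blocks that are not marked for inclusion are passed through as they are.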
| thekaleb wrote:
| Pandoc is written in Haskell.
| binarycoffee wrote:
| As others wrote, Pandoc is Haskell so it compiles to a fairly
| efficient binary.
|
| But more importantly, unlike the various Markdown flavors or
| AsciiDoc, it is incredibly extensible thanks to the combination
| of custom filters and the possibility to add HTML classes and
| attributes. One can write filters to leverage the
| class/attribute information and perform transformations at the
| AST level, which basically lets you define a DSL with an
| arbitrary number of custom elements.
|
| I wrote a collection of filters for the publication of a large
| online legal playbook. Not only did Pandoc make it possible to
| introduce different kind of custom elements that don't exist in
| plain Markdown or AsciiDoc, but by using different filters it
| was possible to use a single Markdown source to generate both
| the book and various summaries such as a list of examples, a
| list of civil code clauses etc. I don't know Haskell that well
| so I used Rust for the filters, but that worked very well.
|
| Pandoc is IMO a very underrated tool.
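As a rough illustration of the "one source, several summaries" approach described above, here is a sketch of the core of a pandoc JSON filter in Python (the commenter used Rust; this is not their code). It keeps only the Divs that carry a given class. The class name `example` and the function names are assumptions; the Div and BlockQuote shapes follow pandoc's JSON AST.

```python
import json
import sys

TARGET_CLASS = "example"  # hypothetical class name used to tag examples


def collect_divs(blocks, cls):
    """Return every Div tagged with `cls`, searching inside other Divs and
    BlockQuotes as well. A pandoc Div looks like
    {"t": "Div", "c": [[id, [classes], [[key, value], ...]], [blocks]]}.
    """
    found = []
    for block in blocks:
        kind = block.get("t")
        if kind == "Div":
            (_ident, classes, _attrs), children = block["c"]
            if cls in classes:
                found.append(block)
            else:
                found.extend(collect_divs(children, cls))
        elif kind == "BlockQuote":
            found.extend(collect_divs(block["c"], cls))
    return found


def summarize(doc, cls=TARGET_CLASS):
    """Copy the document, replacing its body with only the matching Divs."""
    return {**doc, "blocks": collect_divs(doc["blocks"], cls)}


def main():
    """Filter entry point: pandoc sends the JSON AST on stdin and reads the
    transformed AST back from stdout."""
    json.dump(summarize(json.load(sys.stdin)), sys.stdout)
```

Saved as an executable script that calls `main()`, it could be used as `pandoc book.md --filter ./examples_only.py -o examples.html` (the file names are placeholders). In the Markdown source, a matching element would be a fenced Div such as `::: example`.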
| johnday wrote:
| > Pandoc is Haskell so it compiles to a fairly efficient
| binary.
|
| This is nebulous. Haskell's compiled binaries are not ideal,
| for a number of reasons.[^1] GHC does very little to optimise
| for many typical metrics of "efficient". The binaries it
| produces are enormous because it (unavoidably) bundles the
| runtime along with the program itself, and there is a _lot_
| of empty space in the binaries. Shrinking them can improve
| startup times significantly especially on spinning rust
| drives.
|
| That said, Haskell programs are at least _compiled_, and they
| do result in binaries which, if well written, can result in
| running times comparable to (or, sometimes, shorter than)
| your average hand-rolled C code that achieves the same goals.
|
| Of course, none of this casts any shadow on the fact that
| Pandoc is, indeed, an excellently engineered piece of
| software that stands as a testament to the value of Haskell
| for real-world business logic and problem solving.
|
| [^1]: This problem is fairly well-understood in the Haskell
| community: https://dixonary.co.uk/small
| chungy wrote:
| AsciiDoc is written in Python.
|
| Though when it comes to annoyance with Markdown forks:
| AsciiDoctor is basically that to AsciiDoc. It's mostly
| compatible, but when it isn't, it really bites.
| toastal wrote:
| The more important part is that you end up with a binary
| instead of needing an interpreted language which makes the
| tooling a mess. Python and Ruby are the same thing to most
| people.
| gyulai wrote:
| I've been looking for a tech stack to replace latex for decades
| now. As a very recent development, the combination of
| pandoc+weasyprint (plus a little bit of homebrewed pandoc filter
| magic) has now become good enough for my needs, and I have
| finally been able to take the plunge. Feels great.
|
| For those who are a little less adventurous and who happen to be
| in the social sciences, humanities, journalism, etc.,
| pandoc+msword is also definitely worth looking into. It's a much
| better tech stack than standalone msword. -- It's really only in
| the STEM fields that, in my mind, there really is no way around
| latex.
| tikhonj wrote:
| No way around LaTeX, but Pandoc + LaTeX is still a much nicer
| experience than plain LaTeX in my experience.
| Brendinooo wrote:
| I'm in a job where I pretty much never need to output a PDF, but
| whenever the occasional thing comes around, pandoc is always
| there for me. Such a useful tool.
| CJefferson wrote:
| I love pandoc. With its Lua filters, I love using it for
| generating html and blog posts, one thing which always annoys me
| about most static website generating tools is they make you use
| some very limited templating language, when I just want to use a
| proper programming language.
|
| My only irritation -- while I understand why one would want to do
| it for neatness, it's annoying that the "pandoc" package no
| longer provides the "pandoc" program! Maybe instead introducing
| "pandoc-core" and renaming "pandoc-cli" to "pandoc" would be
| better (it would certainly avoid breaking existing scripts, like
| mine).
| sramsay wrote:
| This (I also generate my blog using pandoc). In another case, I
| wanted to go from Markdown to groff -mom and it was a totally
| straightforward matter with a custom Writer in Lua.
| y04nn wrote:
| If I understand correctly the package pandoc-cli installs the
| pandoc binary [1], so it shouldn't break that many scripts.
|
| [1]:
| https://github.com/jgm/pandoc/blob/535bd0393fe7b2f287903b942...
| tetris11 wrote:
| fantastic software, never build it from source, or if you have
| to, make sure you have an OS that bundles all the Haskell
| dependencies into a single meta package
| jesprenj wrote:
| It or one of its libraries did not compile for me OOTB in
| Gentoo. Granted it's marked as nonstable (~*). That's
| unfortunate, because I really liked to use it when I was mainly
| running Debian. Though I didn't really put any debugging effort
| into making it work.
| mrspuratic wrote:
| I did this years back, it took, ehm, quite some time. I use a
| binary now :D ISTR it starts with building a self-hosting
| Haskell compiler...
| Tyr42 wrote:
| Yeah, I ended up upgrading my machine's ram when I was building
| it from source. It's sizable.
| gregwebs wrote:
| Haskell tooling went from awful to best in class after stack
| came out. Look for "Quick stack method" on the installation page.
| It should be easy to build from source now with just a few
| commands. No doubt it will take a long time to compile all the
| packages, and you might still have issues tracking down any
| non-Haskell dependencies (C libraries).
| josephcsible wrote:
| Pandoc is really easy to build with just Haskell and Cabal
| installed via ghcup, without needing any Haskell packages from
| your distro.
| remoquete wrote:
| One new feature that will make Python documentarians happy is the
| `--list-tables` flag for rST output: You can now convert any
| table to the list table syntax of reStructuredText, which is, in
| the opinion of many, superior to classic tables with ASCII borders.
| account-5 wrote:
| Love this program. Means I can write in plain text, markdown or
| zim-wiki syntax, and export to word no hassles.
|
| If I'm writing markdown I use Pandoc's version as it has support
| for advanced tables.
|
| Brilliant software.
| mrehler wrote:
| Out of every tool I've ever used to make a .docx file from
| Markdown, Pandoc is the only one that has consistent results with
| converting Markdown headers to Word styles rather than just a
| bigger font size. Lots of Markdown tools in my tool belt, and
| would love to know of any more that can do this, because it's
| really useful on the (unfortunate) occasions something needs to
| live as a Word doc.
| Veen wrote:
| I do wish there was an easy way to create Word document titles
| from H1s in Markdown. It makes sense that they should be
| converted to top-level headings, but it adds an annoying bit of
| friction to my workflow.
| fiddlosopher wrote:
| Oh, but there is!
|
|     pandoc --shift-heading-level-by=-1 input.md -o output.docx
|
| This will promote level-2 headings to level-1, and promote a
| level-1 heading at the top of the document to the document's
| title.
| Veen wrote:
| Oh really? I've tried --shift-heading in the past and it
| worked to move headings up a level, but not to the title.
| I'll have to read the docs more carefully and give it
| another go. Thank you.
| maweki wrote:
| Pandoc is a great piece of software. As a university teacher and
| researcher, I use it in three ways:
|
| 1. I write markdown for my website and for the websites for my
| research projects and simply generate standalone html out of it.
| Done.
|
| 2. When we create electronic exams, the exam platform takes
| questions using a html-backed rich text editor. We write down our
| exam questions using markdown, create html document fragments,
| that we simply paste into the exam platform.
|
| 3. When students do electronic exams, we receive xml files from
| our exam platform. We use python to pass on submissions to
| different submission checkers (akin to autograders or static
| analysis) and create yaml files with the student submission and
| grading suggestions and static analysis annotations. We manually
| review and grade and comment within the yaml file (that works
| incredibly well), collect all the data using python and generate
| markdown reports for each student, including their submission,
| our comments and scoring. We pass this markdown through pandoc,
| creating well laid-out PDFs which we either print and hand out or
| send out electronically.
|
| Pandoc fits our yaml+markdown-based processes very well. Only for
| the actual research papers we still write LaTeX and build pdfs
| without pandoc.
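The report-generation step in item 3 might look roughly like the Python sketch below. It is not the commenter's code: the record layout (`student`, `tasks`, `points`, `max_points`, `submission`, `comment`) and the function name are made up for illustration, and the record is passed in as a plain dict, for example one loaded from a reviewed YAML file.

```python
from typing import Dict, List


def render_report(record: Dict) -> str:
    """Render one student's grading record as pandoc-flavored Markdown.

    The output starts with a YAML metadata block for the title, followed by
    one section per task and a total line at the end.
    """
    lines: List[str] = [
        "---",
        f'title: "Exam report for {record["student"]}"',
        "---",
        "",
    ]
    earned = 0
    possible = 0
    for task in record["tasks"]:
        earned += task["points"]
        possible += task["max_points"]
        lines += [
            f"## {task['name']} ({task['points']}/{task['max_points']})",
            "",
            "~~~",  # tilde fence keeps the submission verbatim
            task["submission"],
            "~~~",
            "",
            f"Comment: {task['comment']}",
            "",
        ]
    lines.append(f"**Total: {earned}/{possible}**")
    return "\n".join(lines) + "\n"
```

Writing the returned string to `report.md` and running `pandoc report.md -o report.pdf` would then produce the PDF, assuming a PDF engine such as a LaTeX distribution is installed.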
| phlummox wrote:
| Interesting! I use a very similar process for creating exams
| and student projects, but am the only one in my department who
| does so. Are any of your processes/tools publicly available?
| (Mine are basically cobbled together in Haskell and Python.)
| maweki wrote:
| Sorry, same. There's such a myriad of e-learning platforms in
| Germany and I guess it's the same for most countries.
|
| I would believe the same goes for our own research static
| analysis and autograding platform (in our case SQL) which
| probably every CS department also has quite a few of.
|
| I wouldn't get my hopes up for anything publicly available
| that fits your platform and has a bus factor higher than 1.
| WolfOliver wrote:
| for the research papers which you write in LaTeX you should
| have a look at MonsterWriter.
|
| Disclaimer: I'm the creator of MonsterWriter and very keen to
| receive feedback and learn about how universities and their
| students write papers, theses, ...
| Terretta wrote:
| It's cool you're using SetApp, thank you.
|
| For HN reading along, SetApp is a way to distribute apps and
| get paid _outside the app store_. Really, that exists.
|
| // Disclosure: Unless you are disavowing your ability as
| author to offer a recommendation that can be trusted, you
| probably mean "Disclosure" not "Disclaimer". Disclosure =
| here is my potential bias. Disclaimer = YMMV, no warranties
| express or implied.
| WolfOliver wrote:
| You can also download the app without SetApp. No
| Subscription needed!
| CJefferson wrote:
| Just tried opening it. It looks nice, but I'm going to
| write some quick, slightly negative, comments, based on your
| claims about using it.
|
| The table formatting is not good enough. It's not obvious how
| to left-justify a column. It's also not clear how to line a
| column up along "." (which I often use for numbers). Both of
| these are fairly easy in LaTeX.
|
| The outputted LaTeX looks OK, but it's not obvious how to
| format -- most journals, and Universities (for PhDs) will
| have a fixed style you have to use. I suppose I could take
| the LaTeX and randomly hack it, but then I need to learn
| LaTeX to fix any issues that causes.
| WolfOliver wrote:
| Both fair points; tables still need improvement.
|
| Regarding the outputted LaTeX, the idea is to grow the
| amount of supported templates. So there would be templates
| for every important journal. For now the focus is to make
| the thesis template flexible enough that it works for most
| bachelor's/master's theses.
| maweki wrote:
| Though I don't really like you advertising, thank you for the
| suggestion. As a computer science researcher I'll give you
| some feedback why your application is a total deal-breaker
| for me and my colleagues:
|
| * It's not running on Linux. Nobody in our department runs
| Windows or Mac.
|
| * We already have huge BibTeX citation libraries that we use
| in papers and just reference the necessary papers. These
| citation library files grow and grow. I won't manually add
| citations for each paper.
|
| * We collaborate and version through git. If collaborative
| writing and version control does not work at least as easily as
| our plaintext-git-handling, that's a hard no.
|
| * You do know that conferences and journals provide Word and
| LaTeX templates with fixed page limits for submissions, right?
| How would I use, say, LNCS in MonsterWriter? Writing seems not
| to be page-based. How do I know that I'm over the limit?
|
| * My wife is a researcher in the social sciences, and they
| extensively use MS Word's change tracking and merging feature
| to write papers. If MonsterWriter does not support this in an
| accessible and visually appealing manner, it would be a hard
| no for her as well.
|
| With your feature set, you're not really targeting
| researchers, even if you think you do.
| WolfOliver wrote:
| Thank you for your time to answer, very much appreciated :)
| tambourine_man wrote:
| It seems you're being downvoted for proposing a tool you
| created and disclaimed as such. I think that's perfectly fine
| and in the spirit of HN.
| adityaathalye wrote:
| Pandoc powers my little static site maker:
|
| cf.
| https://github.com/adityaathalye/shite/blob/master/bin/templ...
|
|     __shite_templating_compile_source_to_html() {
|         # If content has front matter metadata, it is presumed to be in a
|         # format that the content compiler can safely process and elide
|         # or ignore.
|         local file_type=${1:?"Fail. We expect file type of content like html, org, md etc."}
|
|         case ${file_type} in
|             html )
|                 pandoc -f html -t html
|                 ;;
|             md )
|                 pandoc -f markdown -t html
|                 ;;
|             org )
|                 pandoc -f org -t html
|                 ;;
|         esac
|     }
| BeetleB wrote:
| Mine too. I author in org and have a plugin that converts the
| org files to rst files.
| adityaathalye wrote:
| Ah, I make it compile partial HTML (just page content), and
| stick that into my own HTML page template(s) (written as
| HEREDOCS :D).
|
| That's part of the joy of using Pandoc. I can pipeline it, no
| problem.
|
| Like this:
|
| cf.
| https://github.com/adityaathalye/shite/blob/master/bin/templ...
|
|     cat "${watch_dir}/sources/${url_slug}" |
|         __shite_templating_compile_source_to_html ${file_type} |
|         __shite_templating_wrap_content_html ${content_type} ${watch_dir} |
|         __shite_templating_wrap_page_html \
|         > "${watch_dir}/public/${html_url_slug}"
|
| Templates look like this. Notice the $(cat -) in the middle.
| That's how the HTML content produced by Pandoc gets injected
| in the middle of everything else.
|
|     shite_template_common_default_page() {
|         local maybe_page_id=${shite_page_data[page_id]:+"id=\"${shite_page_data[page_id]}\""}
|         local maybe_canonical_url=${shite_page_data[canonical_url]:+"<link rel=\"canonical\" href=\"${shite_page_data[canonical_url]}\">"}
|
|         cat <<EOF
|     <!DOCTYPE html>
|     <html lang="en">
|       <head>
|         $(shite_template_common_meta)
|         $(shite_template_common_links)
|         ${maybe_canonical_url}
|       </head>
|       <body ${maybe_page_id}>
|         <div id="the-very-top" class="stack center box">
|           $(shite_template_common_header)
|           <main id="main">
|             $(cat -)
|           </main>
|           $(shite_template_common_footer)
|         </div>
|       </body>
|     </html>
|     EOF
|     }
|
| edit: substantiate Pandoc's role.
| BeetleB wrote:
| In my case I'm using Pelican which has builtin support for
| rst, so it was easier for me to just convert to rst rather
| than full blown HTML.
| cosmic_quanta wrote:
| That's great news. I've been waiting for years for a dedicated
| 'Figure' element. The workaround was pretty brittle. It'll make
| pandoc-plot [0] easier to maintain as well.
|
| [0]: https://github.com/LaurentRDC/pandoc-plot
| quijoteuniv wrote:
| Do you want to change the world? Write software like this. Pandoc
| is great.
| gpoore wrote:
| I've been frustrated by Markdown previews not supporting Pandoc
| features, so I created a Pandoc-based Markdown preview for VS
| Code [1]. The preview supports all Pandoc extensions to Markdown
| syntax, because Pandoc itself generates the preview. There is
| also optional support for code execution with Jupyter kernels.
| I'm currently in the process of adding support for non-Markdown
| formats (including scroll sync), plus taking advantage of some of
| the new Pandoc 3.0 features.
|
| [1]: Examples and animations:
| https://codebraid.org/presentations/scipy2022/. Installation for
| VS Code:
| https://marketplace.visualstudio.com/items?itemName=gpoore.c....
| Installation for VSCodium:
| https://open-vsx.org/extension/gpoore/codebraid-preview.
| dosshell wrote:
| Does that mean that you are adding support for latex etc?
|
| this is awesome, thank you for your work.
| gpoore wrote:
| Yes, I'm adding support for arbitrary text-based formats,
| including LaTeX. So it will be possible to write LaTeX and
| see a live HTML preview generated by Pandoc.
|
| In principle, it should be possible to create a PDF preview
| with proper SyncTeX support for synchronizing LaTeX source
| and PDF preview locations, but that gets complicated when
| Pandoc+LaTeX generate the PDF. It may be best to leave LaTeX-
| PDF previews to dedicated LaTeX previewers that don't involve
| Pandoc.
| noiwillnot wrote:
| > Yes, I'm adding support for arbitrary text-based formats
|
| That would be extremely powerful, and also would allow you
| to differentiate your extension from the Quarto one.
| gpoore wrote:
| I actually released my extension around the same time
| that the Quarto extension came out. Quarto is great for
| documents running R code or needing some of Quarto's
| advanced document features. My extension has scroll sync
| and the preview updates live while you type. If you need
| code execution, you can use multiple Jupyter kernels per
| document and execute inline code. Also, code execution is
| non-blocking, so the preview still updates when you type,
| and code output appears live as it becomes available.
| leipert wrote:
| Funny. During my bachelor thesis I added Pandoc as a renderer
| to an Atom markdown preview extension. (instead of actually
| writing my thesis)
|
| https://github.com/atom-community/markdown-preview-plus/pull...
|
| Old is new, the editor and the extension are now defunct. What
| was best about this exercise, I got so well versed with the
| markdown and Pandoc features at the time, that I didn't need
| the preview at all.
___________________________________________________________________
(page generated 2023-01-19 23:01 UTC)