[HN Gopher] Show HN: Jq-Like Tool for Markdown
       ___________________________________________________________________
        
       There have been a few times I wanted the ability to select some
       text out of a Markdown doc. For example, a GitHub CI check to
       ensure that PRs / issues / etc. are properly formatted.  This can be
       done to some extent with regex, but those expressions are brittle
       and hard to read or edit later. mdq uses a familiar pipe syntax to
       navigate the Markdown in a structured way.  It's in 0.x because I
       don't want to fully commit to the syntax being stable, in case
       real-world testing shows that the syntax needs tweaking. But I
       think the project is in a pretty good spot overall, and would be
       interested in feedback!
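
For concreteness, here is a minimal Python sketch of the kind of PR-format check the post describes: it lists unchecked task-list items ("- [ ]") under one heading of a PR description. It is not how mdq works. The section name "Checklist" and the function name unchecked_items are hypothetical, and the sketch is deliberately line-oriented: it still leans on regexes and ignores code fences and list nesting, which is the brittleness the post is trying to escape.

```python
import re

# Hypothetical convention: a "## Checklist" section in the PR template whose
# task-list items must all be ticked before merge.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")
TASK = re.compile(r"^\s*[-*+]\s+\[([ xX])\]\s+(.*)$")


def unchecked_items(body: str, section: str = "Checklist") -> list[str]:
    """Return the text of unchecked task items under the named heading.

    Line-oriented sketch: it only remembers the most recent heading and does
    not understand code fences, HTML blocks, or list nesting.
    """
    in_section = False
    missing = []
    for line in body.splitlines():
        heading = HEADING.match(line)
        if heading:
            # Any new heading (of any level) ends the previous section.
            in_section = heading.group(2).strip().lower() == section.lower()
            continue
        task = TASK.match(line)
        if in_section and task and task.group(1) == " ":
            missing.append(task.group(2).strip())
    return missing
```

In a GitHub Actions workflow the description is available as github.event.pull_request.body; a job step could pass it to a script like this and fail when the returned list is non-empty.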
        
       Author : yshavit
       Score  : 89 points
       Date   : 2025-02-23 20:05 UTC (2 hours ago)
        
       | lanstin wrote:
       | Ironically, one of the reasons Markdown (and other text-based file
       | formats) became popular was that you could use regular find/grep to
       | analyze it, and version control to manage it.
        
         | cdbattags wrote:
         | Definitely, but it's neat nonetheless because more and more
         | things are "structured Markdown" these days. Extremely useful
         | for AI reasoning and outputs.
        
         | monsieurbanana wrote:
         | > because you could use regular find/grep to analyze it
         | 
         | They were meant to be analyzable in some ways. Count lines,
         | extract headers, maybe sed-replace some words. But operating on,
         | or analyzing, multiline strings was never a strong point of
         | Unix tools.
        
         | zahlman wrote:
         | I don't think anyone ever really expected to see widespread use
         | of regexes to alter the _structure_ of a Markdown document.
         | Honestly, while something like "look for numbers and surround
         | them with double-asterisks to put them in boldface" is feasible
         | enough (and might even work!), I can't imagine that a lot of
         | people would do that sort of thing very often (or want to)
         | anyway.
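
The "bold every number" substitution mentioned above fits in a few lines of Python. The sketch below (the function name bold_numbers is illustrative) works on ordinary prose, but the second call shows the catch: a plain regex has no idea that it is inside a code span or a link target, so it rewrites those too.

```python
import re

# A plain substitution: any run of digits, optionally with a decimal part.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def bold_numbers(markdown: str) -> str:
    """Wrap every number in ** ** without looking at Markdown structure."""
    return NUMBER.sub(lambda m: f"**{m.group(0)}**", markdown)


print(bold_numbers("Version 2 took 3.5 seconds"))
# Version **2** took **3.5** seconds
print(bold_numbers("Call `retry(3)` or see [docs](/v2)"))
# Call `retry(**3**)` or see [docs](/v**2**)
```

In the second output the asterisks inside the code span render literally, and the link target no longer points at the original URL.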
         | 
         | If a document is supposed to have structure - even something as
         | simple as nested lists of paragraphs - it doesn't seem
         | realistic to expect regular text manipulation tools to do a
         | whole lot with them. Something like "remove the second
         | paragraph of the third entry in the fourth bullet-point list"
         | is well beyond any sane use of any regex dialect that might be
         | powerful enough. (Keeping in mind that traditional regexes
         | can't balance brackets; presumably they can't properly track
         | indentation levels either.)
         | 
         | See also: TOML - generally quite human-editable, but still very
         | much structured with potentially arbitrary nesting.
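
The indentation point can be made concrete. The sketch below, with hypothetical names Item and parse_bullets, builds a tree from "- " bullets by keeping a stack of the currently open items. It assumes single-line items, space indentation, and a single bullet marker, so it is far simpler than a real CommonMark list parser; the stack is the state that a single regular expression has no natural place to keep.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    text: str
    children: list["Item"] = field(default_factory=list)


def parse_bullets(text: str) -> list[Item]:
    """Build a tree of '- ' bullet items from their indentation.

    Simplified sketch: single-line items, spaces only, '- ' markers only.
    """
    root = Item("<root>")
    # Stack of (indent, item) pairs for the currently open ancestors.
    stack: list[tuple[int, Item]] = [(-1, root)]
    for line in text.splitlines():
        stripped = line.lstrip(" ")
        if not stripped.startswith("- "):
            continue
        indent = len(line) - len(stripped)
        # Close any open items at the same or deeper indentation.
        while stack[-1][0] >= indent:
            stack.pop()
        item = Item(stripped[2:].strip())
        stack[-1][1].children.append(item)
        stack.append((indent, item))
    return root.children
```

Once the tree exists, a request like "the second child of the first item" becomes an index path such as tree[0].children[1] rather than a pattern.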
        
       | twinkjock wrote:
       | Thanks for sharing this Yuval! Thanks as well for using
       | permissive licenses so I can use this at work.
        
         | imglorp wrote:
         | Curious, which license can't you use at work for a simple shell
         | tool? Considering you're not linking against it, even GPL3
         | should be okay, right?
        
       | unglaublich wrote:
       | My flow is to go through the Pandoc JSON AST and then use Jq.
       | This works for other input formats, too.
        
         | yshavit wrote:
         | I'm curious how ergonomic you find that? I did look at the
         | pandoc JSON initially, and found it fairly awkward to work
         | with. It's a great interchange format, but doesn't seem
         | optimized for either human interaction or scripting. (It's
         | definitely possible to use it for scripting, it just felt
         | cumbersome to me, personally.)
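
For readers who have not seen it, the Pandoc JSON AST (from pandoc -t json) is, assuming the shape emitted by recent pandoc versions, a top-level "blocks" array of nodes tagged with "t" whose contents sit in a positional "c" field; a Header's "c" is [level, attr, inlines]. The Python sketch below (the names inline_text and headings are illustrative) pulls out heading text and handles only a few inline node types. The positional unpacking is roughly the awkwardness described in the reply above.

```python
import json


def inline_text(inlines: list) -> str:
    """Flatten pandoc inline nodes into plain text (common node types only)."""
    parts = []
    for node in inlines:
        kind = node.get("t")
        if kind == "Str":
            parts.append(node["c"])
        elif kind in ("Space", "SoftBreak", "LineBreak"):
            parts.append(" ")
        elif kind == "Code":
            parts.append(node["c"][1])  # "c" is [attr, text]
        elif kind in ("Emph", "Strong", "Strikeout", "Underline"):
            parts.append(inline_text(node["c"]))  # "c" is a list of inlines
        # Other inline types (links, images, math, ...) are skipped here.
    return "".join(parts)


def headings(pandoc_json: str) -> list[tuple[int, str]]:
    """Return (level, text) for each top-level Header block."""
    doc = json.loads(pandoc_json)
    found = []
    for block in doc["blocks"]:
        if block["t"] == "Header":
            level, _attr, inlines = block["c"]  # positional: level, attr, inlines
            found.append((level, inline_text(inlines)))
    return found
```

The input string would come from something like pandoc -t json README.md.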
        
         | saghm wrote:
         | I've never had a need for parsing markdown like this, but I
         | have to wonder: would it make sense to go through HTML instead,
         | given that it's what markdown is designed to compile to? At that
         | point, I'd assume there's any number of existing XML tools that
         | would work, and my (maybe naive) assumption is that typical
         | markdown documents would be relatively flat compared to how
         | deeply nested "native" HTML/XML often gets, so it doesn't seem
         | like most queries would require particularly complex XPath to
         | be able to specify.
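
A minimal sketch of the HTML route using only Python's standard library. The XHTML string is hand-written to imitate a rendered task list (an assumption; real renderer output may not be well-formed XML, in which case an HTML parser such as lxml.html would be needed), and xml.etree.ElementTree supports only a small XPath subset, though that is enough for a query like ".//li[input]". The name task_items is illustrative.

```python
import xml.etree.ElementTree as ET

# Hand-written XHTML standing in for a Markdown renderer's output.
RENDERED = """
<div>
  <h2>Checklist</h2>
  <ul>
    <li><input type="checkbox" checked="checked" disabled="disabled"/> Tests added</li>
    <li><input type="checkbox" disabled="disabled"/> Docs updated</li>
  </ul>
  <h2>Notes</h2>
  <ul>
    <li>Plain bullet</li>
  </ul>
</div>
"""


def task_items(html: str) -> list[tuple[bool, str]]:
    """Return (checked, label) for every <li> that contains a checkbox."""
    root = ET.fromstring(html.strip())
    items = []
    # "[input]" keeps only <li> elements that have a direct <input> child.
    for li in root.findall(".//li[input]"):
        box = li.find("input")
        label = (box.tail or "").strip()  # text that follows the checkbox
        items.append((box.get("checked") is not None, label))
    return items
```

Calling task_items(RENDERED) skips the plain bullet and reports one checked and one unchecked item.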
        
       | nodesocket wrote:
       | How is it parsing? Just normal string and regex matching, or
       | transforming markdown into an intermediate structured language?
        
         | yshavit wrote:
         | For the markdown, I'm using https://github.com/wooorm/markdown-rs,
         | which is a formal parser that produces an AST. For the
         | query language, I have a very simple hand-rolled parser.
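
To illustrate what a very simple hand-rolled query parser can look like, here is a toy Python sketch for a pipe syntax. The grammar is hypothetical and is not mdq's actual syntax: stages are separated by "|", each stage is a "#" (heading) or "-" (list item) selector followed by a space and optional match text, and double quotes let the text contain a literal "|". The names Stage, split_pipes, and parse_query are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    kind: str  # "heading" or "list_item" in this toy grammar
    text: str  # text to match; empty means "any"


def split_pipes(query: str) -> list[str]:
    """Split on '|' characters that are not inside double quotes."""
    stages, current, quoted = [], [], False
    for ch in query:
        if ch == '"':
            quoted = not quoted
            current.append(ch)
        elif ch == "|" and not quoted:
            stages.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if quoted:
        raise ValueError("unterminated quote")
    stages.append("".join(current).strip())
    return stages


def parse_query(query: str) -> list[Stage]:
    """Parse a toy query such as '# Usage | - "a | b"' into stages."""
    kinds = {"#": "heading", "-": "list_item"}
    stages = []
    for raw in split_pipes(query):
        sigil, _, rest = raw.partition(" ")
        if sigil not in kinds:
            raise ValueError(f"unknown selector: {raw!r}")
        rest = rest.strip()
        if len(rest) >= 2 and rest[0] == rest[-1] == '"':
            rest = rest[1:-1]  # drop the surrounding quotes
        stages.append(Stage(kinds[sigil], rest))
    return stages
```

For example, parse_query('# Usage | - "a | b"') yields a heading stage matching "Usage" followed by a list-item stage matching "a | b". An evaluator could then walk the Markdown AST, applying each stage to the nodes selected by the previous one.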
        
       | broodbucket wrote:
       | I think you'd benefit from having some more real-world-ish examples
       | in the README, speaking as someone who doesn't intuit what I'd want
       | to use this for.
        
       | verdverm wrote:
       | > GitHub PRs are Markdown documents, and some organizations have
       | specific templates with checklists for all reviewers to complete.
       | Enforcing these often requires ugly regexes that are a pain to
       | write and worse to debug
       | 
       | This is because GitHub is not building the features we need;
       | instead, they are putting their energy towards the AI land grab.
       | Bitbucket, by contrast, has a feature where you can block PRs
       | using a checkbox list outside of the description box. There are
       | better ways to solve this first example from the OP's README.
       | Cool project! I mainly write MDX these days, so it would be cool
       | to see support for that dialect.
        
         | yshavit wrote:
         | The Markdown parsing library I'm using supports MDX, so it
         | shouldn't be too difficult to come up with syntax for those
         | components. I haven't done that yet, but mostly because I
         | didn't want to go down that path until I knew there was
         | interest and had a concrete use case or two to inform the query
         | syntax.
         | 
         | If you want to open an enhancement request issue, I'm happy to
         | take a look (PRs also welcome, but not required). If you're not
         | on GitHub, let me know and we can figure out some other way to
         | get the request tracked.
         | 
         | Thanks for taking a look at the project!
        
           | verdverm wrote:
           | I don't write Rust and already have an MDX toolbox that fits
           | my needs. Browser, GH, and IDE search / TOC are good enough
           | for me.
           | 
           | I'm currently in a phase of trying to shed tools and added
           | complexity, rather than add them.
        
             | yshavit wrote:
             | Fair enough!
        
         | codelion wrote:
         | it's a shame when core feature development seems to lag. i've
         | also been working w/ MDX lately & agree that support would be a
         | great addition.
        
       ___________________________________________________________________
       (page generated 2025-02-23 23:00 UTC)