[HN Gopher] Show HN: Jq-Like Tool for Markdown
___________________________________________________________________
Show HN: Jq-Like Tool for Markdown
There have been a few times I wanted the ability to select some
text out of a Markdown doc. For example, a GitHub CI check to
ensure that PRs / issues / etc are properly formatted. This can be
done to some extent with regex, but those expressions are brittle
and hard to read or edit later. mdq uses a familiar pipe syntax to
navigate the Markdown in a structured way. It's in 0.x because I
don't want to fully commit to the syntax being stable, in case
real-world testing shows that the syntax needs tweaking. But I
think the project is in a pretty good spot overall, and would be
interested in feedback!
Author : yshavit
Score : 89 points
Date : 2025-02-23 20:05 UTC (2 hours ago)
| lanstin wrote:
| Ironically, one of the reasons Markdown (and other text-based file
| formats) became popular is that you could use regular find/grep to
| analyze it, and version control to manage it.
| cdbattags wrote:
| Definitely, but it's neat nonetheless because more and more
| things are "structured Markdown" these days. Extremely useful
| for AI reasoning and outputs.
| monsieurbanana wrote:
| > because you could use regular find/grep to analyze it
|
| They were meant to be analyzable in some ways. Count lines,
| extract headers, maybe sed-replace some words. But being able
| to operate/analyze over multiline strings was never a strong
| point of unix tools.
| zahlman wrote:
| I don't think anyone ever really expected to see widespread use
| of regexes to alter the _structure_ of a Markdown document.
| Honestly, while something like "look for numbers and surround
| them with double-asterisks to put them in boldface" is feasible
| enough (and might even work!), I can't imagine that a lot of
| people would do that sort of thing very often (or want to)
| anyway.
|
| If a document is supposed to have structure - even something as
| simple as nested lists of paragraphs - it doesn't seem
| realistic to expect regular text manipulation tools to do a
| whole lot with them. Something like "remove the second
| paragraph of the third entry in the fourth bullet-point list"
| is well beyond any sane use of any regex dialect that might be
| powerful enough. (Keeping in mind that traditional regexes
| can't balance brackets; presumably they can't properly track
| indentation levels either.)
|
| See also: TOML - generally quite human-editable, but still very
| much structured with potentially arbitrary nesting.
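To make the brittleness point above concrete, here is a minimal
sketch (not taken from mdq or from any commenter) comparing a naive
heading regex with a scanner that tracks a single piece of structure:
whether the current line sits inside a fenced code block. The sample
document and both function names are hypothetical, and the fence
handling is deliberately simplified.

```python
import re

# Hypothetical sample: two real headings, plus a shell comment inside a
# fenced code block that merely looks like a heading.
DOC = """\
# Install

~~~sh
# a shell comment, not a heading
make install
~~~

## Usage
"""


def headings_by_regex(text):
    """Naive approach: any line made of 1-6 '#' characters, a space, and text."""
    pattern = re.compile(r"^(#{1,6}) (.*)$", re.MULTILINE)
    return [m.group(2) for m in pattern.finditer(text)]


def headings_with_fence_state(text):
    """Line scanner that remembers whether it is inside a fenced block."""
    found = []
    in_fence = False
    for line in text.splitlines():
        # Simplification: any line starting with a fence marker toggles state.
        if line.startswith(("```", "~~~")):
            in_fence = not in_fence
            continue
        if not in_fence:
            m = re.match(r"(#{1,6}) (.*)$", line)
            if m:
                found.append(m.group(2))
    return found


naive = headings_by_regex(DOC)
aware = headings_with_fence_state(DOC)
print(naive)  # includes the shell comment
print(aware)  # only the two real headings
```

Even this small amount of state falls outside what a single regex
expresses comfortably; nested lists and block quotes add more of it,
which is the usual argument for querying a parsed tree instead.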
| twinkjock wrote:
| Thanks for sharing this Yuval! Thanks as well for using
| permissive licenses so I can use this at work.
| imglorp wrote:
| Curious, which license can't you use at work for a simple shell
| tool? Considering you're not linking against it, even GPL3
| should be okay, right?
| unglaublich wrote:
| My flow is to go through the Pandoc JSON AST and then use Jq.
| This works for other input formats, too.
| yshavit wrote:
| I'm curious how ergonomic you find that? I did look at the
| pandoc JSON initially, and found it fairly awkward to work
| with. It's a great interchange format, but doesn't seem
| optimized for either human interaction or scripting. (It's
| definitely possible to use it for scripting, it just felt
| cumbersome to me, personally.)
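For readers who have not seen the Pandoc JSON AST discussed above,
here is a rough sketch of what pulling headers out of it involves.
The JSON is a hand-written, simplified sample in the shape that
`pandoc -t json` emits (a real document carries many more node
types), and the helper names are hypothetical. A comparable jq
filter would start from `.blocks[] | select(.t == "Header")`.

```python
import json

# Hand-written sample shaped like `pandoc -t json` output. The version
# number and contents are illustrative assumptions, not real tool output.
AST_JSON = """
{
  "pandoc-api-version": [1, 23],
  "meta": {},
  "blocks": [
    {"t": "Header",
     "c": [1, ["intro", [], []], [{"t": "Str", "c": "Intro"}]]},
    {"t": "Para",
     "c": [{"t": "Str", "c": "Hello"}, {"t": "Space"},
           {"t": "Str", "c": "world."}]},
    {"t": "Header",
     "c": [2, ["next-steps", [], []],
           [{"t": "Str", "c": "Next"}, {"t": "Space"},
            {"t": "Str", "c": "steps"}]]}
  ]
}
"""


def inline_text(inlines):
    """Flatten a list of inline nodes to plain text.

    Only Str, Space, and SoftBreak are handled; other inline types
    (emphasis, links, code, ...) are skipped in this sketch.
    """
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])
        elif node["t"] in ("Space", "SoftBreak"):
            parts.append(" ")
    return "".join(parts)


def headers(ast):
    """Return (level, text) for each top-level Header block."""
    return [
        (block["c"][0], inline_text(block["c"][2]))
        for block in ast["blocks"]
        if block["t"] == "Header"
    ]


result = headers(json.loads(AST_JSON))
print(result)
```

The positional `c` arrays (level, attributes, inlines) and the
word-by-word `Str`/`Space` nodes are probably what makes this format
feel cumbersome for ad hoc queries, even though it is precise.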
| saghm wrote:
| I've never had a need for parsing markdown like this, but I
| have to wonder, would it make sense to go through HTML instead,
| given that it's what markdown is designed to compile to? At that
| point, I'd assume there's any number of existing XML tools that
| would work, and my (maybe naive) assumption is that typical
| markdown documents would be relatively flat compared to how
| deeply nested "native" HTML/XML often gets, so it doesn't seem
| like most queries would require particularly complex XPath to
| be able to specify.
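A minimal sketch of the HTML route suggested above, using only the
Python standard library. The HTML string is a hand-written stand-in
for what a Markdown renderer might produce, and it assumes the output
is well-formed XML; real renderer output, especially with embedded
raw HTML, may not be, in which case a tolerant HTML parser would be
needed instead.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for rendered Markdown (an assumption for this
# sketch). It must be well-formed XML for ElementTree to parse it.
HTML = """\
<body>
  <h1>Release checklist</h1>
  <ul>
    <li>Update changelog</li>
    <li>Tag release</li>
  </ul>
  <h2>Notes</h2>
  <p>Ship on <em>Friday</em>.</p>
</body>
"""

root = ET.fromstring(HTML)

# ElementTree supports a limited XPath subset, which is enough here
# because the rendered document is shallow.
items = [li.text for li in root.findall("./ul/li")]

# Positional predicates are 1-based: second item of the list.
second_item = root.find("./ul/li[2]").text

# itertext() gathers text across nested inline elements such as <em>.
note = "".join(root.find("./p").itertext())

print(items)
print(second_item)
print(note)
```

The paths do stay short for flat documents. One trade-off is that
the query is phrased in terms of HTML elements rather than Markdown
constructs, so anything the renderer normalizes away is no longer
visible to it.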
| nodesocket wrote:
| How is it parsing? Just normal string and regex matching or
| transforming markdown to an intermediate structured language?
| yshavit wrote:
| For the markdown, I'm using https://github.com/wooorm/markdown-rs,
| which is a formal parser that produces an AST. For the
| query language, I have a very simple hand-rolled parser.
| broodbucket wrote:
| I think you'd benefit from having some more real-world-ish examples
| in the README, as someone who doesn't intuit what I'd want to use
| this for.
| verdverm wrote:
| > GitHub PRs are Markdown documents, and some organizations have
| specific templates with checklists for all reviewers to complete.
| Enforcing these often requires ugly regexes that are a pain to
| write and worse to debug
|
| This is because GitHub is not building the features we need,
| instead they are putting their energy towards the AI land grab.
| Bitbucket, by contrast, has a feature where you can block PRs
| using a checkbox list outside of the description box. There are
| better ways to solve this first example from OP's README. Cool
| project! I write mainly MDX these days; it would be cool to see
| support for that dialect.
| yshavit wrote:
| The Markdown parsing library I'm using supports MDX, so it
| shouldn't be too difficult to come up with syntax for those
| components. I haven't done that yet, but mostly because I
| didn't want to go down that path until I knew there was
| interest and had a concrete use case or two to inform the query
| syntax.
|
| If you want to open an enhancement request issue, I'm happy to
| take a look (PRs also welcome, but not required). If you're not
| on GitHub, let me know and we can figure out some other way to
| get the request tracked.
|
| Thanks for taking a look at the project!
| verdverm wrote:
| I don't write rust and already have an MDX toolbox that fits
| my needs. Browser, GH, and IDE search / TOC are good enough
| for me.
|
| I'm currently in a phase of trying to shed tools and added
| complexity, rather than add them
| yshavit wrote:
| Fair enough!
| codelion wrote:
| it's a shame when core feature development seems to lag. i've
| also been working w/ MDX lately & agree that support would be a
| great addition.
___________________________________________________________________
(page generated 2025-02-23 23:00 UTC)