[HN Gopher] Libxo: Easy way to generate text, XML, JSON, and HTML
___________________________________________________________________
Libxo: Easy way to generate text, XML, JSON, and HTML
Author : edward
Score : 60 points
Date : 2023-07-14 18:29 UTC (4 hours ago)
(HTM) web link (juniper.github.io)
(TXT) w3m dump (juniper.github.io)
| sam_bristow wrote:
| I would love having structured output from shell commands, but
| for now I'd settle for people using stdout and stderr correctly.
| moody__ wrote:
| I see people say they would like this quite often. There is a
| command-line shell that already does this: PowerShell is
| structured in exactly this way. But no one is throwing away their
| sh/bash/tcsh/ksh for PowerShell. I think this is just a classic
| example of the grass being greener on the other side.
| JNRowe wrote:
| Every time I see this I'm hoping that it has taken off, because
| it feels like such an obvious improvement. Instead of that we
| have some support for it in FreeBSD, and a homegrown solution in
| some packages like util-linux. Yeah, there are some concerns, but
| the concept seems sound and the implementation can be iterated
| upon.
|
| For instance, years later it still isn't packaged in Debian.
| If nothing out of the tens of thousands of Debian packages
| depends on it, there is presumably a good reason.
|
| It strikes me as one of those libtermkey[1]/libvterm things where
| Leonerd pushed it for years before anybody really used it,
| despite it being a seemingly obvious improvement over the status
| quo.
|
| [1] https://www.leonerd.org.uk/code/libtermkey/
| ComputerGuru wrote:
| > Instead of that we have some support for it in FreeBSD
|
| My Google-fu is failing me right now, but FreeBSD also has a
| shared library for reading and parsing config files that
| provides a common, universal DSL for every conf file that uses
| it. This is one of the benefits of using an OS instead of a
| distribution: all the tools are developed holistically, so
| across-the-board refactors, such as providing a shared,
| universal input or output format or sandboxing everything with
| Capsicum, are much more feasible.
|
| EDIT
|
| Remembered it. Surprised at how bad Google was at finding this,
| though!
|
| UCL - Universal Configuration Language [0]. Introduced in a
| paper by Allan Jude in 2015 [1]. Man page: libucl(3) [2].
|
| [0]: https://github.com/vstakhov/libucl/
|
| [1]: https://papers.freebsd.org/2015/bsdcan/allanjude-ucl/
|
| [2]:
| https://man.freebsd.org/cgi/man.cgi?query=libucl&sektion=3&f...
| kristopolous wrote:
| I bet you could do some pretty clever heuristic hacks to wrap a
| bunch of programs in this, especially if you attach to the
| process and, say, clobber printf. I'm thinking in the spirit of
| rlwrap.
|
| It's certainly more of a Game Genie approach, but it might
| occasionally be awesome.
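A less invasive cousin of that idea is to leave the program alone and heuristically re-parse its text output. The sketch below is a hypothetical helper (`wc_line_to_json` is not part of libxo or of any real tool) that assumes one line of `wc`-style output in the order lines, words, characters, filename. It is not printf interposition, and like any such heuristic it breaks on input the pattern does not anticipate.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: heuristically convert one line of wc-style output
 * ("lines words chars filename") into a JSON object.
 * Returns the snprintf result, or -1 if the line does not fit the pattern.
 */
int wc_line_to_json(const char *line, char *out, size_t outlen)
{
    unsigned long lines, words, chars;
    char name[256];

    assert(line != NULL && out != NULL);

    /* %255s stops at whitespace, so filenames containing spaces defeat this. */
    if (sscanf(line, "%lu %lu %lu %255s", &lines, &words, &chars, name) != 4)
        return -1;

    /* No JSON escaping is done here, so refuse names that would need it. */
    if (strchr(name, '"') != NULL || strchr(name, '\\') != NULL)
        return -1;

    return snprintf(out, outlen,
                    "{\"lines\":%lu,\"words\":%lu,\"characters\":%lu,"
                    "\"filename\":\"%s\"}",
                    lines, words, chars, name);
}
```

Calling it on a line such as `"     25     165    1140 /etc/motd"` yields a one-line JSON object, while a line like `"total: n/a"` is rejected with -1.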
| zokier wrote:
| Seeing that this originates from the FreeBSD ecosystem, did it
| actually get adopted widely in the FreeBSD base system? That is
| at least how I interpreted the goal:
| https://juniper.github.io/libxo/libxo-manual.html#can-you-sh...
| tedunangst wrote:
| It's used inconsistently. df uses it, but not du. ps, but not
| ls.
| Norfair wrote:
| In Haskell we use autodocodec for this.
| 38 wrote:
| you can already do this in other languages. For example, here is
| Go:
|
|     package main
|
|     import (
|         "encoding/json"
|         "encoding/xml"
|         "os"
|     )
|
|     type wc struct{ File []file }
|
|     type file struct {
|         Lines, Words, Characters int
|         Filename                 string
|     }
|
|     func main() {
|         // wc /etc/motd: 25 lines, 165 words, 1140 characters
|         etcMotd := wc{[]file{{25, 165, 1140, "/etc/motd"}}}
|         json.NewEncoder(os.Stdout).Encode(etcMotd)
|         xml.NewEncoder(os.Stdout).Encode(etcMotd)
|     }
| paulddraper wrote:
| Gee thanks mister!
| 38 wrote:
| get outta here kid.
| ComputerGuru wrote:
| Serde can _kind of_ do this for rust projects, but you're
| usually constrained to outputs that are "identical but for the
| syntax/format" (i.e. same field names though perhaps with
| different naming conventions).
|
| I've used that to convert configuration files from one language
| to the other, such as this json2toml and toml2json tool [0].
|
| [0]: https://github.com/neosmart/toml2json
| mananaysiempre wrote:
| > Serde can kind of do this for rust projects, but you're
| usually constrained to outputs that are "identical but for the
| syntax/format" (i.e. same field names though perhaps with
| different naming conventions).
|
| Libxo's distinguishing feature (IMO) is that its schema
| specifications are plaintext output with markup specifying
| which parts are data to be extracted into structured formats,
| so that you can port your usual Unix tool to it and not ruin
| its original ad-hoc output. I don't know of anything positioned
| as a serialization library that can do this with comparable
| grace, Serde included.
|
| Related: the section on marking up plaintext output in the Ivo
| essay[1].
|
| [1]
| https://web.archive.org/web/20111204021526/http://lubutu.com...
| (discussed at the time at
| https://news.ycombinator.com/item?id=3300264)
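To make the markup idea concrete, here is a toy renderer built on a heavily simplified subset of libxo-style syntax: `{:name}` marks a field that consumes the next value, and everything else is literal text that only the text style prints. `toy_render` and its parameters are hypothetical. Real libxo field specs also carry a format (as in `{:lines/%7ju}`), along with modifiers and escaping, and this sketch handles none of those.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum toy_style { TOY_TEXT, TOY_JSON };

/* Append n bytes of s, keeping out NUL-terminated; -1 if it will not fit. */
static int toy_append(char *out, size_t outlen, size_t *pos,
                      const char *s, size_t n)
{
    if (*pos + n >= outlen)
        return -1;
    memcpy(out + *pos, s, n);
    *pos += n;
    out[*pos] = '\0';
    return 0;
}

/*
 * Toy renderer for a tiny subset of libxo-style markup: "{:name}" marks a
 * data field that consumes the next value from vals[]; everything else is
 * literal text, printed only in TOY_TEXT style.
 * Returns the output length, or -1 on malformed input or a full buffer.
 */
int toy_render(enum toy_style style, const char *fmt,
               const long *vals, size_t nvals, char *out, size_t outlen)
{
    size_t pos = 0, nfields = 0;
    const char *p = fmt;

    assert(fmt != NULL && out != NULL && outlen > 0);
    out[0] = '\0';
    if (style == TOY_JSON && toy_append(out, outlen, &pos, "{", 1))
        return -1;

    while (*p != '\0') {
        if (p[0] == '{' && p[1] == ':') {
            const char *name = p + 2;
            const char *end = strchr(name, '}');
            char num[32];
            int n;

            if (end == NULL || nfields >= nvals)
                return -1;      /* unterminated field or too few values */
            n = snprintf(num, sizeof num, "%ld", vals[nfields]);
            if (style == TOY_JSON &&
                ((nfields > 0 && toy_append(out, outlen, &pos, ",", 1)) ||
                 toy_append(out, outlen, &pos, "\"", 1) ||
                 toy_append(out, outlen, &pos, name, (size_t)(end - name)) ||
                 toy_append(out, outlen, &pos, "\":", 2)))
                return -1;
            if (toy_append(out, outlen, &pos, num, (size_t)n))
                return -1;
            nfields++;
            p = end + 1;
        } else {
            /* Literal text belongs to the human-readable layout only. */
            if (style == TOY_TEXT && toy_append(out, outlen, &pos, p, 1))
                return -1;
            p++;
        }
    }

    if (style == TOY_JSON && toy_append(out, outlen, &pos, "}", 1))
        return -1;
    return (int)pos;
}
```

With the format `"{:lines} {:words} {:characters}\n"` and the values 25, 165, 1140, the text style prints `25 165 1140` and the JSON style prints `{"lines":25,"words":165,"characters":1140}`. Both come from one format string, so the plaintext layout stays exactly as the author wrote it.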
| ComputerGuru wrote:
| Yup. libxo is much more free-form, while anything going
| through a serialize-deserialize process is going to
| necessarily have to be more regular.
| ary wrote:
| This, at least in concept, looks like a potential successor to
| printf() et al. The general accessibility of it is lacking given
| that it's a C-only API at the moment (there don't appear to be
| bindings for other languages), and I'm left questioning whether
| format strings are the best way. Perhaps worse is better in this
| case.
|
| When thinking about this problem I've not been able to get beyond
| the decision of "should it be done with something like a builder
| pattern and a graph of objects/structures" or "should it be done
| with a DSL" (which is what I consider the format strings approach
| to be). A DSL is more immediately convenient when creating
| output, but when you want to understand the structure you're
| emitting it seems better to have code that is explicit and
| imperative.
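For comparison with the format-string DSL, a builder-style interface might look like the following minimal sketch. The `jb_*` functions are hypothetical. They emit JSON only, do no string escaping, and write into a fixed-size buffer. The point is that the nesting of the output can be read directly off the sequence of calls.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define JB_MAX_DEPTH 8

/* Hypothetical minimal JSON builder with a fixed-size output buffer. */
struct jb {
    char buf[256];
    size_t len;
    int depth;
    int first[JB_MAX_DEPTH];  /* nonzero until a member is written at that depth */
    int overflow;             /* set once output no longer fits in buf */
};

static void jb_raw(struct jb *b, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (b->overflow)
        return;
    va_start(ap, fmt);
    n = vsnprintf(b->buf + b->len, sizeof b->buf - b->len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= sizeof b->buf - b->len)
        b->overflow = 1;
    else
        b->len += (size_t)n;
}

/* Write the separating comma (if needed) and the quoted key. No escaping. */
static void jb_key(struct jb *b, const char *key)
{
    if (!b->first[b->depth])
        jb_raw(b, ",");
    b->first[b->depth] = 0;
    jb_raw(b, "\"%s\":", key);
}

void jb_init(struct jb *b) { memset(b, 0, sizeof *b); }

/* Open an object; key is NULL for the top-level object. */
void jb_open(struct jb *b, const char *key)
{
    assert(b->depth + 1 < JB_MAX_DEPTH);
    if (key != NULL)
        jb_key(b, key);
    jb_raw(b, "{");
    b->first[++b->depth] = 1;
}

void jb_close(struct jb *b)
{
    assert(b->depth > 0);
    jb_raw(b, "}");
    b->depth--;
}

void jb_int(struct jb *b, const char *key, long v)
{
    jb_key(b, key);
    jb_raw(b, "%ld", v);
}

void jb_str(struct jb *b, const char *key, const char *s)
{
    jb_key(b, key);
    jb_raw(b, "\"%s\"", s);
}
```

Calling `jb_open(&b, NULL)`, `jb_str(&b, "filename", "/etc/motd")`, `jb_open(&b, "counts")`, `jb_int(&b, "lines", 25)`, `jb_close(&b)`, `jb_close(&b)` produces `{"filename":"/etc/motd","counts":{"lines":25}}`. The two styles are not mutually exclusive: libxo itself pairs its format strings with explicit calls such as `xo_open_container()` and `xo_close_container()`.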
| loeg wrote:
| libxo is in practice a poor approach to generating structured
| output from unix utilities. There are at least a few problems.
| The format strings do not easily replace existing formatted
| prints, so it is not straightforward to adopt. For anything
| more complicated than simple row records, you have to change
| the structure of your program significantly and might as well
| just use a different path for formatting structured output. It
| is unaware of locales and, as a result, butchers text in
| encodings other than ASCII/UTF-8. Finally, a separate binary
| with structured text output is a poor library interface to
| quite a lot of these utilities; a callable C API would be more
| broadly useful.
| yyyk wrote:
| Libxo is used in FreeBSD. I can't say I'm a fan of the approach
| though.
|
| Typical printf usage is imperative and additive:
|
|     if (enter)
|         printf("Hello ");
|     else
|         printf("Goodbye ");
|     printf("World!\n");
|
| Using the format string forces the programmer to track implicit
| state (the document format) all over the place or risk an
| inconsistent document. For example, imagine the first printf
| calls the column 'Text' and the others call it 'Output'. We can
| keep this straight for a single format easily enough, but the
| complexity grows the more formats we add.
|
| If you do this properly (emit to an object and render from that),
| the result is trivially consistent. The difficulty here is
| streaming; however, that is not always required and can be
| achieved with a little effort.
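Here is a minimal sketch of the emit-to-an-object approach, assuming a flat document of string key/value pairs (`struct doc`, `doc_add`, `greet`, `render_text`, and `render_json` are all hypothetical names). Following the Hello/Goodbye example, the emitting code picks a value in each branch, but each key is written exactly once, and every renderer walks the same finished document. The trade-off the comment mentions still applies: the whole document sits in memory before anything is printed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical intermediate document: a flat list of key/value fields. */
struct field { const char *key; const char *value; };
struct doc { struct field f[8]; int n; };

static void doc_add(struct doc *d, const char *key, const char *value)
{
    assert(d->n < 8);
    d->f[d->n].key = key;
    d->f[d->n].value = value;
    d->n++;
}

/* Emitting code: the branch only chooses a value; each key lives in one place. */
void greet(struct doc *d, int enter)
{
    doc_add(d, "greeting", enter ? "Hello" : "Goodbye");
    doc_add(d, "target", "World!");
}

/* Append s, keeping out NUL-terminated; -1 if it will not fit. */
static int put(char *out, size_t len, size_t *pos, const char *s)
{
    size_t n = strlen(s);

    if (*pos + n >= len)
        return -1;
    memcpy(out + *pos, s, n + 1);
    *pos += n;
    return 0;
}

/* Text renderer: values separated by spaces, newline at the end. */
int render_text(const struct doc *d, char *out, size_t len)
{
    size_t pos = 0;

    out[0] = '\0';
    for (int i = 0; i < d->n; i++)
        if ((i > 0 && put(out, len, &pos, " ")) ||
            put(out, len, &pos, d->f[i].value))
            return -1;
    return put(out, len, &pos, "\n") ? -1 : (int)pos;
}

/* JSON renderer: same fields, same keys, no escaping. */
int render_json(const struct doc *d, char *out, size_t len)
{
    size_t pos = 0;

    out[0] = '\0';
    if (put(out, len, &pos, "{"))
        return -1;
    for (int i = 0; i < d->n; i++)
        if ((i > 0 && put(out, len, &pos, ",")) ||
            put(out, len, &pos, "\"") || put(out, len, &pos, d->f[i].key) ||
            put(out, len, &pos, "\":\"") || put(out, len, &pos, d->f[i].value) ||
            put(out, len, &pos, "\""))
            return -1;
    return put(out, len, &pos, "}") ? -1 : (int)pos;
}
```

After `greet(&d, 1)`, the text renderer produces `Hello World!` and the JSON renderer produces `{"greeting":"Hello","target":"World!"}`. Adding a third format means adding one renderer, not revisiting every branch of the emitting code.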
| lelanthran wrote:
| > Using the format string forces the programmer to keep
| implicit state (the document format) all over the place or get
| an inconsistent document.
|
| I'm not understanding your objection[1]; surely you would only
| define the libxo format string _once_ , and then reuse it
| everywhere? Without libxo you'd need to duplicate your code
| everywhere for every output format you want to support.
|
| IOW, you'd have to construct your libxo format string by
| concatenating strings, something like this:
|
|     const char *s1 = "{:Text/%7ju}";
|     const char *s2 = "{:Output/%7ju}";
|     char final[64];
|     /* Choose the first field per branch, then append s2. */
|     snprintf(final, sizeof final, "%s%s", enter ? s1 : s2, s2);
|
| Isn't that a better mechanism than printf?
|
| [1] It's late, I've the flu and feeling a little stupid right
| now. Also, this is the first I've seen this project.
| yyyk wrote:
| >surely you would only define the libxo format string once,
| and then reuse it everywhere
|
| I would be rather scared of using a variable as a format
| string. IMHO, format strings of this kind are an antipattern
| for a different reason: we don't see the format at the point
| of use, and a few all-too-easy mistakes give us a crash or a
| CVE*. I think nowadays there are tools that do some
| verification**, and I guess we could use an IDE (but most C
| programmers don't?), but I am unfamiliar with any such tool
| that supports libxo-style format strings.
|
| * https://en.wikipedia.org/wiki/Format_string_attack
|
| ** IIRC, GCC/Clang eventually added a verifier? But it
| doesn't apply to all cases or scanf? I don't recall.
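On the verification question: GCC and Clang can check printf-style format strings at compile time under `-Wformat` (which GCC enables with `-Wall`), and a custom wrapper can opt in through the `format` function attribute. GCC's `-Wformat-security` additionally flags calls like `printf(user_input)` whose format is not a literal and that take no further arguments. The sketch below uses a hypothetical `log_to()` wrapper. Passing untrusted text as the argument to `"%s"`, rather than as the format itself, avoids the classic format string attack. The attribute only understands printf-family syntax, so on its own it cannot check libxo-style `{:name/...}` field markup.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical printf-style wrapper. The format attribute (a GCC/Clang
 * extension) says argument 3 is a printf format and the variadic arguments
 * start at 4, so -Wformat can check every call site.
 */
int log_to(char *out, size_t len, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

int log_to(char *out, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(out != NULL && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(out, len, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attribute in place, a call such as `log_to(buf, sizeof buf, "%d files", "three")` should draw a `-Wformat` warning about the mismatched argument type. Untrusted text like `"100%s done"` stays inert when passed through `"%s"`.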
| mpweiher wrote:
| > The difficulty here is to get streaming
|
| Polymorphic Write Streams do this.
|
| ACM DL: https://dl.acm.org/doi/10.1145/3359619.3359748
|
| pdf:
| http://www.hirschfeld.org/writings/media/WeiherHirschfeld_20...
|
| Code:
| https://github.com/mpw/MPWFoundation/tree/master/Streams.sub...
|
| Fast JSON parsing using this approach:
| https://blog.metaobject.com/2020/04/somewhat-less-lethargic-...
|
| Presentation (DLS '19):
| https://www.youtube.com/watch?v=DG5MtsMojgI
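The following is a loose C analogue of the polymorphic write stream idea, in which a struct of function pointers stands in for Objective-C message dispatch. It is not the MPWFoundation API, and `wstream`, `text_ops`, `json_ops`, and `emit_wc` are hypothetical names. Client code talks only to an abstract stream, and each concrete stream formats and writes every value as it arrives. Because of that, no intermediate document is built, which is the streaming property discussed above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct wstream;

/* Operations every stream understands; client code only calls these. */
struct wstream_ops {
    void (*begin_record)(struct wstream *s);
    void (*write_long)(struct wstream *s, const char *key, long value);
    void (*end_record)(struct wstream *s);
};

struct wstream {
    const struct wstream_ops *ops;
    FILE *out;
    int nfields;          /* fields written so far in the current record */
};

/* Text stream: space-separated values, keys ignored. */
static void text_begin(struct wstream *s) { s->nfields = 0; }
static void text_long(struct wstream *s, const char *key, long v)
{
    (void)key;
    fprintf(s->out, "%s%ld", s->nfields++ ? " " : "", v);
}
static void text_end(struct wstream *s) { fputc('\n', s->out); }

/* JSON stream: one object per record, written as the fields arrive. */
static void json_begin(struct wstream *s) { s->nfields = 0; fputc('{', s->out); }
static void json_long(struct wstream *s, const char *key, long v)
{
    fprintf(s->out, "%s\"%s\":%ld", s->nfields++ ? "," : "", key, v);
}
static void json_end(struct wstream *s) { fputs("}\n", s->out); }

const struct wstream_ops text_ops = { text_begin, text_long, text_end };
const struct wstream_ops json_ops = { json_begin, json_long, json_end };

/* Client code: written once, unaware of which format it produces. */
void emit_wc(struct wstream *s, long lines, long words, long chars)
{
    assert(s != NULL && s->ops != NULL && s->out != NULL);
    s->ops->begin_record(s);
    s->ops->write_long(s, "lines", lines);
    s->ops->write_long(s, "words", words);
    s->ops->write_long(s, "characters", chars);
    s->ops->end_record(s);
}
```

For example, `struct wstream js = { &json_ops, stdout, 0 };` followed by `emit_wc(&js, 25, 165, 1140);` writes `{"lines":25,"words":165,"characters":1140}`, and swapping in `&text_ops` gives `25 165 1140` from the same client call.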
___________________________________________________________________
(page generated 2023-07-14 23:00 UTC)