[HN Gopher] Tar.pl - A tar creator and extractor in approx. 100 ...
___________________________________________________________________
Tar.pl - A tar creator and extractor in approx. 100 lines of Prolog
Author : superdisk
Score : 203 points
Date : 2023-01-18 14:38 UTC (8 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| 082349872349872 wrote:
| DCGs are an awesome way to specify a format but a poor way to
| implement it?
| mark_l_watson wrote:
| I agree and the code itself is awesome also.
|
| Around 30 years ago I was really into Prolog but recently I
| have tried getting back into it and I am having a rough go at
| it. I still have the old Art of Prolog book, so that might be a
| good reentry point.
| philjohn wrote:
| I still maintain that learning Prolog 25 years ago as part of
| my degree (AI + Comp Sci) made me a better programmer.
| mncharity wrote:
| > recently I have tried getting back into it and I am having
| a rough go at it
|
| Fwiw, doing problem setup and result processing in another
| language can avoid much pain. So prolog only where it excels,
| and a mainstream ecosystem for the rest. I've liked batch
| mode "write a prolog file, run it, read result (eg json, or
| wrapper-language code to eval)", but was recently thinking of
| trying PySwip[1].
|
| [1] https://github.com/yuce/pyswip
| nmadden wrote:
| That's been my experience, and chimes with what the author
| writes at the end about performance. I remember trying to
| stream a few hundred MB of CSV data with Prolog a few years ago
| and giving up after not being able to either write or find a
| library that could cope with even that kind of size (which I
| wouldn't consider particularly big). (In the end, I managed to
| bulk import the data into SQLite, which took a few seconds,
| then used SQL instead).
|
| I should give Scryer Prolog a go to see if it is any better,
| and I'm by no means an expert Prolog programmer so YMMV.
| richard_todd wrote:
| It's all mind-bending fun until the interpreter starts
| blowing one of its internal stacks. Then it's hours of tedium
| tracing through queries looking for ways to contain the
| computation.
| rhelz wrote:
| Nicely done. Prolog is killer for database programming/querying
| and parsing/formatting. Since these are not exactly "niche"
| tasks, its kinda sad that prolog isn't more popular.
| dragonwriter wrote:
| One reason it is not more popular is that it _was_ , but then
| its Datalog subset, and supersets of Datalog that are different
| than Prolog, took a lot of that use from it. (Often with the
| Prolog syntax removed since it is frequently embedded in other
| languages, so it becomes a semantic rather than syntactic
| subset of Prolog.)
| cmrdporcupine wrote:
| There does at least seem to be a recent uptick in popularity
| and interest in Datalog, which expresses a more constrained
| form of similar concepts specialized on data management tasks.
| Not as general purpose as Prolog, but more suitable for a
| database environment.
|
| (Also here's a gratuitous plug for my employer's product in a
| similar vein: https://docs.relational.ai/rel/intro/overview)
| optimalsolver wrote:
| Wouldn't Datalog be a better alternative?
| anon291 wrote:
| Yes. Datalog is not Turing-complete; thus is much easier to
| reason about, and good query optimizers can be much more
| aggressive. Once you introduce cuts (as in prolog), you have
| to throw a lot of that out the window. But Datalog is a
| subset of prolog.
| rhelz wrote:
| Datalog is great for database programming, and since it is a
| restricted subset of prolog, a datalog comes in every box of
| prolog :-)
| Avshalom wrote:
| Sort of.
|
| Because of it's restrictions it traditionally uses
| different search/resolution strategies than Prolog. And in
| the other direction the SLD resolution strategy Prolog
| traditionally uses can fail on syntactically valid Datalog
| (clause ordering issues).
|
| That said, the big Prolog implementations all support
| multiple resolution strategies although I suspect there are
| a lot of various optimizations Datalog squeezes out of it's
| restrictions that let it handle things that might not be
| reasonable in a given Prolog.
| rhelz wrote:
| Good point; in the olden days of prolog only the SLD
| search strategy came out of the box, and you had to roll
| your own if you wanted others. However, happily a lot of
| the tabled programming research from the XSB prolog group
| (and others) has osmosed out into the mainstream prolog
| implementations by now.
| cmrdporcupine wrote:
| Aren't there significant differences in implementation
| between a Datalog system and Prolog running similar rules,
| though? Bottom-up vs top-down eval, is what I read...
| rhelz wrote:
| Classically, yes, but most prolog implementations these
| days support a tabled search strategy as Datalog
| typically does.
| gamache wrote:
| On one hand, Prolog is more powerful than Datalog. But on the
| other hand, Prolog is more powerful than Datalog.
| telekid wrote:
| With great power comes great irresponsibility.
| DonHopkins wrote:
| Just as C combines the power and performance of assembly
| language with the flexibility and ease-of-use of assembly
| language.
| maweki wrote:
| In this case, it's even worse. Prolog is computationally
| complete. With everything that comes with rice's theorem.
| Every non-trivial property of a program being generally
| undecidable.
|
| Datalog, on the other hand, isn't. It's terminating, like
| SQL. And many program properties are indeed decidable.
| tpoacher wrote:
| Query: morepowerfullthan(Prolog, Datalog).
|
| yes; yes; yes; yes; yes; yes; yes; yes; ...
| aeneasmackenzie wrote:
| I like these data description languages but most of them just
| have support for sequential structures and don't support offsets
| within the file. These are very common in file formats created
| for use with C or a similar language, which is most of them. It's
| not hard to add but it destroys the elegance of the formalism.
| TheRealPomax wrote:
| What's the point of a comment like this when Prolog supports
| file seeking just fine? That's like complaining about the lack
| of structs in procedural languages when the post you're
| commenting on is a C++ post...
| aeneasmackenzie wrote:
| You can file seek in any language, but not in most file
| format DSLs (or this one, because it's local). The author
| mentions QuickBMS and thus is aware of this, but most
| academic format languages don't support it.
| d33 wrote:
| Just curious, is .pl the standard extension for Prolog programs?
| Why would Perl and Prolog choose the same one, instead of one
| using e.g. .pg?
| mathstuf wrote:
| Fun trivia: I got a Emacs user to switch to Vim because in one
| of our class projects using Prolog, Emacs kept trying to
| highlight it as Perl (though it might be valid Perl syntax now
| that I think about it...). When I showed him that Vim did that
| for an empty file, but if it detected "Prolog-y" content (IIRC,
| the comment marker or a `:-` did it), it defaulted to Prolog
| highlighting.
|
| No idea if Emacs ever fixed it (this was 2009 or so).
| waynecochran wrote:
| This is a trivial fix for an emacs user. M-x prolog-mode.
| puffoflogic wrote:
| Instructions unclear, pinky finger fell off.
| davelee wrote:
| A less common extension is .pro.
| Tomte wrote:
| Probably because few people used both.
|
| .pro is also common for Prolog code, if you care about the
| extension clash.
| tmtvl wrote:
| .pro is also used by IDL and Qt. I imagine there are few two
| or three letter extensions that aren't overloaded.
| DonHopkins wrote:
| Because .log was taken.
| civopsec wrote:
| > instead of one using e.g. .pg?
|
| Why not just `.prolog`? In this day and age. (Maybe there's
| some reason like "then I can't put these source files on FAT
| filesystems", I don't know.)
|
| Java seems to do fine with it's longer-than-three-letters
| `.java`.
| Avshalom wrote:
| Most editors will understand .prolog, but on the other hand
| it's 15 years older than perl and it's not hard to teach
| editors to treat .pl as prolog.
| anon291 wrote:
| .pl is used because prolog is really old and FAT used to
| enforce extension limits. Not relevant nowadays, but it's
| stuck around.
| [deleted]
| dmd wrote:
| This makes me miss using Ab Initio, which in its language "DML"
| lets you describe (even extremely complex) record formats
| declaratively.
| kartic wrote:
| Makes me want to learn Prolog.
| umvi wrote:
| > It's unfortunately absolutely dog slow
|
| Seems like a common chicken-egg problem of niche languages.
| Performance doesn't improve until lots of people are using it,
| but lots of people won't use it until performance improves.
| tmtvl wrote:
| Eh, people will happily use slow languages if they are marketed
| the right way. Look at Python, it has a very large following,
| yet it's so slow that the dog's descendants will have evolved
| to adapt to a marine environment by the time Python finishes
| running Hello World.
| lliamander wrote:
| The bidirectional nature of predicates is one of the cool aspects
| of logic programming.
|
| It is also cool that is completely transparent whether a
| predicate returns a single value or multiple values. It's sort of
| like if every function were implicitly a generator.
| pmarreck wrote:
| Now do it in Bash. ;)
| olddustytrail wrote:
| Sure #!/bin/bash tar "$@"
|
| Takes the same arguments as the original!
| jonas21 wrote:
| I don't get it. tar is such a simple format you could do this in
| like 50 lines of C. Most of that would be declaring a struct. And
| unlike this implementation, it would be performant.
| mattkrause wrote:
| The cool thing is that itt uses the same code as the reader and
| writer. In essence, you've written a function: `tar(dstTarFile,
| srcFiles)` You provide _one_ of the two and query Prolog to
| find the other: you pass in a tar file as the first argument,
| and it tells you what `srcFiles` would need to be to create
| that archive, effectively extracting it. Alternately, you pass
| in `srcFiles` and it tells you what the corresponding
| dstTarFile should be, creating an archive.
|
| Another way to think about it is that the Prolog code isn't
| telling you how to read/write tar files. Instead, imagine it as
| a cost function that tells you how well this tar file
| corresponds to the uncompressed representation of those files,
| which you're minimizing.
|
| If you were to write it in C, the code is imperative: do this,
| then that, then this other thing. Consequently, the reader and
| writer code are totally different. For example, to process the
| file size, you would write functions two functions: `char*
| uIntToOctalASCII(unsigned int sz)` and `octalASCIIToUInt(char*
| szstr)`. Semantically, these are inverses:
| `octalASCIIToUInt(uIntToOctalASCII(x)) == x`[0] but the
| implementations do not really overlap.
|
| [0] And leaks memory, but whatever....
| dagss wrote:
| First time I see Prolog...I thougt that seeing the compiler
| invert a function automatically is quite impressing. I mean
| just being able to take only the forwards declaration and
| figure out the inverse is interesting in itself.
| marbu wrote:
| From the README: <quote>We're going to declaratively define a
| parser for the tar format, and once we're done, our code will not
| only extract .tar files, but create them as well.</quote>
|
| I recall reading about this somewhere, but haven't seen such nice
| real world example before. Well done.
| zelphirkalt wrote:
| I think if done right, this falls out of Prolog automatically,
| as you program in relations, rather than functions. Depending
| on which places you leave empty in a query, Prolog will work to
| fill in the empty places. One is probably the tar file and one
| is the folder being added to a tar archive, but none of the 2
| is any more important than the other.
| Avshalom wrote:
| There's a cool little bencode library that also works like
| that, all of 30 short lines too.
|
| https://github.com/mndrix/bencode/blob/master/prolog/bencode...
|
| https://en.wikipedia.org/wiki/Bencode
| moss2 wrote:
| Absolutely marvelous.
| derefr wrote:
| Sounds exotic at first glance, but how different is this from the
| Erlang implementation of tar parsing? I.e. how much of this is
| relying on Prolog-specific features, vs. how much is just
| succinct head-clause pattern matching?
| superdisk wrote:
| The unique part is that you get encoding and decoding for free
| when just describing the spec. It is essentially just succinct
| head-clause matching though.
| derefr wrote:
| An idiomatic Erlang/Elixir implementation of validation for a
| binary format _would_ also give you decoding for free -- see
| e.g. https://zohaib.me/binary-pattern-matching-in-elixir/. So
| I guess it's the encoding part that's unique to how Prolog
| works.
|
| Which makes sense; my understanding is that in Prolog,
| insofar as a function encodingOf(A, B) is defined by using A
| and B on either side of a bind expression to equate B as the
| encoding of A, if encodingOf(A, knownEnc) would act as a
| parser for knownEnc, then encodingOf(knownDec, B) would act
| as an encoder for knownDec -- essentially swapping the
| question "what would this parse to" given an encoded binary,
| for "what would parse to this" given a decoded term.
|
| Erlang can't do exactly that -- a given variable's "bound-
| ness" is determined during compilation, and head-clause
| variables can't be indeterminately bound -- but if it _could_
| , then you would indeed get an encoder for the same format
| "for free", since binary-pattern matching in Erlang becomes a
| binary-building expression depending on whether the variables
| in the expression are bound or unbound.
|
| (Now I'm curious how hard it would be to add indeterminately-
| bound compile-time function polymorphism to Erlang -- i.e.
| generating bytecode for the powerset of bound-ness-es of the
| variables in the head clause, only where at least one
| generates valid bytecode, probably name-mangled with a
| bitfield suffix of said bound-ness-es; and then being able to
| bind function parameters from such functions as new output
| variables in expressions, where the call sites also generate
| a call to the equivalent mangled name. The usefulness of this
| would probably be pretty small, without the other aspects of
| Prolog, but you never know...)
| Jtsummers wrote:
| GP talks about getting encoding and decoding _together_ by
| implementing the spec once. In Erlang /Elixir you can only
| get one at a time because you can't do the equivalent of
| this: some_predicate(<concrete values>,
| Decoded). % decoder some_predicate(Encoded, <concrete
| values>). % encoder some_predicate(<concrete values>,
| <concrete values>). % validator
|
| But in Prolog, that's what many predicates permit as a
| baseline. In Erlang, yes, a validator is, or can be, a
| parser/decoder. But it's not an encoder which still needs
| to be a separate function (or set of functions) to describe
| movement in the other direction.
| dwohnitmok wrote:
| This is fantastic bite-sized example of exactly what
| makes Prolog and other logic languages so different to
| program in and will probably become my go-to example for
| introducing what's so special about Prolog to programmers
| with backgrounds in other programming languages who are
| looking for examples that involve "real-world"
| programming tasks.
| lostmsu wrote:
| To go even further, how much is it different from C# or C++?
| There you'd just declare the appropriate structs and bitcast.
| superdisk wrote:
| That works okay only for fixed-sized structs that don't
| require any transformation of their members-- otherwise
| you'll need to write a traditional parser.
| lostmsu wrote:
| It is there in Prolog version too. Search for "FileSize" on
| the page.
| superdisk wrote:
| What I mean is that if you need to parse a format that
| has variable-sized structures, you can't just memcpy them
| into a struct because structs have a fixed size.
| Additionally sometimes you need to parse a value into a
| more tractable representation, for example those octal
| ASCII strings in the tar format, which requires some
| additional processing that isn't possible with
| loading/serializing a struct.
| lostmsu wrote:
| Which is also there in the Prolog code.
| jll29 wrote:
| Writing tar in PROLOG sounds "cool", a neat hack, but it is as
| practical as writing the Linux kernel in COBOL. Possible,
| perhaps, but not the best idea.
|
| It's hard enough to write things in PROLOG that _are_ ideally
| suited for the language - like natural language parsers (they
| used to be LISP or PROLOG, but these days they are all in C++ or
| FORTRAN wrapped in Python so no-one gets hurt; why? Because of
| efficiency and numeric libraries required - modern parsers for
| human languages implement probabilistic or neural models, no
| longer symbol manipulation and pattern matching).
|
| EDIT: I'm not a PROLOG hater - in fact, enjoyed reading the
| article (and I will read sequels on implementing "mount" in RPG3,
| "htop" in COMAL etc.) and my friends include some PROLOG lovers.
| I also find it important and stimulating to learn about another
| paradigm, and logical programming is an important one for many
| reasons. This rant is more about using it on the specific task of
| implementing "tar", for which it is a poor fit, as the article's
| PS shows.
| anon291 wrote:
| Binary file formats, like text formats, are simply languages,
| and declarative logic languages like Prolog, are excellent for
| parsing, as you said. Thus, it stands to reason that Prolog
| would be a very good fit for parsing data. That doesn't change
| just because the data is not ASCII-encoded.
| phyzome wrote:
| This version is also insecure, as it does not prevent file path
| traversal attacks.
___________________________________________________________________
(page generated 2023-01-18 23:00 UTC)