hngopher.com

       [HN Gopher] Tar.pl - A tar creator and extractor in approx. 100 ...
       ___________________________________________________________________
        
       Tar.pl - A tar creator and extractor in approx. 100 lines of Prolog
        
       Author : superdisk
       Score  : 203 points
       Date   : 2023-01-18 14:38 UTC (8 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | 082349872349872 wrote:
       | DCGs are an awesome way to specify a format but a poor way to
       | implement it?
        
         | mark_l_watson wrote:
         | I agree and the code itself is awesome also.
         | 
         | Around 30 years ago I was really into Prolog but recently I
         | have tried getting back into it and I am having a rough go at
         | it. I still have the old Art of Prolog book, so that might be a
         | good reentry point.
        
           | philjohn wrote:
           | I still maintain that learning Prolog 25 years ago as part of
           | my degree (AI + Comp Sci) made me a better programmer.
        
           | mncharity wrote:
           | > recently I have tried getting back into it and I am having
           | a rough go at it
           | 
           | Fwiw, doing problem setup and result processing in another
           | language can avoid much pain. So prolog only where it excels,
           | and a mainstream ecosystem for the rest. I've liked batch
           | mode "write a prolog file, run it, read result (eg json, or
           | wrapper-language code to eval)", but was recently thinking of
           | trying PySwip[1].
           | 
           | [1] https://github.com/yuce/pyswip
        
         | nmadden wrote:
         | That's been my experience, and chimes with what the author
         | writes at the end about performance. I remember trying to
         | stream a few hundred MB of CSV data with Prolog a few years ago
         | and giving up after not being able to either write or find a
         | library that could cope with even that kind of size (which I
         | wouldn't consider particularly big). (In the end, I managed to
         | bulk import the data into SQLite, which took a few seconds,
         | then used SQL instead).
         | 
         | I should give Scryer Prolog a go to see if it is any better,
         | and I'm by no means an expert Prolog programmer so YMMV.
        
           | richard_todd wrote:
           | It's all mind-bending fun until the interpreter starts
           | blowing one of its internal stacks. Then it's hours of tedium
           | tracing through queries looking for ways to contain the
           | computation.
        
       | rhelz wrote:
       | Nicely done. Prolog is killer for database programming/querying
       | and parsing/formatting. Since these are not exactly "niche"
       | tasks, its kinda sad that prolog isn't more popular.
        
         | dragonwriter wrote:
         | One reason it is not more popular is that it _was_ , but then
         | its Datalog subset, and supersets of Datalog that are different
         | than Prolog, took a lot of that use from it. (Often with the
         | Prolog syntax removed since it is frequently embedded in other
         | languages, so it becomes a semantic rather than syntactic
         | subset of Prolog.)
        
         | cmrdporcupine wrote:
         | There does at least seem to be a recent uptick in popularity
         | and interest in Datalog, which expresses a more constrained
         | form of similar concepts specialized on data management tasks.
         | Not as general purpose as Prolog, but more suitable for a
         | database environment.
         | 
         | (Also here's a gratuitous plug for my employer's product in a
         | similar vein: https://docs.relational.ai/rel/intro/overview)
        
         | optimalsolver wrote:
         | Wouldn't Datalog be a better alternative?
        
           | anon291 wrote:
           | Yes. Datalog is not Turing-complete; thus is much easier to
           | reason about, and good query optimizers can be much more
           | aggressive. Once you introduce cuts (as in prolog), you have
           | to throw a lot of that out the window. But Datalog is a
           | subset of prolog.
        
           | rhelz wrote:
           | Datalog is great for database programming, and since it is a
           | restricted subset of prolog, a datalog comes in every box of
           | prolog :-)
        
             | Avshalom wrote:
             | Sort of.
             | 
             | Because of it's restrictions it traditionally uses
             | different search/resolution strategies than Prolog. And in
             | the other direction the SLD resolution strategy Prolog
             | traditionally uses can fail on syntactically valid Datalog
             | (clause ordering issues).
             | 
             | That said, the big Prolog implementations all support
             | multiple resolution strategies although I suspect there are
             | a lot of various optimizations Datalog squeezes out of it's
             | restrictions that let it handle things that might not be
             | reasonable in a given Prolog.
        
               | rhelz wrote:
               | Good point; in the olden days of prolog only the SLD
               | search strategy came out of the box, and you had to roll
               | your own if you wanted others. However, happily a lot of
               | the tabled programming research from the XSB prolog group
               | (and others) has osmosed out into the mainstream prolog
               | implementations by now.
        
             | cmrdporcupine wrote:
             | Aren't there significant differences in implementation
             | between a Datalog system and Prolog running similar rules,
             | though? Bottom-up vs top-down eval, is what I read...
        
               | rhelz wrote:
               | Classically, yes, but most prolog implementations these
               | days support a tabled search strategy as Datalog
               | typically does.
        
           | gamache wrote:
           | On one hand, Prolog is more powerful than Datalog. But on the
           | other hand, Prolog is more powerful than Datalog.
        
             | telekid wrote:
             | With great power comes great irresponsibility.
        
             | DonHopkins wrote:
             | Just as C combines the power and performance of assembly
             | language with the flexibility and ease-of-use of assembly
             | language.
        
               | maweki wrote:
               | In this case, it's even worse. Prolog is computationally
               | complete. With everything that comes with rice's theorem.
               | Every non-trivial property of a program being generally
               | undecidable.
               | 
               | Datalog, on the other hand, isn't. It's terminating, like
               | SQL. And many program properties are indeed decidable.
        
             | tpoacher wrote:
             | Query: morepowerfullthan(Prolog, Datalog).
             | 
             | yes; yes; yes; yes; yes; yes; yes; yes; ...
        
       | aeneasmackenzie wrote:
       | I like these data description languages but most of them just
       | have support for sequential structures and don't support offsets
       | within the file. These are very common in file formats created
       | for use with C or a similar language, which is most of them. It's
       | not hard to add but it destroys the elegance of the formalism.
        
         | TheRealPomax wrote:
         | What's the point of a comment like this when Prolog supports
         | file seeking just fine? That's like complaining about the lack
         | of structs in procedural languages when the post you're
         | commenting on is a C++ post...
        
           | aeneasmackenzie wrote:
           | You can file seek in any language, but not in most file
           | format DSLs (or this one, because it's local). The author
           | mentions QuickBMS and thus is aware of this, but most
           | academic format languages don't support it.
        
       | d33 wrote:
       | Just curious, is .pl the standard extension for Prolog programs?
       | Why would Perl and Prolog choose the same one, instead of one
       | using e.g. .pg?
        
         | mathstuf wrote:
         | Fun trivia: I got a Emacs user to switch to Vim because in one
         | of our class projects using Prolog, Emacs kept trying to
         | highlight it as Perl (though it might be valid Perl syntax now
         | that I think about it...). When I showed him that Vim did that
         | for an empty file, but if it detected "Prolog-y" content (IIRC,
         | the comment marker or a `:-` did it), it defaulted to Prolog
         | highlighting.
         | 
         | No idea if Emacs ever fixed it (this was 2009 or so).
        
           | waynecochran wrote:
           | This is a trivial fix for an emacs user. M-x prolog-mode.
        
             | puffoflogic wrote:
             | Instructions unclear, pinky finger fell off.
        
         | davelee wrote:
         | A less common extension is .pro.
        
         | Tomte wrote:
         | Probably because few people used both.
         | 
         | .pro is also common for Prolog code, if you care about the
         | extension clash.
        
           | tmtvl wrote:
           | .pro is also used by IDL and Qt. I imagine there are few two
           | or three letter extensions that aren't overloaded.
        
           | DonHopkins wrote:
           | Because .log was taken.
        
         | civopsec wrote:
         | > instead of one using e.g. .pg?
         | 
         | Why not just `.prolog`? In this day and age. (Maybe there's
         | some reason like "then I can't put these source files on FAT
         | filesystems", I don't know.)
         | 
         | Java seems to do fine with it's longer-than-three-letters
         | `.java`.
        
           | Avshalom wrote:
           | Most editors will understand .prolog, but on the other hand
           | it's 15 years older than perl and it's not hard to teach
           | editors to treat .pl as prolog.
        
           | anon291 wrote:
           | .pl is used because prolog is really old and FAT used to
           | enforce extension limits. Not relevant nowadays, but it's
           | stuck around.
        
             | [deleted]
        
       | dmd wrote:
       | This makes me miss using Ab Initio, which in its language "DML"
       | lets you describe (even extremely complex) record formats
       | declaratively.
        
       | kartic wrote:
       | Makes me want to learn Prolog.
        
       | umvi wrote:
       | > It's unfortunately absolutely dog slow
       | 
       | Seems like a common chicken-egg problem of niche languages.
       | Performance doesn't improve until lots of people are using it,
       | but lots of people won't use it until performance improves.
        
         | tmtvl wrote:
         | Eh, people will happily use slow languages if they are marketed
         | the right way. Look at Python, it has a very large following,
         | yet it's so slow that the dog's descendants will have evolved
         | to adapt to a marine environment by the time Python finishes
         | running Hello World.
        
       | lliamander wrote:
       | The bidirectional nature of predicates is one of the cool aspects
       | of logic programming.
       | 
       | It is also cool that is completely transparent whether a
       | predicate returns a single value or multiple values. It's sort of
       | like if every function were implicitly a generator.
        
       | pmarreck wrote:
       | Now do it in Bash. ;)
        
         | olddustytrail wrote:
         | Sure                   #!/bin/bash         tar "$@"
         | 
         | Takes the same arguments as the original!
        
       | jonas21 wrote:
       | I don't get it. tar is such a simple format you could do this in
       | like 50 lines of C. Most of that would be declaring a struct. And
       | unlike this implementation, it would be performant.
        
         | mattkrause wrote:
         | The cool thing is that itt uses the same code as the reader and
         | writer. In essence, you've written a function: `tar(dstTarFile,
         | srcFiles)` You provide _one_ of the two and query Prolog to
         | find the other: you pass in a tar file as the first argument,
         | and it tells you what `srcFiles` would need to be to create
         | that archive, effectively extracting it. Alternately, you pass
         | in `srcFiles` and it tells you what the corresponding
         | dstTarFile should be, creating an archive.
         | 
         | Another way to think about it is that the Prolog code isn't
         | telling you how to read/write tar files. Instead, imagine it as
         | a cost function that tells you how well this tar file
         | corresponds to the uncompressed representation of those files,
         | which you're minimizing.
         | 
         | If you were to write it in C, the code is imperative: do this,
         | then that, then this other thing. Consequently, the reader and
         | writer code are totally different. For example, to process the
         | file size, you would write functions two functions: `char*
         | uIntToOctalASCII(unsigned int sz)` and `octalASCIIToUInt(char*
         | szstr)`. Semantically, these are inverses:
         | `octalASCIIToUInt(uIntToOctalASCII(x)) == x`[0] but the
         | implementations do not really overlap.
         | 
         | [0] And leaks memory, but whatever....
        
         | dagss wrote:
         | First time I see Prolog...I thougt that seeing the compiler
         | invert a function automatically is quite impressing. I mean
         | just being able to take only the forwards declaration and
         | figure out the inverse is interesting in itself.
        
       | marbu wrote:
       | From the README: <quote>We're going to declaratively define a
       | parser for the tar format, and once we're done, our code will not
       | only extract .tar files, but create them as well.</quote>
       | 
       | I recall reading about this somewhere, but haven't seen such nice
       | real world example before. Well done.
        
         | zelphirkalt wrote:
         | I think if done right, this falls out of Prolog automatically,
         | as you program in relations, rather than functions. Depending
         | on which places you leave empty in a query, Prolog will work to
         | fill in the empty places. One is probably the tar file and one
         | is the folder being added to a tar archive, but none of the 2
         | is any more important than the other.
        
         | Avshalom wrote:
         | There's a cool little bencode library that also works like
         | that, all of 30 short lines too.
         | 
         | https://github.com/mndrix/bencode/blob/master/prolog/bencode...
         | 
         | https://en.wikipedia.org/wiki/Bencode
        
       | moss2 wrote:
       | Absolutely marvelous.
        
       | derefr wrote:
       | Sounds exotic at first glance, but how different is this from the
       | Erlang implementation of tar parsing? I.e. how much of this is
       | relying on Prolog-specific features, vs. how much is just
       | succinct head-clause pattern matching?
        
         | superdisk wrote:
         | The unique part is that you get encoding and decoding for free
         | when just describing the spec. It is essentially just succinct
         | head-clause matching though.
        
           | derefr wrote:
           | An idiomatic Erlang/Elixir implementation of validation for a
           | binary format _would_ also give you decoding for free -- see
           | e.g. https://zohaib.me/binary-pattern-matching-in-elixir/. So
           | I guess it's the encoding part that's unique to how Prolog
           | works.
           | 
           | Which makes sense; my understanding is that in Prolog,
           | insofar as a function encodingOf(A, B) is defined by using A
           | and B on either side of a bind expression to equate B as the
           | encoding of A, if encodingOf(A, knownEnc) would act as a
           | parser for knownEnc, then encodingOf(knownDec, B) would act
           | as an encoder for knownDec -- essentially swapping the
           | question "what would this parse to" given an encoded binary,
           | for "what would parse to this" given a decoded term.
           | 
           | Erlang can't do exactly that -- a given variable's "bound-
           | ness" is determined during compilation, and head-clause
           | variables can't be indeterminately bound -- but if it _could_
           | , then you would indeed get an encoder for the same format
           | "for free", since binary-pattern matching in Erlang becomes a
           | binary-building expression depending on whether the variables
           | in the expression are bound or unbound.
           | 
           | (Now I'm curious how hard it would be to add indeterminately-
           | bound compile-time function polymorphism to Erlang -- i.e.
           | generating bytecode for the powerset of bound-ness-es of the
           | variables in the head clause, only where at least one
           | generates valid bytecode, probably name-mangled with a
           | bitfield suffix of said bound-ness-es; and then being able to
           | bind function parameters from such functions as new output
           | variables in expressions, where the call sites also generate
           | a call to the equivalent mangled name. The usefulness of this
           | would probably be pretty small, without the other aspects of
           | Prolog, but you never know...)
        
             | Jtsummers wrote:
             | GP talks about getting encoding and decoding _together_ by
             | implementing the spec once. In Erlang /Elixir you can only
             | get one at a time because you can't do the equivalent of
             | this:                 some_predicate(<concrete values>,
             | Decoded). % decoder       some_predicate(Encoded, <concrete
             | values>). % encoder       some_predicate(<concrete values>,
             | <concrete values>). % validator
             | 
             | But in Prolog, that's what many predicates permit as a
             | baseline. In Erlang, yes, a validator is, or can be, a
             | parser/decoder. But it's not an encoder which still needs
             | to be a separate function (or set of functions) to describe
             | movement in the other direction.
        
               | dwohnitmok wrote:
               | This is fantastic bite-sized example of exactly what
               | makes Prolog and other logic languages so different to
               | program in and will probably become my go-to example for
               | introducing what's so special about Prolog to programmers
               | with backgrounds in other programming languages who are
               | looking for examples that involve "real-world"
               | programming tasks.
        
         | lostmsu wrote:
         | To go even further, how much is it different from C# or C++?
         | There you'd just declare the appropriate structs and bitcast.
        
           | superdisk wrote:
           | That works okay only for fixed-sized structs that don't
           | require any transformation of their members-- otherwise
           | you'll need to write a traditional parser.
        
             | lostmsu wrote:
             | It is there in Prolog version too. Search for "FileSize" on
             | the page.
        
               | superdisk wrote:
               | What I mean is that if you need to parse a format that
               | has variable-sized structures, you can't just memcpy them
               | into a struct because structs have a fixed size.
               | Additionally sometimes you need to parse a value into a
               | more tractable representation, for example those octal
               | ASCII strings in the tar format, which requires some
               | additional processing that isn't possible with
               | loading/serializing a struct.
        
               | lostmsu wrote:
               | Which is also there in the Prolog code.
        
       | jll29 wrote:
       | Writing tar in PROLOG sounds "cool", a neat hack, but it is as
       | practical as writing the Linux kernel in COBOL. Possible,
       | perhaps, but not the best idea.
       | 
       | It's hard enough to write things in PROLOG that _are_ ideally
       | suited for the language - like natural language parsers (they
       | used to be LISP or PROLOG, but these days they are all in C++ or
       | FORTRAN wrapped in Python so no-one gets hurt; why? Because of
       | efficiency and numeric libraries required - modern parsers for
       | human languages implement probabilistic or neural models, no
       | longer symbol manipulation and pattern matching).
       | 
       | EDIT: I'm not a PROLOG hater - in fact, enjoyed reading the
       | article (and I will read sequels on implementing "mount" in RPG3,
       | "htop" in COMAL etc.) and my friends include some PROLOG lovers.
       | I also find it important and stimulating to learn about another
       | paradigm, and logical programming is an important one for many
       | reasons. This rant is more about using it on the specific task of
       | implementing "tar", for which it is a poor fit, as the article's
       | PS shows.
        
         | anon291 wrote:
         | Binary file formats, like text formats, are simply languages,
         | and declarative logic languages like Prolog, are excellent for
         | parsing, as you said. Thus, it stands to reason that Prolog
         | would be a very good fit for parsing data. That doesn't change
         | just because the data is not ASCII-encoded.
        
         | phyzome wrote:
         | This version is also insecure, as it does not prevent file path
         | traversal attacks.
        
       ___________________________________________________________________
       (page generated 2023-01-18 23:00 UTC)