       [HN Gopher] Aho - a Git implementation in Awk
       ___________________________________________________________________
        
       Aho - a Git implementation in Awk
        
       Author : pabs3
       Score  : 155 points
       Date   : 2024-02-10 16:05 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | artsi0m wrote:
       | Might be relevant:
       | 
       | sed-chess: https://news.ycombinator.com/item?id=37896854
       | 
       | awk-raycaster: https://github.com/TheMozg/awk-raycaster
        
         | erk__ wrote:
         | There is also a Google (and more) translate client written in
         | AWK
         | 
         | https://github.com/soimort/translate-shell
        
           | artsi0m wrote:
           | I found an awesome-awk[1] page on github and it seems to be a
           | little empty. Maybe we should contribute to it and bring some
           | examples like subj of this HN post or ahrf[2], dedicated
           | markup language for static site generators based on awk. I've
           | started with adding one true awk and bioawk implementations.
           | 
           | [1]: https://github.com/freznicek/awesome-awk
           | [2]: https://github.com/Ypnose/ahrf
        
       | bitwize wrote:
       | At first I thought this was named for _aho_ (aho), Japanese slang
       | for "stupid", then I remembered that Alfred Aho is the 'a' in
       | 'awk'. Or maybe it's both?
        
         | bangonkeyboard wrote:
         | Both, I assume. "Git" was already slang for "stupid person," so
         | this is a clever name.
        
         | CharlesW wrote:
         | My most recent experience is hearing it in _Reservation Dogs_ :
         | https://en.wiktionary.org/wiki/aho#Navajo
        
         | ghc wrote:
         | I bet it's both. Extremely clever wordplay!
        
       | Towaway69 wrote:
       | Great project and great idea. Understanding the basics gives one
       | different perspectives for other projects and problems.
       | 
       | Back in the day I created a Web-based wiki using awk. Why?
       | Because I was using a Linksys router with minimal memory.
       | 
       | It was a great way to learn both how wikis work and what can be
       | done with awk. And since there are no libraries to fall back on,
       | I had to implement the basics and gain all the understanding.
        
       | snovymgodym wrote:
       | Awk is cool. It's a full-fledged programming language that's
       | there in anything remotely unix-flavored, but I mostly see it
       | used in one-liners to grab bits of text from piped stdout.
       | 
       | But you can use awk as a general-purpose scripting language [1],
       | in many ways it's nicer than bash for this purpose. I wonder why
       | you don't see more awk scripts in the wild. I suppose perl came
       | along and tried to combine the good features of shell, awk, and
       | sed into one language, and then people decided perl was bad and
       | moved on from that.
       | 
       | [1] Random excerpt from NetBSD's source code
       | https://github.com/NetBSD/src/blob/trunk/sys/dev/eisa/devlis...
        
         | smburdick wrote:
         | Does that excerpt start with an if-else sequence?
        
           | mksybr wrote:
           | it starts with the BEGIN block
           | 
           | https://www.gnu.org/software/gawk/manual/gawk.html#BEGIN_002...
           | https://www.gnu.org/software/gawk/manual/gawk.html#Pattern-E...
        
         | csdvrx wrote:
         | > then people decided perl was bad and moved on from that.
         | 
         | Screw what people think. I found out I like perl. The last
         | thing I wrote is a programmatic partition editor [1] - like how
         | you use sfdisk to zero out the partitions, except I wanted to
         | do more than zap, like having the MBR and GPT partition table
         | to combine them and make hybrids.
         | 
         | It was fun, and I will use perl again (I may also use awk at one
         | point now that I see how cool it is)
         | 
         | [1] https://github.com/csdvrx/hdisk/
        
         | dkarl wrote:
         | You nailed it. Perl replaced awk and then turned out to be
         | counterproductive in a lot of cases because there was no simple
         | and broadly understood way for people to write Perl code that
         | was 1) readable for other programmers and 2) scalable to
         | medium-to-large programs.
         | 
         | Which is not to say that nobody ever figured out those things
         | and did them well, just that the success rate was low enough
         | across the industry to earn Perl a really bad reputation.
         | 
         | I'd like to see a revival of awk. It's less easy to scale up,
         | so there's very little risk that starting a project with a
         | little bit of awk results in the next person inheriting a
         | multi-thousand line awk codebase. Instead, you get an early-ish
         | rewrite into a more scalable and maintainable language.
        
           | cmgbhm wrote:
           | What Perl nailed was being useful to write cross platform
           | shell scripts. Agree that it didn't scale up but you had a
           | chance of delivering n platforms with minimal pain.
           | 
           | awk v gawk doesn't make me want to relive those days.
        
           | kradroy wrote:
           | > I'd like to see a revival of awk. It's less easy to scale
           | up, so there's very little risk that starting a project with
           | a little bit of awk results in the next person inheriting a
           | multi-thousand line awk codebase. Instead, you get an
           | early-ish rewrite into a more scalable and maintainable
           | language.
           | 
           | Taco Bell programming is the way to go.
           | 
           | This is the thinking I use when putting together prototypes.
           | You can do a lot with awk, sed, join, xargs, parallel (GNU),
           | etc. But it's _really_ a lot of effort to abstract in a bash
           | script, so the code is compact. I've built many data
           | engineering/ML systems with this technique. Those command
           | line tools are SO WELL debugged and have reasonable error
           | behavior that you don't have to worry about complexities of
           | exception handling, etc.
        
         | anonnon wrote:
         | > then people decided perl was bad and moved on from that.
         | 
         | That's a large part of what's driving Awk's renaissance: devs
         | that never learned Perl to begin with want something to fill
         | the gap between shell and Python, and other devs like me who
         | (reluctantly) abandoned Perl because it was deemed "uncool" by
         | HN types, which means Perl and all code written in it now has
         | an expiration date on it. But since Awk is a POSIX standard, HN
         | types can't get rid of it.
        
           | Almondsetat wrote:
           | can't wait for the day I will be able to compile Linux
           | without Perl
        
           | bigstrat2003 wrote:
           | "HN types" can't get rid of perl either. So just use perl if
           | you want to. Personally I think perl is a terrible language
           | and that anything which is too complex for a shell script
           | (which is most things) should just be done in python. But if
           | you disagree, it's not like anyone can stop you. If your
           | issue is "my teammates hate it and want me to use something
           | else", I promise you they will be just as annoyed if you use
           | awk.
        
         | sgarland wrote:
         | Awk is incredibly useful. I wrote a script this week to parse
         | Postgres logs (many, many GB) to answer the question, "what
         | were the top users making queries in the first few minutes at
         | the top of every hour?" [0] Took a couple of functions, maybe
         | 20 LOC in total, plus some pipes through sort and uniq [1].
         | Also quite fast, especially if you prefix it with LC_ALL=C.
         | 
         | [0]: If you're wondering why there wasn't adequate
         | observability in place to not have to do this, you're not
         | wrong, but sometimes you must live with the reality you have.
         | 
         | [1]: Yes, I know gawk can do a unique sort while building the
         | list. It was late into an incident and I was tired, and
         | "| sort | uniq -c | sort -rn" is a lot easier to remember.
         | 
         | [1].a: Yes, I know sort has a -u arg. It doesn't provide a
         | count, and its unique function is also quite a bit slower than
         | uniq's implementation.
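For readers who want to see the counting step outside the shell, here is a minimal Python sketch of what `sort | uniq -c | sort -rn` computes. It assumes the user names have already been extracted from the log lines (neither the Postgres log format nor the extraction step appears in the thread), and `count_descending` is a hypothetical helper name.

```python
from collections import Counter


def count_descending(items):
    """Count occurrences and return (count, item) pairs, most frequent first.

    Roughly what `sort | uniq -c | sort -rn` prints; the order of items
    with equal counts may differ from the shell pipeline.
    """
    counts = Counter(items)
    return [(n, item) for item, n in counts.most_common()]


if __name__ == "__main__":
    # Made-up sample data, not taken from any real log.
    users = ["alice", "bob", "alice", "carol", "alice", "bob"]
    for n, user in count_descending(users):
        print(f"{n:7d} {user}")
```

Unlike the pipeline, this keeps every distinct key in memory at once, which is usually fine for user names but is an assumption worth checking on very large inputs.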
        
       | sampo wrote:
       | > I don't plan to add network functionality to this (even though
       | you totally can), so no _clone_ or _push_.
       | 
       | You can also git clone from a repository in a different directory
       | on the same computer. And push to it.
        
         | d-lisp wrote:
         | Exactly, but having branches kind of allows you to not have to
         | create local clones. And I legit use rsync to clone git repos
         | locally. The question is: which is best?
        
           | nerdponx wrote:
           | You can have multiple worktrees per repo. Usually that's the
           | best solution instead of local clones.
        
             | kilroy123 wrote:
             | This is what I do for complex projects.
        
         | more-coffee wrote:
         | That is so obvious.. yet I've never thought to try this in 10
         | years of using git.
        
       | amelius wrote:
       | Does it solve the large-file bottleneck? Then I might use it for
       | my deep learning models.
        
         | supriyo-biswas wrote:
         | It doesn't, since Git's data model has to be changed to
         | content-defined chunks to solve the issue.
         | 
         | You should look at git-lfs[1] instead.
         | 
         | [1] https://git-lfs.com
        
           | amelius wrote:
           | I've tried it but it doesn't play nice with git-shell, and
           | thus for me is too much of a hassle to set up.
           | 
           | Also, I'm not a huge fan of tools that implement important
           | functionality as an afterthought, especially if those tools
           | deal with my precious data.
        
         | aseipp wrote:
         | No, Git's object store was not designed to hold large binary
         | blobs, and no implementation of Git in any language can change
         | this. It's a reasonable request; I mean, Git doesn't even deal
         | with "pretty small" binary files very well, either. But it's
         | all simply a consequence of its design that was thought up all
         | those years ago.
         | 
         | The core object storage model and data format (and many, many
         | things on top of those) have to be changed/extended/fixed
         | first, but it's realistically an immense change, so git-lfs and
         | other various solutions are about as good as it'll get in the
         | meantime.
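To make the design point concrete, here is a minimal Python sketch of how stock Git names and stores a loose blob: the object ID is the SHA-1 of a short header plus the entire file content, and the stored bytes are a zlib-compressed copy of the same. The function names are illustrative, a SHA-1 repository is assumed, and this is not code from Aho.

```python
import hashlib
import zlib


def blob_payload(content: bytes) -> bytes:
    """Prefix the content with Git's blob header: b"blob <size>" plus a NUL byte."""
    return b"blob " + str(len(content)).encode("ascii") + b"\0" + content


def blob_id(content: bytes) -> str:
    """Return the object ID Git would assign to a blob with this content."""
    return hashlib.sha1(blob_payload(content)).hexdigest()


def loose_object_bytes(content: bytes) -> bytes:
    """Return the bytes a loose object file would hold (zlib-compressed payload)."""
    return zlib.compress(blob_payload(content))


if __name__ == "__main__":
    data = b"hello world\n"
    print(blob_id(data))
    print(len(loose_object_bytes(data)), "bytes when stored as a loose object")
```

Because the hash covers the whole content, changing a single byte of a large file produces a new object of the full size, at least until packing finds a delta, which is the bottleneck being discussed.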
        
       | sodapopcan wrote:
       | Not the most interesting of questions I have here, but is this
       | indentation style on function definitions a thing or is it just
       | accidental? It's in a few places, mostly before the first arg but
       | sometimes before others.
       | 
       | eg:
       | 
       |     function run_command(    c, shortopts, longopts, quiet, directory, path, errors)
       | 
       | Just asking as this project has kind of resparked my interest in
       | awk.
        
         | michaelcampbell wrote:
         | I noticed this too and can't figure it out. Spaces between SOME
         | parameters, not all, and not consistently placed
        
         | rbonvall wrote:
         | Awk doesn't have a way to define function-local variables. All
         | variables are global, except for function parameters.
         | 
         | This spacing convention is meant to clearly separate mandatory
         | parameters and optional parameters that are sometimes only
         | introduced to "declare" a local variable.
        
           | bewuethr wrote:
           | Here's where the manual introduces the convention:
           | https://www.gnu.org/software/gawk/manual/gawk.html#Variable-...
        
           | sodapopcan wrote:
           | Oh cool, thank you!!
        
       | throw0101b wrote:
       | The book _The AWK Programming Language, Second Edition_ was
       | released this past September (2023):
       | 
       | * https://awk.dev
       | 
       | The first edition was published in 1988, and is available at:
       | 
       | * https://archive.org/details/pdfy-MgN0H1joIoDVoIC7
       | 
       | * Discussion: https://news.ycombinator.com/item?id=13451454
        
         | smburdick wrote:
         | Alongside Brian Kernighan, the "K" in K&R C, and much more Unix
         | lore.
        
         | adonovan wrote:
         | This 35-year gap is a great story to tell your editor whenever
         | they ask "so how's that second edition coming along...?"
        
           | kazinator wrote:
           | The original authors have done next to nothing to improve Awk
           | in those years; it's embarrassing to be writing another book
           | on a subject that they have not advanced.
           | 
           | Awk could use improvement in numerous areas. Oh, for
           | instance, you can pass associative arrays into functions, but
           | not return them. Functions that filter array to array have to
           | take an output array parameter.
           | 
           | Using extra parameters as the only way to get local variables
           | is also a smell.
           | 
           | a[i] syntax cannot index into strings, what the hell?
        
             | chris_wot wrote:
             | Just for context, Aho is 82, Weinberger is 81 and Kernighan
             | is dead.
        
               | ksherlock wrote:
               | Brian Kernighan is alive.
        
             | ksherlock wrote:
             | CSV support (--csv) was added last year (to the one true
             | awk and gnu awk)
        
           | EasyMark wrote:
           | GRRM should probably point to that whenever people pester him
           | about completing "the damn book"
        
       | dang wrote:
       | Related:
       | 
       |  _A Git Implementation in Awk_ -
       | https://news.ycombinator.com/item?id=28771841 - Oct 2021 (96
       | comments)
        
       | forrestthewoods wrote:
       | Neat project. It's always fun to see tools pushed beyond their
       | normal use cases.
       | 
       | That said, it should be a criminal offense to write any tool this
       | large and complex in any language that can't be used in a
       | powerful step debugger.
       | 
       | TBH I'm increasingly frustrated by the amount of code written in
       | Bash. I kind of hate Python for various reasons. But if 100% of
       | Bash was replaced with Python I think the world would be a better
       | place.
        
         | riddley wrote:
         | I'm mostly in concordance, but one thing I think every
         | scripting language got wrong is how painful it is to run
         | external binaries. Sure it can be done but it could be as easy
         | as a shell script.
        
           | forrestthewoods wrote:
           | Agreed.
           | 
           | That said, stdout/stderr is such a bloody, inconsistent
           | nightmare. I'm not totally convinced that "chain small binary
           | programs together" is better than "one language with useful
           | libraries".
           | 
           | Bash is admittedly nice for small things. But it always
           | spirals out of control. And rarely gets ported.
           | 
           | Also my life is primarily Windows and if you want everything
           | to "just work" across mac/linux/win it's easier to just use
           | Python or sometimes even Rust. I often wish I could easily
           | write and run single-file rust scripts.
        
           | wazbug wrote:
           | Ruby is nice in this regard though :-)
           | 
           |     x = `git --version`
           |     puts(x) #=> "git version 2.43.0"
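For comparison, a minimal Python sketch of the same capture-the-output pattern using the standard library's `subprocess` module. `capture` is a hypothetical helper name, and the example runs the current Python interpreter rather than git so that it does not assume git is installed.

```python
import subprocess
import sys


def capture(argv):
    """Run a command and return its standard output as stripped text.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(capture([sys.executable, "-c", "print('hello from a child process')"]))
```

Passing the command as a list avoids shell quoting issues, at the cost of being wordier than backticks.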
        
       | Keyframe wrote:
       | There's this lingering thought in my head that with a bunch of
       | GNU utils/programs and probably not much more one could create
       | these omnipotent databases and processing tools that would
       | surpass in performance and capabilities tools specialized in it.
       | Anyone else feel like that?
        
         | zilti wrote:
         | Oh yes, and a project like this exists/existed:
         | https://en.wikipedia.org/wiki/Strozzi_NoSQL?wprov=sfla1
        
           | Keyframe wrote:
           | Fantastic, first time I hear about it. So there's _something_
           | to it, alright.
        
       | kazinator wrote:
       | If this used cppawk (which didn't exist when this was developed),
       | it could use #include. This is nicely relative to the file; no
       | AWKPATH. Also you can just "build" the preprocessed program into
       | a single file which then doesn't need cppawk.
       | 
       | https://www.kylheku.com/cgit/cppawk/about/
        
       | freedomben wrote:
       | When people ask me why I say that the linux command line is the
       | best dev environment, Awk is one of the tools I often point to.
       | When you know even basic awk, you can do a lot with a little.
       | IDEs actually start to feel clunky.
       | 
       | If you're looking to get into Awk, and you learn well from a
       | lecture style, I put together a talk for Linux Fest Northwest
       | some years ago and recorded it for Youtube:
       | https://youtu.be/E5aQxIdjT0M
        
         | corytheboyd wrote:
         | How does awk replace an IDE? I love awk, I love how powerful it
         | is if you spend the time to learn it, but if I didn't have an
         | IDE I would be significantly less productive. Most of what an
         | IDE does is help you understand and change code, not text
         | editing operations. Not trying to say you're wrong, just
         | curious what your angle is with that statement.
        
       | kazinator wrote:
       | When you see code like:
       | 
       |     function read_objfile(obj, objpath,    bytes, end_of_header,
       |                           header, end_of_type, type, size,
       |                           bytes_after_header)
       | 
       | the parameters separated by the big white space are local
       | variables. It's possible to pass them values, but you're not
       | supposed to.
       | 
       | I wrote a patch for GNU Awk to give it a let statement for
       | binding true lexical variables, so that this could be:
       | 
       |     function read_objfile(obj, objpath)
       |     {
       |        @let (bytes, end_of_header, header,
       |              end_of_type, type, size, bytes_after_header)
       |        {
       |        }
       |     }
       | 
       | Unfortunately, this was rejected by the project; I was encouraged
       | to make a renamed fork of GNU Awk, so that's what I did.
       | 
       | https://www.kylheku.com/cgit/egawk/about/
        
         | coliveira wrote:
         | I feel they have a point, gawk already has too many differences
         | compared to awk. If you introduce even more distinct syntax, it
         | is better just to fork it and call it something else.
        
       | earthboundkid wrote:
       | The perfect project name doesn't exi...
        
         | runiq wrote:
         | Yeah, digging the name. For the uninitiated: 'aho' is basically
         | 'git' in Osakan.
        
       | EasyMark wrote:
       | can be built as a part of busybox too, which I've found useful a
       | few times on embedded linux systems with limited
       | resources/program space
        
       ___________________________________________________________________
       (page generated 2024-02-10 23:00 UTC)