[HN Gopher] Aho - a Git implementation in Awk
___________________________________________________________________
Aho - a Git implementation in Awk
Author : pabs3
Score : 155 points
Date : 2024-02-10 16:05 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| artsi0m wrote:
| Might be relevant:
|
| sed-chess: https://news.ycombinator.com/item?id=37896854
|
| awk-raycaster: https://github.com/TheMozg/awk-raycaster
| erk__ wrote:
| There is also a Google (and more) translate client written in
| AWK
|
| https://github.com/soimort/translate-shell
| artsi0m wrote:
| I found an awesome-awk[1] page on github and it seems to be a
| little empty. Maybe we should contribute to it and bring some
| examples like subj of this HN post or ahrf[2], dedicated
| markup language for static site generators based on awk. I've
| started with adding one true awk and bioawk implementations.
|
| [1]: https://github.com/freznicek/awesome-awk
| [2]: https://github.com/Ypnose/ahrf
| bitwize wrote:
| At first I thought this was named for _aho_ (aho), Japanese slang
| for "stupid", then I remembered that Alfred Aho is the 'a' in
| 'awk'. Or maybe it's both?
| bangonkeyboard wrote:
| Both, I assume. "Git" was already slang for "stupid person," so
| this is a clever name.
| CharlesW wrote:
| My most recent experience is hearing it in _Reservation Dogs_ :
| https://en.wiktionary.org/wiki/aho#Navajo
| ghc wrote:
| I bet it's both. Extremely clever wordplay!
| Towaway69 wrote:
| Great project and great idea. Understanding the basics gives one
| different perspectives for other projects and problems.
|
| Back in the day I created a web-based wiki using awk. Why?
| Because I was using a Linksys router with minimal memory.
|
| It was a great way to learn both how wikis work and what can be done
| with awk. And since there are no libraries to fall back on, I had
| to implement the basics myself and gain all the understanding.
| snovymgodym wrote:
| Awk is cool. It's a full-fledged programming language that's
| there in anything remotely unix-flavored, but I mostly see it
| used in one-liners to grab bits of text from piped stdout.
|
| But you can use awk as a general-purpose scripting language [1],
| in many ways it's nicer than bash for this purpose. I wonder why
| you don't see more awk scripts in the wild. I suppose perl came
| along and tried to combine the good features of shell, awk, and
| sed into one language, and then people decided perl was bad and
| moved on from that.
|
| [1] Random excerpt from NetBSD's source code
| https://github.com/NetBSD/src/blob/trunk/sys/dev/eisa/devlis...
| smburdick wrote:
| Does that excerpt start with an if-else sequence?
| mksybr wrote:
| it starts with the BEGIN block
|
| https://www.gnu.org/software/gawk/manual/gawk.html#BEGIN_002...
| https://www.gnu.org/software/gawk/manual/gawk.html#Pattern-E...
| csdvrx wrote:
| > then people decided perl was bad and moved on from that.
|
| Screw what people think. I found out I like perl. The last
| thing I wrote is a programmatic partition editor [1] - like how
| you use sfdisk to zero out the partitions, except I wanted to
| do more than zap, like having the MBR and GPT partition table
| to combine them and make hybrids.
|
| It was fun, and I will use perl again (I may also use awk at one
| point now that I see how cool it is)
|
| [1] https://github.com/csdvrx/hdisk/
| dkarl wrote:
| You nailed it. Perl replaced awk and then turned out to be
| counterproductive in a lot of cases because there was no simple
| and broadly understood way for people to write Perl code that
| was 1) readable for other programmers and 2) scalable to
| medium-to-large programs.
|
| Which is not to say that nobody ever figured out those things
| and did them well, just that the success rate was low enough
| across the industry to earn Perl a really bad reputation.
|
| I'd like to see a revival of awk. It's less easy to scale up,
| so there's very little risk that starting a project with a
| little bit of awk results in the next person inheriting a
| multi-thousand line awk codebase. Instead, you get an early-ish
| rewrite into a more scalable and maintainable language.
| cmgbhm wrote:
| What Perl nailed was being useful to write cross platform
| shell scripts. Agree that it didn't scale up but you had a
| chance of delivering n platforms with minimal pain.
|
| awk v gawk doesn't make me want to relive those days.
| kradroy wrote:
| > I'd like to see a revival of awk. It's less easy to scale
| up, so there's very little risk that starting a project with
| a little bit of awk results in the next person inheriting a
| multi-thousand line awk codebase. Instead, you get an early-
| ish rewrite into a more scalable and maintainable language.
|
| Taco Bell programming is the way to go.
|
| This is the thinking I use when putting together prototypes.
| You can do a lot with awk, sed, join, xargs, parallel (GNU),
| etc. But it's _really_ a lot of effort to abstract in a bash
| script, so the code is compact. I've built many data
| engineering/ML systems with this technique. Those command
| line tools are SO WELL debugged, and have such reasonable error
| behavior, that you don't have to worry about complexities of
| exception handling, etc.
| anonnon wrote:
| > then people decided perl was bad and moved on from that.
|
| That's a large part of what's driving Awk's renaissance: devs
| that never learned Perl to begin with want something to fill
| the gap between shell and Python, and other devs like me who
| (reluctantly) abandoned Perl because it was deemed "uncool" by
| HN types, which means Perl and all code written in it now has
| an expiration date on it. But since Awk is a POSIX standard, HN
| types can't get rid of it.
| Almondsetat wrote:
| can't wait for the day I will be able to compile Linux
| without Perl
| bigstrat2003 wrote:
| "HN types" can't get rid of perl either. So just use perl if
| you want to. Personally I think perl is a terrible language
| and that anything which is too complex for a shell script
| (which is most things) should just be done in python. But if
| you disagree, it's not like anyone can stop you. If your
| issue is "my teammates hate it and want me to use something
| else", I promise you they will be just as annoyed if you use
| awk.
| sgarland wrote:
| Awk is incredibly useful. I wrote a script this week to parse
| Postgres logs (many, many GB) to answer the question, "what
| were the top users making queries in the first few minutes at
| the top of every hour?" [0] Took a couple of functions, maybe
| 20 LOC in total, plus some pipes through sort and uniq [1].
| Also quite fast, especially if you prefix it with LC_ALL=C.
|
| [0]: If you're wondering why there wasn't adequate
| observability in place to not have to do this, you're not
| wrong, but sometimes you must live with the reality you have.
|
| [1]: Yes, I know gawk can do a unique sort while building the
| list. It was late into an incident and I was tired, and | sort
| | uniq -c | sort -rn is a lot easier to remember.
|
| [1].a: Yes, I know sort has a -u arg. It doesn't provide a
| count, and its unique function is also quite a bit slower than
| uniq's implementation.
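
A minimal sketch of the kind of pipeline described above, not the commenter's actual script. The log format (`date time user=NAME ...`), the five-minute window, and the function name `top_users` are all assumptions made for illustration; real Postgres log lines depend on `log_line_prefix`.

```shell
#!/bin/sh
# Hypothetical log format (an assumption, not real Postgres output):
#   2024-02-10 10:01:12 user=alice db=app statement: SELECT 1
top_users() {
  LC_ALL=C awk '
    {
      split($2, t, ":")              # t[1]=hour, t[2]=minute, t[3]=second
      if (t[2] + 0 < 5)              # keep only the first five minutes of each hour
        for (i = 3; i <= NF; i++)
          if ($i ~ /^user=/) print substr($i, 6)
    }' "$@" | sort | uniq -c | sort -rn
}

# Sample run on four made-up lines; alice falls inside the window twice.
printf '%s\n' \
  '2024-02-10 10:01:12 user=alice db=app statement: SELECT 1' \
  '2024-02-10 10:03:40 user=bob db=app statement: SELECT 2' \
  '2024-02-10 10:30:00 user=bob db=app statement: SELECT 3' \
  '2024-02-10 11:00:05 user=alice db=app statement: SELECT 4' |
  top_users
```

With file arguments, `top_users postgres.log` reads the files instead of stdin. The counting is left to `sort | uniq -c | sort -rn`, as in the comment, rather than done inside awk.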
| sampo wrote:
| > I don't plan to add network functionality to this (even though
| you totally can), so no _clone_ or _push_.
|
| You can also git clone from a repository in a different directory
| on the same computer. And push to it.
| d-lisp wrote:
| Exactly, but having branches kind of allows you to not have to
| create local clones. And I legit use rsync to clone git repos
| locally. The question is: which is best?
| nerdponx wrote:
| You can have multiple worktrees per repo. Usually that's the
| best solution instead of local clones.
| kilroy123 wrote:
| This is what I do for complex projects.
| more-coffee wrote:
| That is so obvious.. yet I've never thought to try this in 10
| years of using git.
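
For comparison, a small sketch of the two approaches discussed in this subthread, using stock git rather than Aho: a second worktree of one repository, and a full clone from a local directory path. The temporary directory layout, the branch name `feature`, and the function name `worktree_demo` are placeholders chosen for this example.

```shell
#!/bin/sh
worktree_demo() {
  tmp=$(mktemp -d) || return 1

  # A throwaway repository with one empty commit, so there is something to check out.
  git init -q "$tmp/repo" || return 1
  git -C "$tmp/repo" -c user.name=demo -c user.email=demo@example.invalid \
      commit -q --allow-empty -m "initial commit" || return 1

  # Option 1: a second working directory backed by the same object store.
  git -C "$tmp/repo" worktree add -b feature "$tmp/feature" >/dev/null 2>&1 || return 1

  # Option 2: a full local clone, addressed by a plain directory path.
  git clone -q "$tmp/repo" "$tmp/clone" || return 1

  git -C "$tmp/feature" rev-parse --abbrev-ref HEAD   # branch checked out in the worktree
  git -C "$tmp/clone" log --format=%s -1              # last commit subject seen by the clone

  rm -rf "$tmp"
}
```

Calling `worktree_demo` should print `feature` and then `initial commit`. The worktree shares history with the original repository immediately, while the clone only sees new commits after a fetch or pull.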
| amelius wrote:
| Does it solve the large-file bottleneck? Then I might use it for
| my deep learning models.
| supriyo-biswas wrote:
| It doesn't, since Git's data model has to be changed to
| content-defined chunks to solve the issue.
|
| You should look at git-lfs[1] instead.
|
| [1] https://git-lfs.com
| amelius wrote:
| I've tried it but it doesn't play nice with git-shell, and
| thus for me is too much of a hassle to set up.
|
| Also, I'm not a huge fan of tools that implement important
| functionality as an afterthought, especially if those tools
| deal with my precious data.
| aseipp wrote:
| No, Git's object store was not designed to hold large binary
| blobs, and no implementation of Git in any language can change
| this. It's a reasonable request; I mean, Git doesn't even deal
| with "pretty small" binary files very well, either. But it's
| all simply a consequence of its design that was thought up all
| those years ago.
|
| The core object storage model and data format (and many, many
| things on top of those) have to be changed/extended/fixed
| first, but it's realistically an immense change, so git-lfs and
| other various solutions are about as good as it'll get in the
| mean time.
| sodapopcan wrote:
| Not the most interesting of questions I have here, but is this
| indentation style on function definitions a thing or is it just
| accidental? It's in a few places, mostly before the first arg but
| sometimes before others.
|
| eg:
|
|     function run_command(    c, shortopts, longopts, quiet, directory, path, errors)
|
| Just asking as this project has kind of resparked my interest in
| awk.
| michaelcampbell wrote:
| I noticed this too and can't figure it out. Spaces between SOME
| parameters, not all, and not consistently placed
| rbonvall wrote:
| Awk doesn't have a way to define function-local variables. All
| variables are global, except for function parameters.
|
| This spacing convention is meant to clearly separate mandatory
| parameters and optional parameters that are sometimes only
| introduced to "declare" a local variable.
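
A short sketch of that convention, wrapped in a shell function so it can be run as is. The names `sum_to` and `awk_locals_demo` are made up for the example; the extra spaces before `i` are purely a visual hint to the reader and mean nothing to awk.

```shell
#!/bin/sh
awk_locals_demo() {
  awk '
    # n is the real parameter; i and total, after the gap, act as locals.
    function sum_to(n,    i, total) {
      for (i = 1; i <= n; i++)
        total += i
      return total
    }
    BEGIN {
      i = 99              # a global that happens to share a name with the local
      print sum_to(4)     # 1 + 2 + 3 + 4
      print i             # unchanged: the i inside sum_to was local
    }'
}

awk_locals_demo
```

This prints `10` and then `99`. Callers simply omit the trailing arguments, so the "locals" start out uninitialized on every call.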
| bewuethr wrote:
| Here's where the manual introduces the convention:
| https://www.gnu.org/software/gawk/manual/gawk.html#Variable-...
| sodapopcan wrote:
| Oh cool, thank you!!
| throw0101b wrote:
| The book _The AWK Programming Language, Second Edition_ was
| released this past September (2023):
|
| * https://awk.dev
|
| The first edition was published in 1988, and is available at:
|
| * https://archive.org/details/pdfy-MgN0H1joIoDVoIC7
|
| * Discussion: https://news.ycombinator.com/item?id=13451454
| smburdick wrote:
| Alongside Brian Kernighan, the "K" in K&R C, and much more Unix
| lore.
| adonovan wrote:
| This 35-year gap is a great story to tell your editor whenever
| they ask "so how's that second edition coming along...?"
| kazinator wrote:
| The original authors have done next to nothing to improve Awk
| in those years; it's embarrassing to be writing another book
| on a subject that they have not advanced.
|
| Awk could use improvement in numerous areas. Oh, for
| instance, you can pass associative arrays into functions, but
| not return them. Functions that filter array to array have to
| take an output array parameter.
|
| Using extra parameters as the only way to get local variables
| is also a smell.
|
| a[i] syntax cannot index into strings, what the hell?
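
To make those limitations concrete, here is a sketch of the usual workarounds in portable awk: a filter that writes into a caller-supplied output array instead of returning one, and `substr()` standing in for string indexing. The names `filter_gt` and `awk_array_demo` are hypothetical.

```shell
#!/bin/sh
awk_array_demo() {
  awk '
    # Copy the entries of src whose value exceeds min into dst, which the
    # caller supplies as an output array. k and n are locals.
    function filter_gt(src, dst, min,    k, n) {
      for (k in src)
        if (src[k] > min) { dst[k] = src[k]; n++ }
      return n + 0
    }
    BEGIN {
      a["x"] = 1; a["y"] = 5; a["z"] = 9
      split("", out)                 # make out an empty array
      print filter_gt(a, out, 3)     # number of entries kept
      has_y = ("y" in out); has_x = ("x" in out)
      print has_y, has_x
      s = "awk"
      print substr(s, 2, 1)          # second character; substr() is 1-based
    }'
}

awk_array_demo
```

This prints `2`, then `1 0`, then `w`. The function can only return a scalar (here, the count), so the filtered data has to travel back through `dst`.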
| chris_wot wrote:
| Just for context, Aho is 82, Weinberger is 81 and Kernighan
| is dead.
| ksherlock wrote:
| Brian Kernighan is alive.
| ksherlock wrote:
| CSV support (--csv) was added last year (to the one true
| awk and gnu awk)
| EasyMark wrote:
| GRRM should probably point to that whenever people pester him
| about completing "the damn book"
| dang wrote:
| Related:
|
| _A Git Implementation in Awk_ -
| https://news.ycombinator.com/item?id=28771841 - Oct 2021 (96
| comments)
| forrestthewoods wrote:
| Neat project. It's always fun to see tools pushed beyond their
| normal use cases.
|
| That said, it should be a criminal offense to write any tool this
| large and complex in any language that can't be used in a
| powerful step debugger.
|
| TBH I'm increasingly frustrated by the amount of code written in
| Bash. I kind of hate Python for various reasons. But if 100% of
| Bash was replaced with Python I think the world would be a better
| place.
| riddley wrote:
| I'm mostly in concordance, but one thing I think every
| scripting language got wrong is how painful it is to run
| external binaries. Sure it can be done but it could be as easy
| as a shell script.
| forrestthewoods wrote:
| Agreed.
|
| That said, stdout/stderr is such a bloody, inconsistent
| nightmare. I'm not totally convinced that "chain small binary
| programs together" is better than "one language with useful
| libraries".
|
| Bash is admittedly nice for small things. But it always
| spirals out of control. And rarely gets ported.
|
| Also my life is primarily Windows and if you want everything
| to "just work" across mac/linux/win it's easier to just use
| Python or sometimes even Rust. I often wish I could easily
| write and run single-file rust scripts.
| wazbug wrote:
| Ruby is nice in this regard though :-)
|
|     x = `git --version`
|     puts(x) #=> "git version 2.43.0"
| Keyframe wrote:
| There's this lingering thought in my head that with a bunch of
| GNU utils/programs and probably not much more one could create
| these omnipotent databases and processing tools that would
| surpass in performance and capabilities tools specialized in it.
| Anyone else feel like that?
| zilti wrote:
| Oh yes, and a project like this exists/existed:
| https://en.wikipedia.org/wiki/Strozzi_NoSQL?wprov=sfla1
| Keyframe wrote:
| Fantastic, first time I hear about it. So there's _something_
| to it, alright.
| kazinator wrote:
| If this used cppawk (which didn't exist when this was developed),
| it could use #include. This is nicely relative to the file; no
| AWKPATH. Also you can just "build" the preprocessed program into
| a single file which then doesn't need cppawk.
|
| https://www.kylheku.com/cgit/cppawk/about/
| freedomben wrote:
| When people ask me why I say that the linux command line is the
| best dev environment, Awk is one of the tools I often point to.
| When you know even basic awk, you can do a lot with a little.
| IDEs actually start to feel clunky.
|
| If you're looking to get into Awk, and you learn well from a
| lecture style, I put together a talk for Linux Fest Northwest
| some years ago and recorded it for Youtube:
| https://youtu.be/E5aQxIdjT0M
| corytheboyd wrote:
| How does awk replace an IDE? I love awk, I love how powerful it
| is if you spend the time to learn it, but if I didn't have an
| IDE I would be significantly less productive. Most of what an
| IDE does is help you understand and change code, not text
| editing operations. Not trying to say you're wrong, just
| curious what your angle is with that statement.
| kazinator wrote:
| When you see code like:
|
|     function read_objfile(obj, objpath,    bytes, end_of_header, header,
|                           end_of_type, type, size, bytes_after_header)
|
| the parameters separated by the big white space are local
| variables. It's possible to pass them values, but you're not
| supposed to.
|
| I wrote a patch for GNU Awk to give it a let statement for
| binding true lexical variables, so that this could be:
|
|     function read_objfile(obj, objpath) {
|       @let (bytes, end_of_header, header, end_of_type, type, size,
|             bytes_after_header) {
|       }
|     }
|
| Unfortunately, this was rejected by the project; I was encouraged
| to make a renamed fork of GNU Awk, so that's what I did.
|
| https://www.kylheku.com/cgit/egawk/about/
| coliveira wrote:
| I feel they have a point, gawk has already too many differences
| compared to awk. If you introduce even more distinct syntax, it
| is better just to fork it and call it something else.
| earthboundkid wrote:
| The perfect project name doesn't exi...
| runiq wrote:
| Yeah, digging the name. For the uninitiated: 'aho' is basically
| 'git' in Osakan.
| EasyMark wrote:
| can be built as a part of busybox too, which I've found useful a
| few times on embedded linux systems with limited
| resources/program space
___________________________________________________________________
(page generated 2024-02-10 23:00 UTC)