[HN Gopher] Unix philosophy without left-pad, Part 2: Minimizing...
___________________________________________________________________
Unix philosophy without left-pad, Part 2: Minimizing dependencies
Author : lizmat
Score : 98 points
Date : 2021-12-11 11:42 UTC (11 hours ago)
(HTM) web link (raku-advent.blog)
(TXT) w3m dump (raku-advent.blog)
| eternityforest wrote:
| How about we minimize the UNIX philosophy instead?
| nonameiguess wrote:
| The funny thing about this is the Unix philosophy is just about
| keeping functional units small, separate, and theoretically
| independent of each other. It says nothing about the granularity
| of packaging for end users. Nobody has ever, to my knowledge,
| individually provided each Unix utility in its own package. A GNU
| system has most stuff in coreutils, with most everything else in
| findutils, binutils, and util-linux on Linux systems. Only grep,
| awk, and sed are single-tool packages among the POSIX utilities.
| In BSD systems, one base package contains the entire POSIX
| toolchain.
|
| The idea of having a gigantic "utils" package like this, or even
| a batteries included standard library like Ruby and Python, is
| perfectly in keeping with Unix philosophy. The point is not to have
| a single executable that does everything, but you can provide
| many executables and shared objects in one addressable package
| with a common version, a single build, and a monorepo.
| codesections wrote:
| Separating the question of "functional units" from "packaging
| units" is a good point - you're right that there's nothing non-
| Unix-y about packaging coreutils together.
|
| I might add a third category, though, maybe "development
| units"? Something like Python's batteries-included standard
| library strikes me as a bit less Unix-y - not because it
| packages things together but because they (as I understand it)
| _develop_ things together and do so in a way that creates
| barriers to outside packages integrating quite as well as
| standard library packages. (Or at least that's what I've
| understood from the outside, looking in)
| z3t4 wrote:
| I'll rather use a really small, static (as in never changing)
| package than something bloated that gets updates every day and
| breaking changes from time to time. left-pad was not the problem.
| The problem was that NPM changed ownership of already existing
| package-names - which caused the left-pad owner to remove his
| packages in protest.
| codesections wrote:
| > I'll rather use a really small, static (as in never changing)
| package [instead of one] that gets updates every day and
| breaking changes from time to time.
|
| That's an entirely fair point and, as I got into a bit in the
| versioning[0] section, something that I'm giving a good deal of
| thought to.
|
| I'm currently leaning towards tracking the Rakudo[1] compiler
| releases (~monthly), so updates wouldn't be anything like
| daily. As far as *breaking* changes go - well, again, I'm still
| thinking about/discussing what guarantees to make, but I'm
| hoping to be able to promise to (try to) provide strong
| backwards compatibility. One thing I mentioned in the post is
| that Raku's strong support for multiple dispatch[2] makes
| backwards compatibility a bit easier: `_` can add a new version
| of a function without impacting the existing one.
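|
| A rough sketch of that idea, using Python's functools.singledispatch
| as a stand-in for Raku's multi subs. The function name `pad` and its
| behavior are hypothetical, not part of `_`, and singledispatch only
| looks at the type of the first argument, so it is a weaker mechanism
| than Raku's full signature-based dispatch:

```python
from functools import singledispatch


# Hypothetical utility function; not taken from the `_` library.
@singledispatch
def pad(value, width):
    """Original candidate: left-pad the string form of any value with spaces."""
    return str(value).rjust(width)


# A later release registers an extra candidate for ints. Callers that pass
# strings (or anything else) still reach the original implementation.
@pad.register
def _(value: int, width):
    return str(value).zfill(width)


print(pad("ab", 5))  # existing string callers are unaffected
print(pad(7, 3))     # ints now take the newly added candidate
```

| The existing candidate is never edited; the new behavior is purely
| additive, which is the property that helps with backwards
| compatibility.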
|
| That still leaves _accidental_ breakage (i.e., bugs) - which is
| the area I'm currently most concerned about. If not handled
| correctly, a utility package risks creating its own sort of
| internal dependency hell: if there's a bug in one sub-package
| that you use, it could potentially block you from using that
| version - even if a different sub-package has a feature you
| want. I'm not sure of the best solution yet, but I'm exploring
| a few Raku options that I _think_ may let me provide versions
| at the sub-package level (or maybe even the function level?).
| That's very much a WIP for now, but it's something that'll
| happen before a 1.0.0 release.
|
| [0]: https://raku-advent.blog/2021/12/11/unix_philosophy_without_...
|
| [1]: https://rakudo.org/
| eternityforest wrote:
| Those large libraries don't change either, if you use a fixed
| version.
|
| Security updates are important, but it's not like CVEs are
| particularly common in 5 functions like left-pad, and bloated
| code that isn't reachable in your app is probably not going to
| be an attack surface.
|
| Especially if a dead code remover gets it.
| dundarious wrote:
| JNDI was "dead code" for almost everybody, in that nobody
| intentionally used or wanted it. Unfortunately it was really
| just "dormant code" waiting to wake up.
|
| There is a dial from small to large (in terms of code size or
| feature set), static or growing feature set, pinned to
| floating dependencies (if you are notified on available
| updates to your pinned version and review them, which is
| actually possible for small dependencies, it's largely
| equivalent to floating).
|
| I don't think anyone is going to be able to convincingly say
| "my choice of large, growing, floating is best", or even
| though it's my starting preference, convincingly say "small,
| static, pinned is best". If you don't have the features or
| performance you need yet, you can make a good choice in
| picking a growing, floating dependency -- your call.
|
| But there is absolutely in my mind a need for greater general
| discipline in dependency capabilities. We don't need a monad
| stack or effects system in order to say "you can't add code
| to remote download, deserialize, and execute, in a call to
| log, that's just not a sensible thing to do". Or maybe we do,
| because many of us still haven't learned this lesson.
| b3morales wrote:
| It doesn't apply universally, but in front end development
| a large part of the problem is that engineering is not, in
| practice, the ultimate decider for (all) dependencies. The
| business/product side often dictates integration with a
| particular third-party service; those services don't have
| an option to _not_ use their own SDK, and the SDK itself
| may come with its own dependencies. To the business side,
| security risk looks abstract, theoretical, and easy to dismiss as
| "it can't happen to me", especially compared to whatever goal
| is tied to the integration.
|
| Facebook in mobile applications is a perfect example. Not a
| security issue, but the two crash incidents last year
| caused some havoc for iOS developers. But as far as I am
| aware the only way to get Facebook login in your mobile app
| is to use the SDK, and no product manager on the planet is
| going to let engineering talk them out of Facebook user
| support.
| dundarious wrote:
| Sure, people have to use graphics APIs they hate (D3D12,
| Vulkan, Metal) because there's no realistic way to avoid
| them. If you're forced, you're forced.
|
| Log4j type libraries don't really fall into that camp
| (unless it's an awful transitively forced dependency,
| which unfortunately can happen, but it's at least usually
| slightly easier to fight against/mitigate). And I was
| mainly challenging an implied equivalence between "large,
| growing, pinned dependencies plus dead code elimination",
| and "small, static, pinned/floating dependencies". There
| are plausible trade-offs that could cause you to choose
| any combination of those factors, but I think it's wrong
| if one were to make an equivalence like that.
| HPsquared wrote:
| In this vein, for particularly sensitive applications (e.g.
| password managers) I prefer to disable automatic updates.
| jasonpeacock wrote:
| How do you identify when to perform updates for security
| reasons?
| wincent wrote:
| I think poster is saying turn _off_ "Automatically install
| updates" but leave _on_ "Check for updates
| [Daily/Weekly/Monthly]". That way you at least become aware
| that updates are available, and can assess them for
| yourself.
| masklinn wrote:
| > The problem was that NPM changed ownership of already
| existing package-names - which caused the left-pad owner to
| remove his packages in protest.
|
| And of course that npm allowed unilaterally pulling packages
| and breaking all dependents.
| konschubert wrote:
| Maybe published package versions should be immutable.
|
| I get the malware concerns but in practice I don't think they
| are such a big blocker.
| codesections wrote:
| > Maybe published package versions should be immutable.
|
| They are in many languages. Of those I'm familiar with, Raku,
| Rust, and JavaScript all have immutable package repos. (npm
| wasn't when left-pad was pulled but has changed since then).
|
| Of course, in each case they're only "immutable" in the sense
| that some organization (with varying degrees of
| centralization) has promised to host them forever; people
| clearly vary in their willingness to believe promises of that
| nature.
| simion314 wrote:
| >Maybe published package versions should be immutable.
|
| Still won't help you: if the leftpad dev wanted to send a
| message/protest, he could have pushed a small update that would do
| something bad.
|
| The problem is when you are not the idiot that installs
| leftpad but you need to install some other package like some
| GUI or testing framework and those "smart" devs decided to
| depend on leftpad directly or indirectly because of some stupid
| philosophy. I have inherited a project with this kind of
| idiotic dependencies, including small stupid shit and
| packages with an incorrect package.json that depend on things
| they do not actually need or things they should not.
| konschubert wrote:
| It would help if you use a lock file and/or pin to badges
| of dependency versions.
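|
| A minimal sketch of what a lock file buys you, assuming the lock
| records an exact version plus a content hash for each package. The
| lock layout, the helper name `matches_lock`, and the archive bytes
| are all made up for illustration and do not mirror npm's actual
| lock format:

```python
import hashlib

# Stand-in for the bytes of a downloaded package archive (hypothetical).
archive = b"module.exports = leftPad;"

# Hypothetical lock data: package name -> (exact version, sha256 of archive).
LOCK = {"left-pad": ("1.3.0", hashlib.sha256(archive).hexdigest())}


def matches_lock(name, version, data, lock=LOCK):
    """Return True only if both the version and the content hash match the lock."""
    pinned_version, pinned_hash = lock[name]
    return (
        version == pinned_version
        and hashlib.sha256(data).hexdigest() == pinned_hash
    )


print(matches_lock("left-pad", "1.3.0", archive))               # pinned content
print(matches_lock("left-pad", "1.3.0", archive + b"// evil"))  # altered content
print(matches_lock("left-pad", "1.3.1", archive))               # unexpected version
```

| With a check like this, a surprise update (malicious or not) is
| rejected until someone deliberately changes the lock entry.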
| simion314 wrote:
| Yeah, but I never seen people locking the dependencies to
| an exact version, probably to get small fixes and
| important security ones.
|
| But then you still have issues with packages that depend on the
| npm website existing in the future, and some packages even
| link to a git repo directly, so if the repo is
| gone or GitHub is gone you (or others) can't re-create
| your project.
| codesections wrote:
| > I never seen people locking the dependencies to an
| exact version
|
| This depends heavily on the language/ecosystem. For
| example, golang's Minimal Version Selection[0] basically
| requires libraries to specify an exact version - the only
| way they'd get a higher one is if another library in the
| dependency graph had manually upgraded to the higher
| version.
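|
| A simplified sketch of that selection rule, not Go's actual
| implementation: walk everything reachable from the root and, for
| each module, keep the highest of the minimum versions that anyone
| asked for. The graph below and the name `build_list` are
| hypothetical:

```python
def build_list(root, requirements):
    """Simplified minimal version selection.

    `requirements` maps a (module, version) pair to the minimum versions
    it asks for. The result keeps, for every module reachable from
    `root`, the highest of those requested minimums.
    """
    selected = {}
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        module, version = node
        if module not in selected or version > selected[module]:
            selected[module] = version
        stack.extend(requirements.get(node, []))
    return selected


# Hypothetical dependency graph; versions are (major, minor) tuples.
REQUIREMENTS = {
    ("app", (1, 0)): [("B", (1, 2)), ("C", (1, 2))],
    ("B", (1, 2)): [("D", (1, 3))],
    ("C", (1, 2)): [("D", (1, 4))],
}

# Even if a newer D exists upstream, nobody requested it, so the
# selection stops at D 1.4 (the highest requested minimum).
print(build_list(("app", (1, 0)), REQUIREMENTS))
```

| D only moves past 1.4 when some module in the graph bumps its own
| stated requirement, which is the "manually upgraded" case above.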
|
| But yeah, if the source is hosted externally and you
| don't have a local copy somewhere, then that's going to
| hurt. Which is (part of) why "should I vendor my
| dependencies" is such a perennial topic.
|
| [0]: https://research.swtch.com/vgo-mvs
| simion314 wrote:
| >But yeah, if the source is hosted externally and you
| don't have a local copy somewhere, then that's going to
| hurt. Which is (part of) why "should I vendor my
| dependencies" is such a perennial topic.
|
| It's not only this. What if I create an open source
| thing and share it on github/npm or whatever package
| website? The best practice is not to bundle my
| dependencies and just list them. Then 5 years later
| someone wants to install my package that depends on their
| package that depends on some leftpad isOdd package that
| now is gone. In other ecosystems it is accepted as
| good practice that besides the sources you offer an
| .exe, .dll, .jar, or .tar.gz, but in the node and python communities
| I see that developers now only distribute with npm,
| pip or similar.
|
| Part of the solution would be to put important core stuff
| in the standard library; then somehow we need to stop
| the CV-driven development that causes this fragmentation,
| with so many alternatives for the same thing that you don't get a
| clear answer on which one you should use for X.
| jancsika wrote:
| > The idea of black box abstraction is that you can implement
| some complex functionality, box it up, and expose it to the
| outside world so carefully that the world can totally ignore the
| implementation details and can care only about the inputs and
| outputs.
|
| Is there such a thing as "glass box" abstractions? :)
| kayodelycaon wrote:
| Ruby has truly ruined me for stuff like this. Most basic
| functionality and some non-trivial functionality is covered in
| the standard library. And if for some reason Ruby doesn't have
| enough, Rails' ActiveSupport probably has you covered.
|
| But Ruby is quite famously a batteries included language and its
| libraries follow in that philosophy. Solve the entire problem,
| not tiny pieces of it.
| codesections wrote:
| > Ruby has truly ruined me for stuff like this. Most basic
| functionality and some non-trivial functionality is covered in
| the standard library.
|
| Ruby is one language I haven't had the chance to explore yet.
| Are there any Ruby functions you particularly miss in other
| languages? Any that aren't built in to Raku might be ones I
| consider for the `_` utility library.
| kayodelycaon wrote:
| Code blocks. You don't use a for loop, you call Array#each.
| On top of that, Ruby's Enumerable module allows any object
| with an each method to access a whole bunch of convenience
| methods.
|
| Note, the # means instance method when discussing ruby code.
| It's not valid syntax.
|
| Ruby allows you to be extremely concise, while maintaining
| readability for anyone moderately familiar with the language.
| In a way Perl is most definitely not.
|
| Short example:
|     arr = [<blog posts objects>]
|     # Author may be nil
|     number_of_authors = arr.map(&:author).compact.uniq.count
|
| You can imagine how easy it is to throw data around with very
| little code. Loops are abstracted, so you never think about
| them as loops. Instead you just see data moving around.
|
| Edit: A few languages (like JavaScript) implement this
| behavior explicitly by passing anonymous functions as
| callbacks.
| codesections wrote:
| I just saw the code you added; here's a pretty literal
| translation into Raku in case you're curious (I'm assuming
| that `<blog posts objects>` is a stand in for omitted code)
|     my @arr = [blog_post_objects];
|     # Author may be Nil
|     my $author-count = +@arr.map(*<author>).grep(*.defined).unique;
|
| (Though note that this would only exclude undefined
| authors. In most situations, I'd probably either know that
| all defined authors are truthy (e.g., an object) or would
| want to exclude the falsy ones as well (e.g., empty
| string). In that case I'd use `.grep(?*)` to save a few
| characters.)
| setpatchaddress wrote:
| Not familiar with Raku, but that grep call seems like it
| would be quite a bit slower than the Ruby equivalent at
| runtime if it's anything like a normal grep. Is there
| some magic there that would make that not so?
|
| Edit: clarity
| chrisoverzero wrote:
| "grep" is how raku spells what is elsewhere called
| "filter" or "where".
| codesections wrote:
| Thanks, those all seem useful.
|
| Assuming I'm following you correctly, Raku already has
| equivalents to each of those built in: we have code
| blocks[0], List.map[1] (or `for list -> $el {
| [codeblock]}`[2] which is a `for` loop, but not a C-style
| one), and the Iterable/Iterator Roles[3].
|
| So I don't think those features give me any ideas for items
| to add to the library - but I agree that I'd be sad in a
| language without them!
|
| [0]: https://docs.raku.org/language/control#index-entry-control_f...
|
| [1]: https://docs.raku.org/routine/map
|
| [2]: https://docs.raku.org/syntax/for
|
| [3]: https://docs.raku.org/language/iterating
| ajuc wrote:
| Code blocks are just lambdas though? It's pretty much
| mainstream these days, even Java and C++ have them.
|     // java equivalent
|     int numberOfAuthors = arr.stream()
|         .map(BlogPost::getAuthor)
|         .distinct()
|         .collect(Collectors.toList())
|         .size();
|
| I happen to think for loops are usually a better choice in
| languages like Java, but the option is there.
|
| EDIT: actually there's a better way, no need to create a
| list just to count its elements:
|     // Collectors.counting() yields a Long, hence long rather than int
|     long numberOfAuthors = arr.stream()
|         .map(BlogPost::getAuthor)
|         .distinct()
|         .collect(Collectors.counting());
| setpatchaddress wrote:
| Note how much more concise and yet readable the
| equivalent Ruby is.
| ajuc wrote:
| If you want to play code golf you can do static imports
| and write this:
|     long numberOfAuthors = arr.stream().map(getAuthor).distinct().collect(counting());
|
| You still save a few parentheses and one method call in
| Ruby (.compact vs .stream() and .collect()) and the
| method names are shorter. Mostly it's a matter of static
| vs dynamic typing and naming conventions, not a
| consequence of Ruby code blocks. And is it worth
| shortening at this point?
|
| BTW I know uniq is a thing in unix, but I hate this
| naming decision.
| burlesona wrote:
| The entire set of string and enumerable methods.
| Extraordinarily useful.
|
| https://ruby-doc.org/core-3.0.3/Enumerable.html
|
| https://ruby-doc.org/core-3.0.3/String.html
| codesections wrote:
| Thanks.
|
| I took a (somewhat quick) look and I'm pretty sure that
| Raku has equivalents for all of the Enumerable methods
| except for partition. (We could do basically the same thing
| with our classify[0] method by converting the Hash to an
| Array. Or do it manually with a reduce. But there are times
| when a partition method would be handy.)
|
| [0]: https://docs.raku.org/type/List#routine_classify
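|
| For readers unfamiliar with partition, here is a minimal sketch of
| the "do it manually with a reduce" approach, written in Python
| rather than Raku. The helper name `partition` and the sample data
| are illustrative only:

```python
from functools import reduce


def partition(items, predicate):
    """Split items into (matching, non_matching), preserving input order."""
    def step(acc, item):
        # Index 0 collects matches, index 1 collects everything else.
        acc[0 if predicate(item) else 1].append(item)
        return acc

    matching, rest = reduce(step, items, ([], []))
    return matching, rest


evens, odds = partition(range(1, 7), lambda n: n % 2 == 0)
print(evens, odds)
```

| A classify-style grouping keyed on the predicate's true/false result
| gives the same two buckets, just wrapped in a hash.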
|
| The String methods might offer a few more options, but
| I'll need to think more carefully about that. It's
| idiomatic (and supported with syntax) to use Regexes for at
| least some of those tasks in Raku. Plus, Raku's strings
| aren't directly iterable/indexable (though it's trivial to
| convert them to a list of characters), and Raku doesn't
| have a direct equivalent to a symbol (something that I _do_
| miss). I suspect that, even with those factors, there might
| be some ideas worth stealing in there, so thanks for the
| pointer.
|
| (Oh, and I know that it's "just" syntax, but for some
| reason I *really* like the idea of having an sprintf
| operator. I hadn't thought of a language doing that, but I
| just might borrow that one!)
| Ducki wrote:
| Yup, similar in C#, where the .NET framework has tons of stuff
| already built-in.
| rectang wrote:
| People keep saying "pin versions" as a defense against supply
| chain attacks. That's all well and good until something widely
| used like log4j has a remote-code-execution exploit and then it
| all comes crashing down.
|
| Trusting any single author is a single point of failure --
| eventually the author of one of the packages you depend on will
| get compromised and an attacker will publish a malicious package.
| To combat this, you need package validation by multiple
| independent identities. The classic ways to do this are to have
| multiple people sign a package using PGP, or to rely on vendor
| endorsement -- but the theory behind it is just multi-factor
| authentication.
|
| A second useful step is to connect releases to an open source commit
| history. This makes it much more feasible for independent
| authorities to review the differences between release versions as
| a sequence of logical, coherent commits. The ideal is to have
| multiple committers on a project sign a release package, after
| having followed the commit history as it played out.
|
| If a package cannot be connected to an auditable history --
| because a source package is grossly transformed from what's in a
| repo, because there's no public repo, because the history is just
| one big commit or similarly useless, or because a binary package
| is not created using a reproducible build -- then it is harder to
| have confidence in it.
| codesections wrote:
| > Trusting any single author is a single point of failure --
| eventually the author of one of the packages you depend on will
| get compromised and an attacker will publish a malicious
| package.
|
| Thanks, this is exactly the sort of thing I had in mind when
| writing the "Making _ trustworthy"[0] section and exactly the
| sort of conversation I was hoping my post would prompt. One
| benefit I'm hoping to get from keeping the `_` sub-packages as
| simple/self-contained as possible is that that sort of supply-
| chain attack will be easier to spot (e.g., with a 0-dependency
| file, you couldn't use an attack like the event-stream
| incident, where a dependency was swapped out for a malicious
| copy - the malicious code would have to be in the repo itself).
|
| Of course "easier to spot" ≠ "won't happen", which is where
| your other point comes in:
|
| > To combat this, you need package validation by multiple
| independent identities. The classic ways to do this are to have
| multiple people sign a package using PGP
|
| Someone else made a similar point in an r/programminglanguages
| comment[1] in response to part 1:
|
| > One thing I'd like to see package managers adapt, though, is
| quorums for publishing. A simple majority quorum of amongst 3+
| people would naturally make hacking much more difficult
|
| Do you happen to know any details about how something like that
| could be put into practice? I agree that it seems like
| something that'd be worth investing in as an ecosystem, and I'd
| be interested in any info/thoughts others care to share.
|
| [0]: https://raku-advent.blog/2021/12/11/unix_philosophy_without_...
|
| [1]: https://www.reddit.com/r/ProgrammingLanguages/comments/raau0...
| rectang wrote:
| The machinery for quorums could be built on top of PGP.
| Multiple people can sign a package, and the trustworthiness
| of their endorsements can be evaluated based on a web of
| trust -- including by downstream users, so you don't actually
| have to rely on the robustness of the package manager's
| authentication at the moment of upload.
|
| Because PGP is not universally loved, I think it's important
| to reiterate that the fundamental theory behind quorums is
| just multi-factor auth. But PGP does solve some of the
| hardest parts.
|
| From there it's a matter of defining which authorities to
| trust, and then gating acceptance of a release once a quorum
| is reached (however that quorum is defined).
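|
| A toy sketch of that gating step. To stay self-contained it models
| each trusted signer with an HMAC key, which is only a stand-in:
| real endorsements would be public-key (e.g. PGP) signatures, and
| the signer names, keys, and threshold here are all hypothetical:

```python
import hashlib
import hmac

# Stand-in for a set of trusted PGP identities (hypothetical keys).
TRUSTED_KEYS = {"alice": b"key-a", "bob": b"key-b", "carol": b"key-c"}


def sign(signer, package):
    """Produce a signer's endorsement over the exact package bytes."""
    return hmac.new(TRUSTED_KEYS[signer], package, hashlib.sha256).hexdigest()


def quorum_reached(package, signatures, threshold=2):
    """Accept a release only if at least `threshold` distinct trusted
    signers have a valid endorsement over these exact bytes."""
    valid = {
        signer
        for signer, sig in signatures.items()
        if signer in TRUSTED_KEYS
        and hmac.compare_digest(sig, sign(signer, package))
    }
    return len(valid) >= threshold


release = b"pkg-1.0.0 contents"
endorsements = {"alice": sign("alice", release), "bob": sign("bob", release)}
print(quorum_reached(release, endorsements))                 # two valid signers
print(quorum_reached(release, {"alice": endorsements["alice"]}))  # only one
print(quorum_reached(release + b"!", endorsements))          # bytes were altered
```

| Because the check only needs the package bytes and the endorsements,
| a downstream user could run it too, rather than trusting the
| registry's check at upload time.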
|
| Finally, the idea needs buy-in and participation from package
| authors, which could be encouraged by privileging releases
| with multiple endorsers.
| codesections wrote:
| Thanks for sharing these ideas. Raku is actually in the
| process of migrating to a new package ecosystem, so this
| could be an ideal time to get something like this set up.
| I'm not sure how much work would be involved from a
| technical standpoint, but I've opened an issue[0] to ask
| the maintainer of our ecosystem package repository;
| hopefully we'll be able to implement a system somewhat
| along these lines.
|
| [0]: https://github.com/tony-o/raku-fez/issues/50
| goodpoint wrote:
| > Trusting any single author is a single point of failure
|
| This is what Linux distributions are for: all big distributions
| have a team of maintainers plus a dedicated security team.
| TacticalCoder wrote:
| > People keep saying "pin versions" as a defense against supply
| chain attacks. That's all well and good until something widely
| used like log4j has a remote-code-execution exploit and then it
| all comes crashing down.
|
| And it doesn't come crashing down for those who didn't pin
| log4j? They're somehow immune to the 0-day?
|
| Or do you mean that the next time they build from scratch
| they'll have their arse saved by a security update they didn't
| even bother tracking?
| rectang wrote:
| In a nutshell, what I favor is automatically accepting
| upstream security releases when those releases can be
| validated by multiple identities. Probabilistically, this
| shortens but does not eliminate the window when you are
| vulnerable.
|
| Unfortunately, as far as I know, typical primary package
| management systems trust the single author/uploader of a
| package and don't provide support for multi-authority
| validation, and so are vulnerable as soon as a single
| author's credentials get compromised. (If that's wrong, and
| npm, PyPI, crates.io, Maven, or anybody else supports multi-
| authority validation, I would love to hear about it.)
|
| I came to these conclusions having been deeply involved with
| release policy at the ASF. (I redrafted the official release
| policy documents in 2015.) The ASF, notably, requires at
| least one PGP signature for every release, but some projects
| have a tradition of multiple signatories -- including the
| Apache HTTPD project, which is where the tradition arose.
| JohnHaugeland wrote:
| The Unix Way is small, replaceable single purpose binary tools
| that are vendor blind.
|
| This seems to be the exact opposite.
| codesections wrote:
| This is the followup to Following the Unix philosophy without
| getting left-pad,
| https://raku-advent.blog/2021/12/06/unix_philosophy_without_...
| msie wrote:
| Raku, mentioned in the blog, was formerly Perl 6.
| young_unixer wrote:
| I don't think the Unix philosophy makes too much sense for things
| other than CLI commands, and even there, I'm not 100% convinced.
| anthk wrote:
| Unix desktops are composed of little tools. Under xfce you can
| switch out Thunar and Xfce4-terminal for something else. And
| long ago you could use your own WM in Gnome.
| makapuf wrote:
| Exactly, and integration (launchers, notifications, menu
| integration) can be done around a common ground (a la
| freedesktop.org)
| alpaca128 wrote:
| The general motivation behind it can apply to any kind of
| software, but it's definitely not fitting for many categories.
|
| In desktop applications I usually prefer tools that cover the
| majority of use-cases and provide an easy way to extend them
| (the last part is important too, otherwise you end up with the
| Windows 8 default apps). It doesn't make sense to split up a
| photo editor's core featureset, but that doesn't mean it's a
| good idea to just bury it in a pile of features out of the box
| that only 5% of users care about.
| masklinn wrote:
| Going by Salus' summary:
|
| - Write programs that do one thing and do it well.
|
| - Write programs to work together.
|
| - Write programs to handle text streams, because that is a
| universal interface.
|
| (3) definitely makes no sense outside of CLI commands, (2) is
| too ill-defined to be of any use (at least outside of programs,
| though even for programs it seems to be quite redundant with
| 3), which leaves (1)... which is basically a matter of taste:
| "one thing" is an extremely ill-defined concept, and you could
| easily argue that most of the useful SUS commands break it,
| especially (though not exclusively) if you look at it from
| GNU's coreutils.
| codesections wrote:
| > (3) ["write programs to handle text streams"] definitely
| makes no sense outside of CLI commands
|
| I'm not sure I'd say that. In the web development world,
| you'll definitely see people arguing to "JSON all the things"
| (text) but others arguing to "protobuf all the things"
| (binary). And they raise many of the same simplicity-vs-
| performance issues that came up for Unix CLI commands.
|
| As for (1) and (2) being too ill-defined/a matter of taste -
| well, I agree, but I don't think that means they're useless.
| I think of them as being in the same category as advice for
| writing prose ("Use short sentences where possible", "Avoid
| cliches"): helpful goals to keep in mind, even if I can't pin
| down exactly what they mean.
| Ar-Curunir wrote:
| For libraries, defaulting to an untyped interface makes no
| sense.
| duped wrote:
| If you're using any kind of software that exchanges data in a
| human readable serialization format you're following 3, in
| some form.
|
| Like for example, an HTTP server or client.
| masklinn wrote:
| That makes even less sense, the output is dictated by the
| purpose, it's not a choice.
| salawat wrote:
| Computers don't run for computer's sake. They run due to
| extant human utility. That utility is itself
| existentially traceable to a human making a choice.
|
| No computer has ever done something whose chain of
| causality does not return to a matter of human choice in
| orchestrating the circumstances for said outcome.
| habitue wrote:
| The good parts of the Unix philosophy have already been
| subsumed into common sense as a programmer. So what remains as
| "The Unix philosophy" are the controversial, more religious
| components.
|
| You can tell it's a little religious because, like Agile and
| REST, "everyone is doing it wrong". Where the thing everyone is
| doing wrong is a weird little corner of the thing with dubious
| utility.
___________________________________________________________________
(page generated 2021-12-11 23:00 UTC)