hngopher.com

       [HN Gopher] The xz attack shell script
       ___________________________________________________________________
        
       The xz attack shell script
        
       Author : todsacerdoti
       Score  : 531 points
       Date   : 2024-04-02 09:05 UTC (1 days ago)
        
 (HTM) web link (research.swtch.com)
 (TXT) w3m dump (research.swtch.com)
        
       | mzs wrote:
       | I haven't seen Thomas Roccia's infographic mentioned here yet:
       | https://twitter.com/fr0gger_/status/1774342248437813525
        
         | xomodo wrote:
         | Thx. The timeline shows the attack began by adding an ignore
         | entry to the .gitignore file. That is hard to detect nowadays.
        
         | moopitydoop wrote:
         | This just gives a partial high-level look at how the exploit
         | gets planted in liblzma, it doesn't cover how the exploit works
         | or its contents at all.
        
         | ihsoy wrote:
         | That seems to be largely grasping at straws and connecting dots
         | without reason.
         | 
         | For example oss-fuzz was building xz by cloning the github repo
         | directly. There was never a chance for oss-fuzz to discover the
         | backdoor, because only the tarball had it, not the repo itself.
         | So that oss-fuzz PR might genuinely be unrelated to the
         | backdoor.
        
           | sp332 wrote:
           | The ifunc part looked more legitimate, it was used to switch
           | between optimized implementations of CRC. So that part was in
           | the git repo. https://github.com/google/oss-fuzz/pull/10667
        
         | _trampeltier wrote:
         | Jia Tan asked distros to update quickly just before it became
         | public. How likely is it that another account or person
         | learned early, from people around Andres Freund, that the
         | backdoor would become public? How likely is it that there is
         | still another insider around?
        
           | OneLeggedCat wrote:
           | Right. Way too much coincidence. Jia Tan found out that it
           | was about to become public and threw a Hail Mary. How did he
           | find out?
        
             | freedomben wrote:
             | If the stakes weren't so high, this would be a damn fun
             | game of murder mystery.
        
             | rsc wrote:
             | I think the Red Hat Valgrind report on 2024-03-04 made the
             | Jia Tan team panic, since the one public rwmj stack trace
             | pointed the finger directly at the backdoor. All it would
             | take is someone looking closely at that failure to expose
             | the whole operation. They fixed it on 2024-03-09, but then
             | two weeks later distros still had not updated to the new
             | version, and every day is another day that someone might
             | hit the Valgrind failure and dig. I think that's why the
             | sockpuppets came back on 2024-03-25 begging Debian to
             | update. And then on the Debian thread there was pushback
             | because they weren't the maintainer (except probably they
             | were), so once Debian was updated, Jia Tan had to be the
             | account that asked Ubuntu to update.
        
               | peteradio wrote:
               | That seems like a breach: they went forward with the
               | update based on some random person's request. Oh you're
               | getting pushy? I guess we better listen to this guy.
        
               | rsc wrote:
               | The update was pulling from trusted upstream archives.
               | I'm sure Debian verified that.
        
           | ptx wrote:
           | There were also changes to systemd happening around that time
           | which would have prevented the backdoor from working. See the
           | timeline article by the same author linked in this one.
        
           | strunz wrote:
           | That's probably because of this (as mentioned in the
           | timeline):
           | 
           | >2024-02-29: On GitHub, @teknoraver sends pull request
           | https://github.com/systemd/systemd/pull/31550 to stop linking
           | liblzma into libsystemd. It appears that this would have
           | defeated the attack.
           | https://doublepulsar.com/inside-the-failed-attempt-to-backdo...
           | suggests that knowing this was on the way may have
           | accelerated the attacker's schedule. It is unclear whether
           | any earlier discussions exist that would have tipped them
           | off.
        
             | supriyo-biswas wrote:
             | rwmj did mention an unintentional embargo break, so I
             | wonder whether this GitHub issue is actually it.
        
               | teknoraver wrote:
               | Hi,
               | 
               | I'm the author of that PR. My purpose was to trim down
               | the size of the initramfs files by removing unneeded
               | dependencies.
               | 
               | I couldn't imagine that liblzma had a backdoor.
        
               | rsc wrote:
               | Hi! Was there any discussion on any mailing lists ahead
               | of time, or was your PR the first public mention of that
               | idea? Thanks.
        
               | teknoraver wrote:
               | Yes, there was an effort to turn all the dependencies
               | into on demand loads, where possible. It started in 2020
               | with libpcre2:
               | https://github.com/systemd/systemd/pull/16260
               | 
               | But many others followed, like libselinux:
               | https://github.com/systemd/systemd/pull/19997
               | 
               | libqrencode:
               | https://github.com/systemd/systemd/pull/16145
               | 
               | p11kit: https://github.com/systemd/systemd/pull/25771
               | 
               | tpm2-util: https://github.com/systemd/systemd/pull/28333
               | 
               | libiptc: https://github.com/systemd/systemd/pull/29836
               | 
               | libkmod: https://github.com/systemd/systemd/pull/31131
               | 
               | Exactly during the development of the libkmod PR, someone
               | noted that libxz could be lazily loaded too:
               | https://github.com/systemd/systemd/pull/31131#issuecomment-1...
               | 
               | And so I volunteered to do the job, nothing less,
               | nothing more.
               | 
               | If you look at the code of the other PRs, you see that
               | they are very very similar, there are also macros to ease
               | this task, like DLSYM_FUNCTION()
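
A rough Python analogue of the on-demand loading described above, for readers who have not seen the pattern. systemd does this in C with dlopen() and helper macros; the LazyModule class and maybe_compress function below are hypothetical names made up for illustration, not systemd code. The point is only that the dependency is resolved the first time it is used, so a process that never needs it never loads it.

```python
import importlib


class LazyModule:
    """Defer importing a module until one of its attributes is first used."""

    def __init__(self, name):
        self._name = name
        self._module = None  # nothing is loaded at construction time

    def __getattr__(self, attr):
        # Only called for names not found on the instance itself,
        # i.e. the wrapped module's functions and constants.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# Hypothetical usage: lzma is resolved on first use, not at startup.
lzma = LazyModule("lzma")


def maybe_compress(data, enabled):
    if not enabled:
        return data  # the dependency is never touched on this path
    return lzma.compress(data)
```

A code path that never calls into the wrapped module never triggers the import, which is the property the dlopen() conversions are after.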
        
               | teknoraver wrote:
               | Hi @rsc, I just saw your update on the timeline.
               | 
               | To be more precise, the first public comment asking to
               | dlopenify lzma was dated 30 Jan by Daan:
               | https://github.com/systemd/systemd/pull/31131#issuecomment-1...
               | 
               | The day after, it was reiterated by Lennart:
               | https://github.com/systemd/systemd/pull/31131#issuecomment-1...
               | 
               | But if you look in the systemd repo there is a TODO file
               | with a section of libraries which need to be lazy
               | loaded. liblzma was added to this list in June 2020
               | (https://github.com/systemd/systemd/commit/cdfd853744ee934869...)
               | by Lennart, and removed by me just after my PR was
               | merged.
        
               | rsc wrote:
               | Updated. Thanks!
        
       | benlivengood wrote:
       | The only upside of finding this attack (aside from preventing it
       | from being rolled out more widely) is that it gives a public
       | example of a complex APT supply-chain attack. Rest assured there
       | are more out there, and the patterns used will probably be
       | repeated elsewhere and easier to spot.
       | 
       | Obfuscated autoconf changes, multi-stage deployment, binary blobs
       | (hello lib/firmware and friends, various other encoders/decoders,
       | boot logos, etc), repo ownership changes, new-ish prolific
       | committers in understaffed dependency libraries, magic values
       | introduced without explanation.
        
         | cassianoleal wrote:
         | > a complex APT supply-chain attack
         | 
         | What do you mean by APT? If you mean Debian's package manager,
         | that's not what this attack was. This was done upstream and
         | affected non-apt distros just as much.
         | 
         | It's true that upstream is part of apt's supply chain but
         | focussing on apt is misleading.
         | 
         | edit: why the downvotes? I get from the responses that I was
         | wrong but given how the exploit was initially found in a Debian
         | system and a lot of people very quickly jumped on the "Debian
         | patched a thing and broke security" bandwagon, I don't think it
         | was much of a leap to wonder if that's what was meant.
         | 
         | Acronyms and initialisms are not the best way to convey
         | specific information.
        
           | mttpgn wrote:
           | Not apt the package manager-- it's an acronym for Advanced
           | Persistent Threat
        
           | supriyo-biswas wrote:
           | > What do you mean by APT
           | 
           | Advanced, persistent threat.
        
           | haimez wrote:
           | It stands for "Advanced Persistent Threat" -
           | https://en.m.wikipedia.org/wiki/Advanced_persistent_threat
        
           | danieldk wrote:
           | I think Advanced Persistent Threat.
        
           | nequo wrote:
           | Others have answered your question about APT but FWIW I don't
           | understand the downvotes. You were respectful and simply
           | sought to clear up a misunderstanding.
        
           | macintux wrote:
           | I was heavily downvoted for supplying a link to a previous
           | discussion on the same topic recently. I wouldn't worry much
           | about it.
        
         | MuffinFlavored wrote:
         | > The changes to build-to-host.m4 weren't in the source repo,
         | so there was no commit.
         | 
         | > The attacker had permissions to create GitHub releases, so
         | they simply added it to the GitHub release tarball.
         | 
         | What are some simple tweaks the "Debians" of the world can do
         | to mitigate this kind of stuff?
         | 
         | Not trust "hand-curated-by-possibly-malicious-maintainers"
         | GitHub release tarballs, only trust git commits?
         | 
         | Whitelist functions that are allowed to do IFUNC/ELF hooking to
         | core OpenSSH functions?
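
One way to make the "tarball versus git" question concrete is to diff a release tarball against a checkout of the tagged commit. The sketch below is a minimal illustration, not an existing distro tool: the function name tarball_vs_checkout and its strip_components parameter are made up here. Autotools release tarballs legitimately carry generated files (configure and friends) that are not in git, so the output is a list for a human to triage, not a verdict.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def tarball_vs_checkout(tarball, checkout, strip_components=1):
    """Compare regular files in a tarball with a source checkout.

    Returns (only_in_tarball, differing): two sorted lists of relative
    paths. strip_components drops the leading "pkg-1.0/" style directory.
    """
    root = Path(checkout)
    only_in_tarball, differing = [], []
    with tarfile.open(tarball) as tar:
        for member in tar:
            if not member.isfile():
                continue
            parts = Path(member.name).parts[strip_components:]
            if not parts:
                continue
            rel = Path(*parts)
            data = tar.extractfile(member).read()
            local = root / rel
            if not local.is_file():
                only_in_tarball.append(rel.as_posix())
            elif sha256_hex(local.read_bytes()) != sha256_hex(data):
                differing.append(rel.as_posix())
    return sorted(only_in_tarball), sorted(differing)
```

Run against a tarball whose m4/build-to-host.m4 is absent from (or different from) the repository, the file would show up in one of the two lists; whether anyone would have looked closely at it is a separate question.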
        
           | ilc wrote:
           | Read IOCCC entries.
           | 
           | Now realize, those are people having FUN. What is your chance
           | of catching nation state level maliciousness in a codebase?
           | Pretty low.
        
           | heeen2 wrote:
           | remove test blobs before building
        
             | JonChesterfield wrote:
             | What's a test binary? Bunch of bytes on disk. What's a
             | source file? Bunch of bytes on disk.
        
               | varjag wrote:
               | Good people and bad people are all made of atoms so
               | there's no difference.
        
               | evilos wrote:
               | I think the point is that you can't tell a good person
               | from a bad one by inspecting the atoms.
        
         | MarkSweep wrote:
         | Add to that list suppression of warnings from valgrind and
         | address sanitizer without any justification. And no tracking
         | issue to follow up on fixing the problem so the suppression can
         | be removed.
         | 
         | Committing binary files to source control rather than including
         | build commands to generate the files is a bit of a red flag.
        
           | saagarjha wrote:
           | They're test cases for a compression library. Seems pretty
           | reasonable.
        
             | hyperhopper wrote:
             | Not at all. This would not pass a good code review. The
             | test was for good stream, bad stream, good stream. The two
             | good streams were only a few bytes, why was the bad stream
             | so large?
             | 
             | A good reviewer should have, even for a binary test case,
             | asked the submitter to simplify it to the smallest or most
             | basic binary required for the functionality.
        
               | saagarjha wrote:
               | idk picking a real-world example that breaks the code
               | seems like a good test case to me
        
               | benlivengood wrote:
               | It is a good example for an initial bug report but, once
               | the code has been fixed to handle that failure case,
               | minimal examples are the correct lasting tests to live
               | with the code forever.
               | 
               | Additionally, a complex example may require multiple
               | conditions to fail and if those aren't split into
               | multiple tests then subtle bugs can be reintroduced later
               | because that complex test doesn't cover all potential
               | failure conditions. If there need to be test cases for
               | multiple related bugs then they need to be minimal to
               | demonstrate the failure condition combinations they are
               | testing for.
        
               | Dunedan wrote:
               | Adding a script to regenerate good and bad binary files
               | might have worked for this use case pretty well too.
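
A minimal sketch of what such a regeneration script could look like, using Python's standard lzma module. The payload, the file names and the choice of corruption (flipping the last byte of the stream footer) are all assumptions made for illustration; this is not how the xz test suite builds its fixtures.

```python
import lzma
from pathlib import Path

PAYLOAD = b"hello, xz test suite\n" * 4  # illustrative content only


def make_good(payload=PAYLOAD):
    """A well-formed .xz stream built from known plaintext."""
    return lzma.compress(payload, format=lzma.FORMAT_XZ)


def make_bad(good):
    """The same stream with the last byte of the footer magic flipped."""
    bad = bytearray(good)
    bad[-1] ^= 0xFF
    return bytes(bad)


def is_valid_xz(blob):
    try:
        lzma.decompress(blob, format=lzma.FORMAT_XZ)
        return True
    except lzma.LZMAError:
        return False


def regenerate(outdir):
    """Write the fixtures and return {file name: bytes} for inspection."""
    good = make_good()
    files = {"good-1.xz": good, "bad-1.xz": make_bad(good)}
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    for name, blob in files.items():
        (out / name).write_bytes(blob)
    return files
```

The benefit is reviewability: a reader can see exactly which bytes are supposed to be wrong and why. The exact compressed bytes may differ between liblzma versions, so a script like this documents intent more than it guarantees byte-identical output.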
        
         | fsflover wrote:
         | > hello lib/firmware
         | 
         | How could it play any role here? It doesn't bring any
         | dependencies, does it?
        
       | sanxiyn wrote:
       | > The first difference is that the script makes sure (very sure!)
       | to exit if not being run on Linux.
       | 
       | The repeated check is indeed mysterious. My only hypothesis is
       | that the attacker may have thought that it should look plausible
       | as a test input to a compression library, hence repetition.
        
         | TheBlight wrote:
         | Is it though? The attacker probably has Linux x86 target(s) in
         | mind and IFUNC support isn't guaranteed to work with other
         | platforms.
        
           | sanxiyn wrote:
           | Checking for Linux makes sense. Doing the exact same check
           | for Linux five times in a row is mysterious.
        
             | tamimio wrote:
             | It would be an interesting plot twist if the whole thing
             | was an AI hallucination.
        
               | neffy wrote:
               | ...or a stalking horse by somebody in Microsoft's
               | marketing division.
        
               | ddalex wrote:
               | How about an AI trying to make itself some spare CPU
               | cycles available ?
        
             | dboreham wrote:
             | Perhaps expected a very fast machine that might blast
             | straight through the first few checks. Like how you needed
             | two STOP statements in a Cray Fortran program in case it
             | blew through the first one at 80MIPS.
        
           | ycombinatrix wrote:
           | why doesn't IFUNC work on Linux ARM64?
        
             | bodyfour wrote:
             | IFUNC is supported on several architectures, including
             | ARM64.
             | 
             | The malicious code that the xz backdoor inserts into the
             | library is a compiled x86_64 object file, so it targets
             | only one platform.
        
         | mercurialuser wrote:
         | It can be to make space for script changes: you may overwrite
         | the first bytes of the script.
         | 
         | Or just add some laziness.
        
         | fsniper wrote:
         | Can it be to enlarge or obfuscate parts of the compressed test
         | file? Perhaps without the repetitions the compressed file has
         | some strange binary triggering some security or antivirus
         | software?
        
         | m3kw9 wrote:
         | Maybe if run on non Linux it could be found out either by
         | crashing or leaving some trace because of OS differences
        
         | Hakkin wrote:
         | I also thought it was odd. There's also different random bytes
         | (not random text, actual random bytes) prefixed to the start of
         | the scripts, but the bytes are prefixed by a hash symbol, which
         | comments them out, so they don't affect the script. It seems
         | intentional, but I can't think of why they would be there. I
         | thought maybe xz would skip compression if the input was
         | short/not complex enough or something, so they were added to
         | pad the size, but removing them and re-compressing with xz
         | seems to properly compress it, none of the original plaintext
         | is in the compressed archive bytes.
         | 
         | One thing I noticed while trying to reproduce the exact bytes
         | included in the .xz file committed to git is that the script's
         | xz stream doesn't seem to be compressed by any of the default
         | xz presets, I was only able to reproduce it by using `xz
         | --lzma2=dict=65536 -c stream_2`. All the default numbered
         | presets chose a different dictionary size. Another odd
         | seemingly intentional choice, but again I don't understand the
         | reasoning.
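
For anyone who wants to poke at the dictionary-size observation without the xz CLI, here is a sketch using Python's lzma module. It compresses with an explicit 64 KiB LZMA2 dictionary and then reads the dictionary size back out of the first block header. The helper names are made up, the parser assumes a stream with a single LZMA2 filter, and nothing here claims to reproduce the exact bytes of the file in the repository.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"


def compress_dict64k(data):
    """Compress to .xz with an explicit 64 KiB LZMA2 dictionary."""
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": 65536}]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


def lzma2_dict_size(xz_bytes):
    """Read the LZMA2 dictionary size from the first block header.

    Assumes a single-filter (LZMA2 only) stream, which is what the
    functions in this sketch produce.
    """
    if xz_bytes[:6] != XZ_MAGIC:
        raise ValueError("not an .xz stream")
    pos = 12                      # the stream header is 12 bytes long
    flags = xz_bytes[pos + 1]     # xz_bytes[pos] is the header size byte
    if flags & 0x03:
        raise ValueError("more than one filter in the chain")
    pos += 2
    for bit in (0x40, 0x80):      # optional size fields, stored as varints
        if flags & bit:
            while xz_bytes[pos] & 0x80:
                pos += 1
            pos += 1
    filter_id, props_size, bits = xz_bytes[pos:pos + 3]
    if filter_id != 0x21 or props_size != 1 or bits > 40:
        raise ValueError("unexpected filter flags")
    if bits == 40:
        return 2**32 - 1
    return (2 | (bits & 1)) << (bits // 2 + 11)
```

Calling lzma2_dict_size on the output of compress_dict64k gives 65536, while a stream compressed with the default preset 6 reports 8 MiB, which is the kind of mismatch that makes a hand-picked dictionary size stand out.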
        
           | Hakkin wrote:
           | Ah, I think I understand the random bytes at the start of the
           | script now. They're prepended to make the partial output of
           | the "corrupted" (pre-tr'd) test stream look like random data.
           | Without those random bytes, you will see part of the start of
           | the script if you observe the partially decompressed output
           | before xz throws an error. They really thought quite
           | carefully about hiding this well.
           | 
           | Still not sure about the repeated lines, though now I'm
           | convinced there must be some reason for it.
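
To illustrate what "pre-tr'd" means here: a byte-for-byte swap applied to a valid .xz stream leaves something that fails to decode until the same swap is applied again. The sketch below assumes a swap of tab with space and dash with underscore; treat that mapping, and the helper names, as assumptions for illustration and not as a reproduction of the attacker's script.

```python
import lzma

# Assumed mapping for illustration: swap tab<->space and '-'<->'_'.
SWAP = bytes.maketrans(b"\t -_", b" \t_-")


def swap(blob):
    """Apply the byte swap; applying it twice restores the input."""
    return blob.translate(SWAP)


def decodes(blob):
    """True if blob is a complete, valid .xz stream."""
    try:
        lzma.decompress(blob, format=lzma.FORMAT_XZ)
        return True
    except lzma.LZMAError:
        return False
```

If stored = swap(lzma.compress(script)), then decodes(stored) will generally be False once any byte was actually swapped, while lzma.decompress(swap(stored)) returns the original script. That is why the file can sit in a test directory looking like a deliberately corrupt stream.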
        
       | xyst wrote:
       | Whoever hired these people to infiltrate this project spent a
       | shit ton of hours building it in such a way to avoid detection
       | this long. Fortunately, it was so complicated that they couldn't
       | account for all of the factors.
       | 
       | This is why open source will always outperform closed source in
       | terms of security. Sure, it pointed out a massive flaw in the
       | supply chain. Sure, it highlights how underappreciated the
       | foundational elements of FOSS are, leaving maintainers subject
       | to manipulation.
       | 
       | But the same attack within a private company? Shit, probably
       | wouldn't even need advanced obfuscation. With a large enough PR
       | and looming deadlines, could easily sneak something like this
       | with a minimal amount of effort into production systems. By the
       | time the company even realizes what happened, you are already
       | flying off to a non-extradition country and selling the
       | exfiltrated data on Tor (or dark web).
        
         | skrtskrt wrote:
         | off topic, but how many actual non-extradition countries are
         | there these days (for the US)?
         | 
         | Even countries we have strained relationships with will
         | extradite as part of a negotiation when it's convenient for
         | them politically.
         | 
         | Russia probably wouldn't have even kept Snowden if it wasn't
         | state secrets he revealed. If it was just some random data
         | breach they would have prisoner-swapped him for an oligarch
         | caught money laundering elsewhere.
        
           | moopitydoop wrote:
           | Our adversaries aren't going to extradite their citizens to
           | the west. And obviously if it's a state-level actor they
           | aren't going to extradite their own actors.
           | 
           | As someone old enough to remember the tail end of the early
           | hacker eras (e.g. Mitnick), I don't think anyone SHOULD be
           | extradited over this, in particular if they're not being
           | charged with actually using the exploit. Prosecute them where
           | they live. Should they be prosecuted 193 times over in every
           | state on Earth? What's the nexus? Github? Every server that
           | installed the compromised xz utils?
           | 
           | But you are right they will _deport_ (not extradite)
           | foreigners who are inconvenient to them or when it is
           | politically expedient to do so, if the foreigners are a
           | nuisance, or as part of a political negotiation or prisoner
           | exchange.
           | 
           | The whole "extradition treaties" meme is a misconception. You
           | will only get extradited if you flee to a country where you
           | are a citizen (even dual citizen), or have the ability to
           | assert citizenship/nationality there. A fugitive fleeing to a
           | country without an extradition treaty is subject to
           | deportation. Every country on earth reserves the right to
           | deny entry to or deport foreign fugitives. They might choose
           | not to if someone is found to be a refugee, subject to the
           | death penalty in a non-death-penalty state, etc.
        
           | EasyMark wrote:
           | Russia and China would never extradite for this, they would
           | hire the people involved if they aren't already in their
           | employ, and I wouldn't blame them. I'm not even sure if they
           | could be charged with more than misdemeanor charges anyway.
        
           | cesarb wrote:
           | > off topic, but how many actual non-extradition countries
           | are there these days (for the US)?
           | 
           | Several countries do not extradite their own citizens. For
           | citizens of these countries, going back to their own home
           | country would be enough.
        
         | bsza wrote:
         | I'm gonna cry survivorship bias here. How do we know how many
         | similar attempts succeeded? How many of the ones discovered
         | have been written off as honest mistakes? How can we know for
         | sure that e.g. Heartbleed wasn't put there by someone on
         | purpose (and that that someone isn't filthy rich now)?
         | 
         | When you get hired to a private company, they know who you are.
         | That's an immediate deterrent against trying anything funny. On
         | Github, no one knows who you are. It might be harder to
         | backdoor a project without getting noticed, but there is no
         | risk to getting noticed. You can try as many times as you like.
         | Jia Tan is still at large, and didn't even have to plan their
         | whole life around living in a non-extraditing country (if they
         | aren't in one already).
        
           | xyst wrote:
           | https://en.wikipedia.org/wiki/Industrial_espionage (aka
           | corporate espionage)
           | 
           | Happens all the time. Maybe it's a state actor. Maybe it's a
           | disgruntled employee. It's just not in the same lens as you
           | expect (software supply chain attack).
           | 
           | Apple has trouble keeping the lid on top secret projects.
           | Leaks about designs happen all the time prior to scheduled
           | debut at WWDC.
           | 
           | MS has had trouble in the past as well when it came to
           | developing the Xbox (One?).
           | 
           | "Owners of China-Based Company Charged With Conspiracy to
           | Send Trade Secrets Belonging to Leading U.S.-Based Electric
           | Vehicle Company" -
           | https://www.justice.gov/usao-edny/pr/owners-china-based-comp...
           | 
           | "Ex-Google engineer charged with stealing AI trade secrets
           | while working with Chinese companies" -
           | https://www.latimes.com/world-nation/story/2024-03-07/ex-goo...
           | 
           | "The US Hits Huawei With New Charges of Trade Secret Theft" -
           | https://www.wired.com/story/us-hits-huawei-new-charges-trade...
           | 
           | "U.S. charges China-controlled company in trade secrets
           | theft" -
           | https://www.pbs.org/newshour/economy/u-s-charges-china-contr...
           | 
           | "Ex-Google and Uber engineer Anthony Levandowski charged with
           | trade secret theft" -
           | https://www.theverge.com/2019/8/27/20835368/google-uber-engi...
           | 
           | In the case of Levandowski, the dude didn't even bother with
           | covering his tracks. Straight up just downloads trade secrets
           | from source control and transfers them to personal computer -
           | https://www.justice.gov/usao-ndca/press-release/file/1197991...
           | 
           | In this small sample of cases, were the exfiltration attempts
           | as elaborate as the "xz attack"? Probably not, but all of
           | these people were vetted by internal procedures and that did
           | nothing to stop them from acting maliciously.
           | 
           | Forget backdooring the project when getting through the
           | front door is so much easier! People are very relaxed in
           | their walled off garden and cubicle :)
        
             | resolutebat wrote:
             | Stealing trade secrets (read: copying source code) is a
             | whole different ballgame from trying to inject backdoors
             | into ubiquitous pieces of software.
        
             | bsza wrote:
             | This kind of demonstrates my point. Every single one of
             | these headlines indicates the bad actor has been "charged"
             | (with serious consequences in the case of Huawei).
             | 
             | Has Jia Tan been "charged" with anything?
        
               | nindalf wrote:
               | I'm gonna cry survivorship bias here. How do we know how
               | many similar attempts succeeded?
        
               | bsza wrote:
               | Right, we don't know that, but we have yet to hear of one
               | case where a supply chain attack on a FOSS project
               | resulted in someone getting arrested.
        
         | yongjik wrote:
         | I don't buy it. Most private companies would require a face-to-
         | face meeting when you start. Even if you're fully remote, the
         | expectation would be that at some point you would meet your
         | coworkers in meatspace, most likely before you get to commit
         | anything substantial. The ones that are worth penetrating will
         | almost certainly require a background check.
         | 
         | And then, once you're in, you cannot just commit to your target
         | project willy-nilly, as your manager and your manager's manager
         | will have other priorities. A for-profit company's frequently
         | dysfunctional management would actually work as deterrent here:
         | you don't just need to justify your code, you will have to
         | justify why you were working on it in the first place.
        
           | ThePowerOfFuet wrote:
           | https://rigor-mortis.nmrc.org/@simplenomad/11218486968142017...
        
           | xyst wrote:
           | face to face, background checks. they are all superficial.
           | 
           | A smooth talker can get you to relax your guard.
           | 
           | Identity can be faked, especially if you have a nation state
           | backing you.
        
             | breadwinner wrote:
             | Today no one knows what Jia Tan looks like. That means Jia
             | Tan can come back as Gary Smith and work on the next exploit.
             | If we knew what he looked like, we could have at least
             | prevented him from coming back.
        
               | not2b wrote:
               | Jia Tan might be more than one person.
        
               | breadwinner wrote:
               | Requiring him/her to show their face might have prevented
               | that.
        
               | saagarjha wrote:
               | Why? Just have one person show up and everyone else
               | funnel the work through their account.
        
               | kelseydh wrote:
               | On open source projects you can vet people based on
               | whether you can personally identify them as real people.
               | Moving forward be suspicious of anonymity for core
               | contributions.
        
               | not2b wrote:
               | Debian does things that way, their developers have to get
               | their key signed by other developers, and a DD who signs
               | someone's key is supposed to check ID. But there's no
               | similar protection for their upstream.
        
               | TeMPOraL wrote:
               | That's still one named person the police can go after, a
               | good starting point for investigation into the criminal
               | network.
               | 
               | Also, that's still one named person that no one would
               | like to end up being, so that alone acts as a deterrent.
        
             | bigiain wrote:
             | How much would you bet against the NSA having a team full
             | of leetcode and interview experts, whose job is to apply at
             | tech companies and perform excellently through the hiring
             | process, so that the offensive team at NSA can infiltrate
             | and work remotely without ever needing to meet their new
             | "coworkers"?
             | 
             | I suspect a "professional job seeker" with the resources of
             | the NSA behind them and who lands 1st and subsequent
             | interviews dozens of times a year - would be _way_ better
             | at landing interviews and jumping through stupid recruiting
             | hoops than even the best senior or "10x engineers", who
             | probably only interview a dozen or two times in their
             | entire career.
        
         | EasyMark wrote:
         | Could be interesting as a class project to investigate other
         | small relatively innocuous but near ubiquitous projects that
         | could be treated the same way and investigate whether something
         | similar could be done or has been done already. Just making a
         | list of them would be useful if nothing else.
        
       | bandrami wrote:
       | As a longtime autotools-hater I would say this justifies my
       | position, but any build system complex enough to be multiplatform
       | is going to be inscrutable enough to let somebody slip something
       | in like this. But it really is a problem that so much software is
       | built with what's essentially become a giant cargo cult-style
       | shell script whose pieces no one person understands all of.
        
         | echoangle wrote:
         | I think the common usage of bash and other languages with a
         | dense and complicated syntax is the root problem here. If build
         | scripts were written in python, you would have a hard time
         | obfuscating backdoors, because everyone would see that it is
         | weird code. Bash code is just assumed to be normal when you
         | can't read it.
        
           | kimixa wrote:
           | I think the issue is that build systems are often built on a
           | completely different language - python with some weird build
           | framework is likely as inscrutable as bash and autotools to
           | someone who doesn't use python.
           | 
           | You _can_ write pretty straightforward readable bash, just as
           | I'm sure you can write pretty gnarly python. Especially if
           | you're intentionally trying to obfuscate.
        
             | echoangle wrote:
             | The problem is that obfuscated bash is considered normal.
             | If unreadable bash was not allowed to be committed, it
             | would be much harder to hide stuff like this. But
             | unreadable bash code is not suspicious, because it is kind
             | of expected. That's the main problem in my opinion.
        
               | kimixa wrote:
               | Lots of autogenerated code appears "obfuscated" -
               | certainly less clear than if a programmer would have
               | written it directly.
               | 
               | But all this relies on one specific thing about the
               | autotools ecosystem - that shipping the generated code is
               | considered normal.
               | 
               | I know of no other build system that does this? It feels
               | weird, like shipping cmake-generated makefiles instead of
               | just generating them yourself, or something like scons or
               | meson being packaged _with_ the tarball instead of
               | requiring an external installation.
               | 
               | That's a _lot_ of extra code to review, before you even
               | get to any kind of language differences.
        
               | eru wrote:
               | > Lots of autogenerated code appears "obfuscated" -
               | certainly less clear than if a programmer would have
               | written it directly.
               | 
               | That's why you don't commit auto-generated code. You
               | commit the generating code, and review that.
               | 
               | Same reason we don't stick compiled binaries in our
               | repositories. Binary executables are just auto-generated
               | machine code.
        
               | owlbite wrote:
               | I think the main issue is autotools tries to support so
               | many different shells/versions all with their own
               | foibles, so the resulting cross-compatible code looks
               | obfuscated to a modern user.
               | 
               | Something built on python won't cover quite as wide a
               | range of (obsolete?) hardware.
        
               | eru wrote:
               | Python actually covers quite a lot of hardware. Of
               | course, it does that via an autotools nightmare generated
               | configure script.
               | 
               | Of course, you could do the detection logic with some
               | autotools-like shenanigans, but then crunch the data (ie
               | run the logic) on a different computer that can run
               | reasonable software.
               | 
               | The detection should all be very small self-contained
               | short pieces of script, that might be gnarly, but only
               | produce something like a boolean or other small amount of
               | data each and don't interact (and that would be enforced
               | by some means, like containers or whatever).
               | 
               | The logic to tie everything together can be more
               | complicated and can have interactions, but should be
               | written in a sane language in a sane style.
        
               | freedomben wrote:
               | This probably varies widely, because unreadable bash is
               | absolutely _not_ considered normal, nor would pass code
               | review in any of my projects.
               | 
               | On a slightly different note, unless the application is
               | written in python, it grosses me out to think of writing
               | scripts in python. IMHO, if the script is more complex
               | than what bash is good at (my general rule of thumb is do
               | you need a data structure like an array or hash? then
               | don't use bash), then use the same language that the
               | application is written in. It really grosses me out to
               | think of a rails application with scripts written in
               | python. Same with most languages/platforms.
        
               | echoangle wrote:
               | What if your application is written in Rust or C? Would
               | you write your build scripts in these languages, too? I
               | would much prefer a simpler scripting language for this.
               | If you're already using a scripting language as the main
               | language, you don't necessarily need to pull in another
               | language just for scripts, of course.
        
               | ycombinatrix wrote:
               | build.rs is a thing FYI
        
               | dgsb wrote:
               | Or make.go. For some projects it makes sense not to add
               | another language for scripting and build tasks. It's way
               | easier for everyone not to have to master multiple languages.
        
               | eru wrote:
               | Writing a build script in Rust is fine-ish.
               | 
               | Writing anything in C is a bad idea these days, and
               | requires active justification that only applies in some
               | situations. Essentially, almost no new projects should be
               | done in C.
               | 
               | Re-doing your build system, or writing a build system for
               | a new project, counts as something new, so should
               | probably not be done in C.
               | 
               | In general, I don't think your build (or build system)
               | should necessarily be specified in the same language as
               | most of the rest of your system.
               | 
               | However I can see that if most of your system is written
               | in language X, then you are pretty much guaranteed to
               | have people who are good at X amongst your developers, so
               | there's some natural incentive to use X for the tooling,
               | too.
               | 
               | In any case, I would mostly just advise against coding
               | anything complicated in shell scripts, and to stay away
               | from Make and autotools, too.
               | 
               | There are lots of modern build systems like Shake, Ninja,
               | Bazel, etc that you can pick from. They all have
               | their pros and cons, just like the different distributed
               | version control systems have their pros and cons; but
               | they are better than autotools and bash and Make, just
               | like almost any distributed version control is better
               | than CVS and SVN etc.
        
               | northzen wrote:
               | Also zig is a good example of not a scripting language
               | which does this job.
        
               | freedomben wrote:
               | C is probably the best example where I _would_ be fine
               | with scripts in Python (for utility scripts, not build
               | scripts). Though, if it were me I'd use Ruby instead as
               | I like that language a lot better, and it has Rake (a
               | ruby-ish version of Make), but that's a "flavor of ice
               | cream" kind of choice.
        
               | salawat wrote:
               | ...The main problem is some asshat trying to install a
               | backdoor.
               | 
               | I use bash habitually, and every time I have an
               | inscrutable or non-intuitive command, I pair it with a
               | comment explaining what it does. No exceptions.
               | 
               | I also don't clean up after scripts for debuggability. I
               | will offer an invocation to do the cleanup though after
               | you've ascertained everything worked. Blaming this on
               | bash is like a smith blaming a hammer failing on a
               | carpenter's shoddy haft... Not terribly convincing.
               | 
               | There were a lot of intentionally obfuscatory measures at
               | play here, and tons of weaponization of most conscientious
               | developers' adherence to the principle of least
               | astonishment: violations of homoglyphy (using easy to
               | mistake filenames and mixed conventions), degenerative
               | tool invocations (using sed as cat), excessive
               | use/nesting of tools (awk script for the RC4 decryptor),
               | the tr, and, to crown it all, _malicious use of test
               | data!!!_
               | 
               |  _As a tester, nothing makes me angrier!_
               | 
               | A pox upon them, and may their treachery be returned upon
               | them 7-fold!
        
               | eru wrote:
               | > Blaming this on bash is like a smith blaming a hammer
               | failing on a carpenter's shoddy haft... Not terribly
               | convincing.
               | 
               | If your hammer is a repurposed shoe, it's fair to blame the
               | tools.
        
               | Terr_ wrote:
               | > pair it with a comment
               | 
               | A good practice, but not really a defense against malice,
               | because if the expression is inscrutable enough to really
               | need a comment, then it's also inscrutable enough that
               | many people won't notice that the comment is a lie.
        
               | salawat wrote:
               | Nothing short of reading the damn code is a defense
               | against malice. I have yet to have any luck in getting
               | people to actually do that.
        
             | giantrobot wrote:
             | We'll write build scripts in highly obfuscated Perl. There,
             | now no one is happy.
        
             | sanderjd wrote:
             | Man, this _sounds_ right, but I dunno ... I feel like even
             | "simple" shell scripts tend toward more inscrutability than
             | all but the most questionable python code I've ever
             | written. Just comparing my dotfiles - every bit of which I
             | understood when I wrote them - to the gnarliest sections of
             | a big python app I work on ... I just really feel like at
             | least _some_ of this inscrutability issue can truly be laid
             | at the feet of shell / bash as a language.
        
               | clnhlzmn wrote:
               | This is totally true and the bash apologists are
               | delusional.
        
               | lobocinza wrote:
               | It depends on the use case. Bash code can be elegant,
               | Python code can be ugly. I'm not saying those are the
               | average cases but complex code regardless of the language
               | often is ugly even with effort to make it more readable.
        
               | TeMPOraL wrote:
               | So are some Python advocates, too. The thing that's worse
               | than a bash script made of Perlish line noise, is a piece
               | of "clean code" dead simple Python that's 80%
               | __language__.boilerplate, 20% logic, smeared over 10x the
               | lines because small functions calling small functions are
               | cool. No one has enough working memory to keep track of
               | what's going on there. Instead, your eyes glaze over it
               | and you convince yourself you understand what's going on.
               | 
               | Also, Python build scripts can be living hell too, full
               | of dancing devils that could be introducing backdoors
               | left and right - just look at your average Conan recipe,
               | particularly for larger/more sensitive libraries, like
               | OpenSSL or libcurl.
        
               | _a_a_a_ wrote:
               | That's a lot of emotive language, can you actually link
               | to some actual examples.
        
               | medstrom wrote:
               | I can understand the frustration when those small
               | functions are not well-named, so you have to inspect what
               | they do.
        
               | TeMPOraL wrote:
               | You already have to inspect everything if you want to
               | review/audit a build script. Small functions - and I
               | specifically mean functions being written small because
               | of misguided ideas of "clean code", as opposed to e.g.
               | useful abstraction or reusability - become especially
               | painful there, as you have that much more code to read,
               | and things that go together logically (or execution-wise)
               | are now smeared around the file.
               | 
               | And you can't really name such small functions well
               | anyway, not when they're broken down for the sake of
               | being small. Case in point, some build script I saw this
               | week had function like `rename_foo_dll_unit_tests`
               | calling `rename_foo_dll_in_folder` calling
               | `rename_foo_dll` calling `rename_dlls`, a distinct call
               | chain of four non-reused functions that should've been at
               | most two functions.
               | 
               | Are all Python build scripts like that? Not really. It's
               | just a style I've seen repeatedly. The same is the case
               | with inscrutable Bash scripts. I think it speaks more
               | about common practices than the language itself
               | (notwithstanding Bash not really being meant for writing
               | longer programs).
        
               | medstrom wrote:
               | Sounds like DRY run amok indeed. Maybe a compiler or
               | linter could detect these cases and complain "this
               | function is only called from one place" :)
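
A minimal sketch of that linter idea, assuming Python build scripts and looking only at plain `name(...)` calls. The function `single_use_functions` and the `SAMPLE` script are hypothetical, made up for illustration. It ignores methods, aliases and calls from other files, so treat it as a rough heuristic and not a real lint rule.

```python
import ast
from collections import Counter


def single_use_functions(source: str) -> list:
    """Return names of functions defined in `source` that are called exactly once.

    Only direct calls by bare name are counted; attribute calls such as
    obj.method() and indirect references are ignored.
    """
    tree = ast.parse(source)
    defined = [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    calls = Counter(
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    )
    return sorted(name for name in defined if calls[name] == 1)


# Hypothetical build-script fragment, loosely modelled on the call chain
# described above.
SAMPLE = '''
def rename_dlls(folder):
    return folder

def rename_foo_dll(folder):
    return rename_dlls(folder)

def helper(x):
    return x

a = helper(1)
b = helper(2)
'''

print(single_use_functions(SAMPLE))
```

Whether a single-use function is actually a problem is still a judgment call; a check like this can only point a reviewer at candidates.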
        
               | TeMPOraL wrote:
               | I mentioned Conan recipes, didn't I? :). Those are my
               | most recent sources of frustration.
        
               | _a_a_a_ wrote:
               | I've never heard of a conan language, and a couple of
               | URLs to some bad recipes would not go amiss.
        
               | TeMPOraL wrote:
               | Conan is a package manager for C/C++, written in Python.
               | See: https://conan.io/.
               | 
               | The way it works is that you can provide "recipes", which
               | are Python scripts, that automate the process of
               | collecting source code (usually from a remote Git
               | repository, or a remote source tarball), patching it,
               | making its dependencies and transitive dependencies
               | available, building for specific platform and
               | architecture (via any number of build systems), then
               | packaging up and serving binaries. There's a _lot_ of
               | complexity involved.
               | 
               | Here are the two recipes I mentioned:
               | 
               | libcurl: https://github.com/conan-io/conan-center-
               | index/blob/master/r...
               | 
               | OpenSSL v3: https://github.com/conan-io/conan-center-
               | index/blob/master/r...
               | 
               | Now, for the sake of this thread I want to highlight
               | three things here:
               | 
               | - Conan recipes are usually made by people unaffiliated
               | with the libraries they're packaging;
               | 
               | - The recipes are fully Turing-complete, do a lot of
               | work, have their own bugs - therefore they should really
               | be treated as software components themselves, for the
               | purpose of OSS clearing/supply chain verification, except
               | as far as I know, nobody does it;
               | 
               | - The recipes can, and do, patch source code and build
               | scripts. There's supporting infrastructure for this built
               | into Conan, and of course one can also do it by brute-
               | force search and replace. See e.g. ZLib recipe that _does
               | it both at the same time_ :
               | 
               | https://github.com/conan-io/conan-center-
               | index/blob/7b0ac710... -- `_patch_sources` does _both_
               | direct search-and-replace in source files, _and_ applies
               | the patches from https://github.com/conan-io/conan-
               | center-index/tree/master/r....
               | 
               | Good luck keeping track of what exact code goes into your
               | program, when using Turing-complete "recipe" programs
               | fetched from the Internet, which fetch your libraries
               | from _somewhere else_ on the Internet.
        
               | _a_a_a_ wrote:
               | That was a really, really good answer, thanks.
        
               | sanderjd wrote:
               | FWIW, my comment wasn't meant to single out python as
               | particularly good. I think the comparison I drew between
               | its inscrutability and that of shell / bash would apply
               | to nearly all other languages as well.
        
               | egorfine wrote:
               | I'm a bash apologist and I totally stand by your words.
               | It's delusional. Bash totally has to go.
        
               | dboreham wrote:
               | Shell is evil mainly because it's so old. It's largely
               | unchanged since 1978 and had to run on incredibly
               | resource limited machines. What it could achieve on those
               | machines was amazing. Tradeoffs would be different today,
               | e.g. PowerShell.
        
               | SlightlyLeftPad wrote:
               | A big pet peeve of mine is that shell is written off as
               | evil but the only reason I ever hear is basically a
               | variation of "I don't understand it, therefore it scares
               | me." The reality is, unlike _really_ old languages like
               | COBOL or RPG, bash is still literally everywhere, it's
               | installed on practically every linux machine by default
               | which makes deploying a shell script completely trivial.
               | It's certainly under appreciated and because it's
               | ubiquitous, widely used in build processes, there's a
               | responsibility to learn it. It's not hard, it's not a
               | wildly complex language.
               | 
               | I don't think these issues would necessarily be solved at
               | all by waving a hand and replacing it with similarly
               | complex build tools. Bazel, for example, can be a
               | daunting tool to fully grasp. Any tool used should be
               | well understood. Easier said than done of course.
        
               | Espressosaurus wrote:
               | If a language has more footguns than guns-pointed-at-not-
               | my-feet, I consider that a problem with the language.
               | 
               | And shell has a lot of footguns.
        
               | SlightlyLeftPad wrote:
               | I'd argue that every language is loaded full of foot
               | guns. If you're encountering those foot guns on a regular
               | basis, it's an issue with the author.
               | 
               | That said, what can help drastically here are well-
               | defined best practices and conventions built into the
               | language which, admittedly, bash really doesn't have.
        
               | _a_a_a_ wrote:
               | Can you point to 3 python foot guns, for instance?
        
               | s1dev wrote:
               | I'm generally in agreement with the anti-bash camp, but I
               | can name about that many :)
               | 
               | - Mutating default arguments to functions, so subsequent
               | calls have different behavior
               | 
               | - Somewhat particular rules around creating references vs
               | copies
               | 
               | - Things that look like lambda captures but aren't quite
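
For anyone who hasn't hit these three, a small sketch of each in plain Python. The names (`append_item`, `callbacks` and so on) are made up for illustration.

```python
def append_item(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # and is shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket


first = append_item(1)
second = append_item(2)  # same list object as `first`, now holding both items

# Assignment binds a second name to the same object; it does not copy.
original = [1, 2]
alias = original
alias.append(3)  # `original` changes too

# Closures look up `i` when they are called, not when they are defined,
# so every lambda sees the final value of the loop variable.
callbacks = [lambda: i for i in range(3)]
results = [cb() for cb in callbacks]

print(first is second, second, original, results)
```

The usual workarounds are `bucket=None` with a check inside the function, an explicit `list(original)` or `copy.copy`, and binding the loop value with `lambda i=i: i`.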
        
               | _a_a_a_ wrote:
               | Touche!
        
               | sanderjd wrote:
               | Yep, every language has footguns and other kinds of
               | quirks, but I contend that the "footguns per character"
               | ratio in shell is unusually high. (It is not unique in
               | having a high ratio though; other popular languages, like
               | c++ for instance, also have this issue.)
        
               | eitland wrote:
               | The worst (level of nastiness * usage) offenders all
               | probably have a reason for being popular despite their
               | flaws:
               | 
               | - Bash: installed everywhere you want to work (yes, who
               | actually _wants_ to work on Windows ;-)
               | 
               | - C/C++: when speed/size matters there was no alternative
               | except Assembly until recently
               | 
               | - Javascript: until recently this was the most sane
               | option for client side code on the web (Active X and Java
               | applets existed yes but managed to be even worse.)
               | 
               | - PHP: Low cost hosting, Function-As-A-Service way before
               | that became popular, shared nothing architecture,
               | _instant_ reload for local development bliss
        
               | sanderjd wrote:
               | Couldn't agree more.
        
               | bonzini wrote:
               | - string vs. list "in" ('a' in 'a' is True, but 'a' in
               | ['a'] is also True)
               | 
               | - cannot know which object attributes are private or
               | public (and some classes use settable properties so you
               | can't say "just don't set any attributes on non-dataclass
               | objects")
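
One reading of the first point, sketched below: `in` is a substring test on a `str` but an element test on a list, so code that accepts either can silently change meaning. `has_tag` is a hypothetical helper, not something from the thread.

```python
def has_tag(tags, wanted):
    # Hypothetical helper: behaves differently depending on whether the
    # caller passes a single string or a list of strings.
    return wanted in tags


as_string = has_tag("production", "prod")   # substring match on a str
as_list = has_tag(["production"], "prod")   # element equality on a list

print(as_string, as_list)
```

Passing a bare string where a list of strings was expected is the usual way to trip over this.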
        
               | eitland wrote:
               | Agree.
               | 
               | As long as you apply the same standards to what seems to
               | be everyone's darling: Javascript.
               | 
               | Javascript has the same amount of footguns as PHP and
               | Bash but has gotten away with it by being cute (and
               | having a whole menagerie of support tools around it to
               | make it possible for ordinary people to write workable
               | code in it).
               | 
               | (Yes, I am qualified to rant about Javascript BTW. I
               | wrote a working map rendering system with pan and zoom
               | and automatic panning based on GPS location using
               | ECMAScript and SVG back in the spring of 2005. I think
               | roughly half a year before Google Maps became public.
               | Back before _all_ the modern JS tooling existed. All I
               | had was JEdit with syntax highlighting. Perl at least let
               | me put breakpoints in my code even back then.
               | 
               | And yes, I have written more JS since then.)
        
               | Espressosaurus wrote:
               | I specifically avoided web development because of
               | Javascript and its plethora and ever-changing set of
               | frameworks. So IMO, it absolutely counts.
        
               | eitland wrote:
               | FWIW it has actually become better the last few years:
               | 
               | Now you _can_ at least just stick to React and TypeScript
               | and bundle it using Webpack and have months of relative
               | sanity between each time you have to throw something out
               | and replace it.
        
               | sanderjd wrote:
               | I think a good way to evaluate languages from this
               | perspective is through the lens of how easy it is to
               | maintain understanding over time. I have learned bash
               | well three or four times now, but I know now that I'm
               | never going to remember enough of its quirks through the
               | interim periods where I'm not focused on it, to be able
               | to grok arbitrary scripts without refreshing my memory.
               | This is very different for languages like java, go,
               | python, and some others, which have their quirks, sure,
               | but a much lower quirks per character ratio.
               | 
               | I might agree with "it's not hard to learn it", but I
               | don't agree with "it's not hard to remember it".
        
               | bilekas wrote:
               | > Shell is evil mainly because it's so old.
               | 
               | I really don't understand this point. It's a scripting
               | language; how old it is doesn't make any difference. I've
               | come across some PowerShell scripts that were unreadable
               | due to their verbosity with certain things, and if you
               | don't already know all the flags and options for it, it's
               | hopeless to try and understand.
               | 
               | Both serve a purpose, neither are 'evil'.
        
               | mkesper wrote:
               | I don't think PowerShell is a big improvement, though.
               | Still allows no signature checking of functions. Shells
               | are optimized for fast entry, not for writing
               | comprehensible (or even secure) programs.
        
               | dheera wrote:
               | There is Xonsh which is a Python shell. I don't know why
               | everyone hasn't switched to it already as default.
        
               | bayindirh wrote:
               | > I feel like even "simple" shell scripts tend toward
               | more inscrutability than all but the most questionable
               | python code I've ever written.
               | 
               | When you use long options in bash (scripts), it becomes
               | very readable, but it's not a widespread practice. I
               | _always_ use long options while writing scripts.
               | 
               | Consider these two examples, which are very
               | straightforward:
               | 
               | - curl -fsSL $URL | bash
               | 
               | - curl --fail --silent --show-error --location $URL |
               | bash
               | 
               | The second one almost "talks you through".
        
               | sanderjd wrote:
               | IMO, that's not where the inscrutability comes from.
               | Rather, it is things like many useful (and thus used)
               | constructs being implemented with line noise like
               | `"$(foo)"`, `2>&1 &`, `${foo:-bar}`, etc., finicky
               | control flow syntax, surprising defaults that can be
               | changed within unclear scopes, etc.
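
For comparison, a rough sketch of how those three constructs might be spelled with Python's standard library. The mapping is approximate, not exact shell semantics. `sys.executable` stands in for an arbitrary command, and `DEMO_UNSET_VAR` is a hypothetical variable name that is cleared first so the default applies.

```python
import os
import subprocess
import sys

# "$(foo)": run a command and capture its standard output as a string.
captured = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True,
    text=True,
    check=True,
).stdout.strip()

# ${foo:-bar}: fall back to a default when the variable is unset or empty.
os.environ.pop("DEMO_UNSET_VAR", None)  # make sure it is unset for the demo
value = os.environ.get("DEMO_UNSET_VAR") or "bar"

# 2>&1 &: merge stderr into stdout and start the process without waiting.
proc = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; print('out'); print('err', file=sys.stderr)"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)
merged, _ = proc.communicate()  # later: wait for it and collect the output

print(captured, value, sorted(merged.split()))
```

It is a lot more typing than the shell versions, which is more or less the trade-off being argued about here.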
        
               | bayindirh wrote:
               | You're right, but you can also expand these. If not
               | syntactically, by adding temporary variables and comments
               | around these complex parts.
               | 
               | It's true that Bash and Perl have some of the most
               | contractible syntax around, but it's not impossible to
               | make it more understandable.
               | 
               | However, these parts of codebases are considered
               | "supportive" and treated as second class citizens, and
               | never receive the same love core parts of the codebases
               | enjoy. That's a big mistake IMO.
               | 
               | When you make something more readable all around, hiding
               | things becomes harder exponentially.
        
             | deanishe wrote:
             | Python makes it relatively hard to write inscrutable code,
             | and more importantly, it's very non-idiomatic, and there
             | would be pushback.
             | 
             | WTFing at shell scripts is normal.
        
               | egorfine wrote:
               | > WTFing at shell scripts is normal
               | 
               | "WTFing". This is brilliant.
        
               | Terr_ wrote:
               | I always think of this now-old comic:
               | https://www.osnews.com/story/19266/wtfsm/
        
             | Etheryte wrote:
             | This feels a bit like saying you _can_ run as fast as Usain
             | Bolt. Theoretically many things are possible, but I don't
             | think I've ever seen a readable bash script beyond a
             | trivial oneliner and I've seen a lot of bash in my life. Or
             | to maybe explain from a different perspective, ask a room
             | full of developers to write a bash if-else without looking
             | at the docs and you'll probably come back with more
             | different options than developers in the room. Ask the same
             | for a language such as Python and you'll mostly get one
             | thing.
        
           | sunshowers wrote:
           | It's rather easy to monkeypatch Python into doing spooky
           | things. For something like this you really want a language
           | that can't be monkeypatched, like I think Starlark.
        
             | echoangle wrote:
             | Where are you going to hide your monkey patching though? As
             | long as your code is public, stuff like this is always
             | going to stand out, because no one writes weird magic one
             | liners in python.
        
               | sunshowers wrote:
               | I'm not a security-oriented professional, but to me a
               | place I could hide this logic is by secretly evaling the
               | contents of some file (like the "corrupt archive" used in
               | xz) somewhere in the build process, hiding it behind a
               | decorator or similar.
        
               | echoangle wrote:
               | I'm not a security professional either, but that doesn't
               | sound very plausible to me. If you assume a maintainer
               | who checks every commit added to the codebase, he's
               | hopefully blocking you the second he sees an eval call in
               | your build script. And even a code audit should find
               | weird stuff like that, if the code is pythonic and simple
               | to read. And if it's not, it should not be trusted and
               | should be treated as malicious.
        
               | sunshowers wrote:
               | Well, the threat model here is that a maintainer
               | themselves is the saboteur.
        
               | cjbprime wrote:
               | That was true for this project, which was almost orphaned
               | to begin with. We'll run out of nearly-unmaintained
               | critical infrastructure projects sometime. Larger
               | projects with healthier maintenance situations are also
               | at risk, and it's worth reasoning about how a group of
               | honest developers could discover the actions of one
               | malicious developer (with perhaps a malicious reviewer
               | involved too).
        
               | lijok wrote:
               | Would stick out like a sore thumb
        
               | cyanydeez wrote:
               | This code was essentially monkey patched from a test
               | script. Python automatically runs any code in an imported
               | module, so not hard to see a chain of module imports that
               | progressively modifies and deploys a similar structure.
        
               | zmmmmm wrote:
               | The way python's run time literally executes code when
               | you import a module makes it seem pretty easy to taint
               | things from afar. You only need to control a single
               | import anywhere in the dependency hierarchy and you can
               | reach over and override any code somewhere else.
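               | 
               | A minimal sketch of that import-time tainting, assuming a
               | made-up dependency called helpful_utils whose only job is
               | to patch json.dumps as a side effect of being imported.
               | The module name and the visible "tampered" marker are
               | illustrative assumptions, not anything from the xz case.

```python
import importlib
import json
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical "innocent-looking" dependency. Its top-level code
# runs at import time and replaces a function in an unrelated module.
TAINTED_SOURCE = '''\
import json

_real_dumps = json.dumps

def _patched_dumps(obj, *args, **kwargs):
    # A real attack would do something quieter than tagging the output.
    return _real_dumps(obj, *args, **kwargs) + " /* tampered */"

json.dumps = _patched_dumps
'''


def demo():
    """Return json.dumps output before and after importing the tainted module."""
    original = json.dumps
    before = json.dumps({"a": 1})
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "helpful_utils.py").write_text(TAINTED_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # The override happens here, purely as an import side effect.
            importlib.import_module("helpful_utils")
            after = json.dumps({"a": 1})
        finally:
            # Undo everything so the demo leaves the interpreter clean.
            json.dumps = original
            sys.path.remove(tmp)
            sys.modules.pop("helpful_utils", None)
    return before, after
```

               | The caller never touches helpful_utils after the import,
               | yet every later json.dumps call in the process goes
               | through the patched function.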
        
               | sunshowers wrote:
               | Oh yeah, that's a fantastic point.
        
               | eru wrote:
               | There are lints that will warn you, if your imported
               | module does anything apart from define functions and
               | classes.
               | 
               | (Though not sure how fool-proof these are.)
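               | 
               | A rough sketch of what such a check could look like,
               | using the standard ast module. The allow-list of node
               | types is an assumption made for this example and does not
               | mirror any particular linter.

```python
import ast

# Node types treated as "just definitions" at module top level. This
# allow-list is an assumption for the sketch; real linters are more nuanced.
ALLOWED = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
           ast.Import, ast.ImportFrom)


def _is_docstring(node):
    """True for a bare string expression such as a module docstring."""
    return (isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str))


def top_level_side_effects(source):
    """Return line numbers of top-level statements that are not definitions."""
    tree = ast.parse(source)
    return [node.lineno for node in tree.body
            if not isinstance(node, ALLOWED) and not _is_docstring(node)]
```

               | It is easy to see why this is not fool-proof: decorators,
               | class bodies and default arguments also run at import
               | time, and an allowed import can itself pull in a module
               | that does the dirty work.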
        
               | DrFalkyn wrote:
               | eval() is a big security hole
        
           | matheusmoreira wrote:
           | Any language can become a turing tarpit if you try hard
           | enough.
        
             | eru wrote:
             | Some languages make you try harder, some less so.
             | 
             | And not all languages are Turing complete in the first
             | place. Not even all useful languages.
        
           | heavyset_go wrote:
           | The things that Bash is good at can wind up obfuscated in
           | Python code due to the verbosity and complexity that it
           | translates to in Python.
           | 
           | Bash is great at dealing with files, text, running other
           | programs, job handling, parallelism and IPC.
           | 
           | Those things in combination can end up being more complex in
           | Python, which creates more opportunities for obfuscation.
        
           | vmfunction wrote:
           | Cliffy is also pretty good: https://cliffy.io
           | 
           | And type safe.
        
           | DarkNova6 wrote:
           | Yet it is Python code which has the highest number of
           | security vulnerabilities found in public repos. It is also
           | where pre-compiled code is most often committed.
        
           | unhammer wrote:
           | > In the beginning of Unix, m4 was created. This has made
           | many people very angry and has been widely regarded as a bad
           | move.
           | 
           | autoconf creates a shell script by preprocessing with m4. So
           | you need to know not just the intricacies of shell scripting,
           | but also of m4, with its arcane rules for escaping:
           | https://mbreen.com/m4.html#quotes
           | 
           | If autoconf used m4 to generate python scripts, they would
           | also look like https://pyobfusc.com/#winners
        
           | AtlasBarfed wrote:
           | Just wait for AI to start pumping spaghetti code everywhere.
           | 
           | That's AI phase one
           | 
           | AI phase 2 is even better-disguised exploit code hiding
           | behind acres of seemingly plausible AI-generated code.
        
         | JonChesterfield wrote:
         | Right there with you. It's really tempting to blame this entire
         | thing existing on m4 but that's the trauma talking.
        
         | rgmerk wrote:
         | Yeah.
         | 
         | I haven't used it for some time but autoconf always seemed like
         | a horrible hack that was impossible to debug if it didn't work
         | properly.
         | 
         | That was bad enough back in the days where one was mostly
         | concerned with accidents, but in more modern times things that
         | are impossible to debug are such tempting targets for mischief.
        
         | shp0ngle wrote:
         | Every time someone explains to me autotools, the individual
         | pieces sort of make sense, yet the result is always this
         | inscrutable unreadable mess.
         | 
         | I don't know why.
        
         | cyanydeez wrote:
         | Just the fact that you have multiple platforms suggests few
         | people will fully understand the entire complex.
        
         | jongjong wrote:
         | The devil is in the detail and nothing obscures details like
         | complexity.
         | 
         | Same reason why I don't like TypeScript in its current form.
         | It's not worth the extra complexity it brings.
        
         | dpkirchner wrote:
         | Agreed, and I think this leads to the question: how much risk
         | do we face because we want to support such a wide range of
         | platforms that a complex system is required?
         | 
         | And how did we get to the point that a complex system is
         | required to build a compression library -- something that
         | doesn't really have to do much more than math and memory
         | allocation?
        
           | cesarb wrote:
           | > And how did we get to the point that a complex system is
           | required to build a compression library -- something that
           | doesn't really have to do much more than math and memory
           | allocation?
           | 
           | The project in question _contained_ a compression library,
           | but was not limited to it; it also contained a set of command
           | line tools (the "xz" command and several others).
           | 
           | And a modern compression library needs more than just "math
           | and memory allocation"; it also needs _threads_ (to make use
           | of all the available cores), which is historically not
           | portable. You need to detect whether threads are available,
           | and which threading library should be used (pthreads is not
           | always the available option). And not only that, a modern
           | compression library often needs hand-optimized assembly code,
           | with several variants depending on the exact CPU type, the
           | correct one possibly being known only at runtime (and it was
           | exactly in the code to select the correct variant for the
           | current CPU that this backdoor was hidden).
           | 
           | And that's before considering that this is a _library_.
           | Building a dynamic library is something which has a lot of
           | variation between operating systems. You have Windows with
           | its DLLs, MacOS with its frameworks, modern Linux with its
           | ELF stuff, and historically it was even worse (like old
           | a.out-based Linux with its manually pre-allocated base
           | address for every dynamic library in the whole system).
           | 
           | So yeah, if you restrict yourself to modern Linux and perhaps
           | a couple of the BSDs, and require the correct CPU type to be
           | selected at compilation time, you could get away with just a
           | couple of pages of simple Makefile declarations. But once you
           | start porting to a more diverse set of systems, you'll see it
           | get more and more complicated. Add cross-compilation to the
           | mix (a non-trivial amount of autotools complexity is there to
           | make cross-compilation work well) and it gets even more
           | complicated.
        
         | jonhohle wrote:
         | The issue was that built artifacts weren't immutable during the
         | test phase and/or the test phase wasn't sandboxed from the
         | built artifacts.
         | 
         | The last build system I worked on separated build and test as
         | separate stages. That meant you got a lot of useless artifacts
         | pushed to a development namespace on the distribution server,
         | but it also meant later stages only needed read access to that
         | server.
        
           | edflsafoiewq wrote:
           | The malicious .o is extracted from data in binary test files,
           | but the backdoor is inserted entirely in the build phase.
           | Running any test phase is not necessary.
        
             | rafaelmn wrote:
             | So if you ran build without test files it would fail. I get
             | that this is hindsight thinking - but maybe removing all
             | non essential files when packaging/building libraries
             | reduces the surface area.
        
         | salawat wrote:
         | Let me reword that for you:
         | 
         | >No one has any business saying they know what something does
         | until they've actually read it.
         | 
         | Beneath the placid surface of abstraction is the den of the
         | devil.
        
         | kjellsbells wrote:
         | I'm glad to see I'm not a minority of one here. Looking at
         | autoconf and friends I'm reminded of the dreck that used to
         | fill OpenSSL's code simply because they were trying to account
         | for every eventuality. Autotools feels like the same thing. You
         | end up with a ton of hard to read code (autogenerated bash, not
         | exactly poetry) and that feels very inimical to safety.
        
         | livrem wrote:
         | I see what everyone is saying about autotools, and I never
         | envied those that maintained the config scripts, but as an end-
         | user I miss the days when installing almost any software was as
         | simple as ./configure && make && make install.
        
           | humanrebar wrote:
           | Running that command is simple, sure. The development
           | workflow when that doesn't work is, full stop, horrible.
           | 
           | Also the basic workflows for the alternative build systems
           | have maybe ten more characters to type. It's not bad.
        
         | frankohn wrote:
         | I do not agree with your generalisation. The Meson build system
         | is well thought out; it has a clear declarative syntax that lets
         | you express what you want in a direct way.
         | 
         | The designer of Meson explicitly avoided making the language
         | Turing complete, so for example you cannot define functions. In
         | my experience this was an excellent decision: it limits people's
         | tendency to write complex stuff and puts the pressure on the
         | Meson developers to implement all the useful functionality
         | themselves.
         | 
         | In my experience Meson configurations are as simple as they
         | can be and accommodate only a modicum of complexity to describe
         | OS-specific options or advanced compiler options one may need.
         | 
         | Please note that some projects' Meson files have been made
         | complex because of the goal to match whatever the configure
         | script was doing. I had in mind the crazy habit of autotools
         | of checking whether the system has every function that might
         | possibly be used, because some system may not have it.
        
           | humanrebar wrote:
           | Meson is way better than autotools, but if you're too low level
           | to depend on python, you're probably needing to customize
           | your build scripts in the ways you mention. I don't see meson
           | being a silver bullet there.
           | 
           | Also, meson's build dependencies (muon, python) are a lot for
           | some of these projects.
        
             | pknopf wrote:
             | If a project's build system can't depend on python, then
             | let's leave them in the dust, ffs..
        
           | Brian_K_White wrote:
           | This just results in 500x worse build.sh on top of
           | meson/ninja/whatever.
        
         | humanrebar wrote:
         | Autotools need to go away, but most of the underlying use cases
         | that cause build system complexity come down to dependency
         | management and detection. The utter lack of best practices for
         | those workflows is the root cause of complexity. There is no
         | way to have a mybuild.toml build config in the face of those
         | challenges.
        
       | cletus wrote:
       | My big takeaway is that modern build systems are fundamentally
       | broken. Autotools, even Makefiles, are (or can be) incredibly
       | obtuse, even unreadable. If I read this attack correctly, it
       | relied on a payload in a test file (to obfuscate it). It
       | shouldn't be possible to include test resources in a production
       | build.
       | 
       | As an aside, C/C++'s header system with conditional inclusion is
       | also fundamentally broken. Even templates are just text
       | substitution with a thin veneer of typing.
       | 
       | I think about Google's build system, which is very much designed
       | to avoid this kind of thing. The internal build tool is Blaze
       | (Bazel is the open-source cousin). Many years ago, you could
       | essentially write scripts in your BUILD files (called genrules)
       | that were hugely problematic. There was no way to guarantee the
       | output so they had to be constantly rebuilt. There was a long
       | project to eliminate this kind of edge case.
       | 
       | Blaze (and Bazel) are built around declaring hermetic units to be
       | built with explicit dependencies only. Nothing shipped to
       | production is built locally. It's all built by the build servers
       | (a system called Forge). These outputs are packaged into Midas
       | packages ("MPMs"). You could absolutely reconstruct the source
       | used to build a particular library, binary or package as well as
       | the build toolchain and version used. And any build is completely
       | deterministic and verifiable.
       | 
       | C/C++, Make, CMake, autotools, autoconf and all that tooling so
       | common in Linux and its core dependencies absolutely needs to go.
        
         | __MatrixMan__ wrote:
         | If I were starting something from scratch I'd do Bazel for
         | internal deps, Nix for external ones. If that's a tall order,
         | so be it: we train people. Investments in determinism usually
         | end up paying off.
        
           | threePointFive wrote:
           | I'm not familiar with Bazel, but Nix in its current form
           | wouldn't have solved this attack. First of all, the standard
           | mkDerivation function calls the same configure; make; make
           | install process that made this attack possible. Nixpkgs
           | regularly pulls in external resources (fetchUrl and friends)
           | that are equally vulnerable to a poisoned release tarball.
           | Check out the comment on the current xz entry in nixpkgs:
           | https://github.com/NixOS/nixpkgs/blob/master/pkgs/tools/comp...
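           | 
           | For what it's worth, a poisoned tarball is at least
           | detectable in principle by diffing it against the tagged
           | git tree. A minimal sketch, assuming both have already been
           | unpacked into local directories; the function names and the
           | report keys are made up for this example.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def compare_trees(git_checkout, unpacked_tarball):
    """Report files that exist only in the tarball or differ from the checkout."""
    git = tree_digest(git_checkout)
    tar = tree_digest(unpacked_tarball)
    return {
        "only_in_tarball": sorted(tar.keys() - git.keys()),
        "changed": sorted(f for f in tar.keys() & git.keys() if tar[f] != git[f]),
    }
```

           | The catch is that autotools release tarballs legitimately
           | ship generated files (configure and friends) that are not
           | in git, so the "only_in_tarball" list is noisy and someone
           | still has to read it.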
        
           | wocram wrote:
           | This sounds nice on paper, but there's a lot of missing glue
           | to be written between nix and bazel.
           | 
           | Ideally nix would move towards less imperative/genrule style
           | package declarations and ultimately become more usable for
           | internal builds.
        
         | sanderjd wrote:
         | > _modern build systems_
         | > _Autotools, even Makefiles_
         | > _C/C++'s header system with conditional inclusion_
         | 
         | Wouldn't it be more accurate to say something like "older build
         | systems"? I don't think any of the things you listed are
         | "modern". Which isn't a criticism of their legacy! They have
         | been very useful for a long time, and that's to be applauded.
         | But they have _huge_ problems, which is a big part of why newer
         | systems have been created.
         | 
         | FWIW, I have been using pants[0] (v2) for a little under a
         | year. We chose it after also evaluating it and bazel (but not
         | nix, for better or worse). I think it's really really great!
         | Also painful in some ways (as is inevitably the case with any
         | software). And of course it's nearly impossible to _entirely_
         | stomp out "genrules" use cases. But it's much easier to get
         | much closer to true hermeticity, and I'm a big fan of that.
         | 
         | 0: https://www.pantsbuild.org/
        
           | justinpombrio wrote:
           | To be clear about the history: Make is from 1976, and
           | Autotools is from 1991. Had me a chuckle about these being
           | called modern, they're (some of?) the earliest build systems.
           | 
           | https://en.wikipedia.org/wiki/Make_(software)
           | 
           | https://en.wikipedia.org/wiki/Autoconf
        
         | ants_everywhere wrote:
         | > The internal build tool is Blaze (Bazel is the open-source
         | cousin).
         | 
         | I was under the impression that Blaze is Google's Bazel
         | deployment, i.e. that they're the same code. Is that not
         | correct?
        
           | xen0 wrote:
           | Mostly correct; there's some code that's Blaze only and some
           | that's Bazel only, but the core is the same.
        
         | dhx wrote:
         | Ugly and broken build systems are most of the reason why
         | Gentoo's 'sandbox' feature and equivalent for other
         | distributions exists.[1] These sandboxing features have mostly
         | been used in the past to prevent an ugly shell script in the
         | build process for libuselesscruft from doing something similar
         | to "rm -rf" on the build system. More recently these sandboxing
         | features are helpful in encouraging reproducible builds by
         | alerting maintainers to build processes which try and obtain
         | non-deterministic information from the operating system
         | environment such as the operating system name and version, host
         | name and current time stamp.
         | 
         | There are a few gaps I think xz-utils highlights:
         | 
         | - Repositories containing a mixture of source code, build
         | scripts, test frameworks, static resources and documentation
         | generation scripts are all considered to be a single security
         | domain with no isolation between them. If you look to Gentoo's
         | src_prepare function as an example, we perhaps should instead
         | split this into build_src_prepare, doc_src_prepare,
         | test_src_prepare and install_src_prepare instead. If source
         | code is being built and binaries generated, the sandboxed build
         | directory should perhaps not contain test files and
         | documentation generation scripts. If the package is being
         | installed with "make install" (or equivalent) then static
         | resources (such as a timezone database) should be available to
         | copy to /usr/, but build scripts used to generate the binaries
         | or documentation do not need to be available to "make install"
         | (or equivalent).
         | 
         | - Sandboxing used for package building hasn't traditionally
         | been implemented for security reasons in the past. Sandboxing
         | should perhaps be hardened further with modern and more complex
         | approaches such as seccomp to further protect build systems
         | from the likes of libbackdoored that are targeted towards
         | package maintainers/Linux distribution build systems. As a
         | further example to seccomp, Gentoo's 'sandbox' has Linux
         | namespace isolation built in, but not yet enabled whilst it is
         | tested.
         | 
         | - There is a lack of automated package management tools
         | (including dashboards / automatic bug creation) for comparing
         | source trees in Git to released tarballs and making
         | discrepancies more visible and easier for maintainers to
         | review.
         | 
         | - There is a lack of automatic package management tools
         | (including dashboards / automatic bug creation) for detecting
         | binary and high entropy files in source trees and confirming
         | they are validly formatted (e.g. invalid tag in a TLV file
         | format) and confirming that test and example files contain
         | nothing-up-my-sleeve content.
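         | 
         | As a rough illustration of the last gap, a byte-entropy check
         | is only a few lines. This is a sketch, not an existing tool;
         | the 7.5 bits/byte threshold is an arbitrary assumption.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())


def looks_opaque(data, threshold=7.5):
    """Flag content whose entropy is close to that of random or compressed data.

    The default threshold is an arbitrary assumption for this sketch.
    """
    return shannon_entropy(data) >= threshold
```

         | On its own this would not have singled out the xz test files:
         | a compression library's test suite is full of legitimately
         | high-entropy data, which is why the format validation and
         | nothing-up-my-sleeve checks matter as well.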
         | 
         | There has already been an accelerated move in recent years
         | towards modern and safer build systems (such as meson and
         | cargo) as 80's/90's C libraries get ripped out and replaced
         | with modern Rust libraries, or other safer options. This is a
         | lot of work that will take many years though, and many old
         | 80's/90's C libraries and build systems will be needed for many
         | more years to come. And for this reason,
         | sandboxing/isolation/safety of old build systems seemingly
         | needs to be improved as a priority, noting that old build
         | systems will take years or decades to replace.
         | 
         | [1] https://devmanual.gentoo.org/general-concepts/sandbox/index....
         | 
         | [2] https://devmanual.gentoo.org/ebuild-writing/functions/src_pr...
        
         | ufmace wrote:
         | You're not wrong, but IMO that isn't the real problem. The real
         | problem is the combination of highly popular core open-source
         | utilities with near-zero maintenance resources.
         | 
         | This was all possible because XZ, despite being effectively
         | everywhere, has one actual part-time maintainer who had other
         | things going on in his life. That also means that there's
         | nowhere near enough resources to redo the build system to some
         | more secure alternative. If they had, say, 10 enthusiastic and
         | skilled volunteer contributors with plenty of free time, they
         | could do that, but then a new person appearing out of nowhere
         | with a few helpful commits would never have a chance at being
         | made an official maintainer or sneaking sketchy tools and code
         | past the other maintainers.
         | 
         | Not that I'm blaming XZ or the real maintainer. Clearly whoever
         | was behind this was looking for the weakest link, and if it
         | wasn't XZ, it would have been something else. The real problem
         | is the culture.
         | 
         | So I guess what this really means is someone at a big
         | corporation making Linux distros should audit their full
         | dependency tree. Any tool in that tree that isn't actively
         | maintained, say at least 3 long-time active contributors, they
         | should take over one way or another - whether that's hiring the
         | current maintainer as a full-time remote employee, offering to
         | buy out the rights, or forking and running under their own
         | team.
         | 
         | I'm not necessarily super thrilled with that, but I guess it's
         | the world we live in now.
        
           | bhawks wrote:
           | The overworked solo maintainer is a problem.
           | 
           | However I've seen way too many projects where individuals in
           | the 'team' are able to carve out impenetrable fiefdoms where
           | they can operate with wide latitude.
           | 
           | I could see a Jia Tan being able to pull this off in a team
           | context as well - bigger teams might even be weaker.
           | (Everyone welcome Jia - he's going to write test cases and
           | optimize our build process so everyone can focus on
           | $LAUNCH_DAY)
        
           | pch00 wrote:
           | > So I guess what this really means is someone at a big
           | corporation making Linux distros should audit their full
           | dependency tree.
           | 
           | This is it precisely. When you're paying Redhat for an
           | "enterprise" Linux then that guarantee should extend down
           | their entire software stack. Just getting the odd backported
           | patch and so-so email support no longer cuts it.
        
           | peteradio wrote:
           | Why do utilities need constant fucking updates? Why do they
           | need maintenance? New features are entirely optional and if
           | what you want to do isn't supported aka xz java, uhh, get
           | fucked or do it yourself.
        
             | Dunedan wrote:
             | I believe software of a certain complexity can't be
             | finished and will always need updates, even if you exclude
             | new features, as the whole eco system of hardware and
             | software is constantly evolving around it. Here are a few
             | reasons why:
             | 
             | - bug fixes (as every non-trivial software has bugs)
             | 
             | - improved security (as the kernel adds security
             | functionality (capability dropping, sandboxing, ...),
             | software can utilize this functionality to reduce its
             | attack surface)
             | 
             | - improvements of existing features (e.g. utilizing new CPU
             | extensions or new algorithms for improved performance)
        
         | anthk wrote:
         | Makefiles are not a Linux thing, but Unix.
         | 
         | And if you confuse C with C++ and Make with
         | CMake/Autotools/Autoconf, you have a lot to learn. Look: simple
         | and portable makefiles work:
         | 
         | git://bitreich.org/english_knight
        
         | peteradio wrote:
         | Google gets to decide what platform they want to support for
         | their backend. I wonder what their build code would look like
         | if they tried to support the top-10 common OS from the last 30
         | years.
        
       | zadwang wrote:
       | The set of unix utilities has been tested for a long time. I just
       | wish the kernel and key utilities stayed fixed and unchanged
       | unless absolutely necessary. Don't fix it if it ain't broken. The
       | software empire seems out of control.
        
         | JonChesterfield wrote:
         | It is all broken. The quick hack to make things kind of work
         | today, compounded across fifty years and a million developers.
         | 
         | There are occasional shining lights of stuff done right but I
         | think it's fair to say they're comprehensively outcompeted.
        
       | Solvency wrote:
       | It's kind of tragically amusing how heinously complex and
       | unnecessarily inscrutable modern technology is, and it's only
       | getting worse. I think developers sadistically enjoy it.
        
         | Trufa wrote:
         | This is probably the worst take ever.
         | 
         | It's pretty amazing how the tools keep up with the increasing
         | complexity of the products we make.
         | 
         | And to be honest, in most cases they just make it simpler. I
         | think people just don't like to learn new stuff.
        
       | mittermayr wrote:
       | As a developer, this amazes me, and it just shows that what -- to
       | me -- feels like a top-tier attack method is probably only entry-
       | to-mid level complexity for the folks working at that stage. Some
       | of the things I see posted here on HN are well above this level,
       | so I'd assume for the right kind of money (or other incentives),
       | this is only the beginning of what's possible. And, if you think
       | of ALL the packages and ALL the many millions of libraries on
       | GitHub, this vector is SO EFFECTIVE, there will be hundreds of
       | cases like it uncovered in the next few months, I am certain of
       | it.
       | 
       | I worry about all the con/pro-sumer hardware makers, from Philips
       | Hue to Alexas, from the SumUps to the camera makers, from Netgear
       | to TP-Link. All their products are packed with open-source
       | libraries. And I am 100% certain that most of their dev teams do
       | not spend time scanning these for obscure injection vectors.
        
         | jimkoen wrote:
         | > And I am 100% certain that most of their dev teams do not
         | spend time scanning these for obscure injection vectors.
         | 
         | This rationale baffles me, it feels that the dependency-hell
         | circlejerk crowd is working on making OSS maintainers look even
         | worse with this scenario.
         | 
         | Any given commercial operation that claims any credibility for
         | itself does supply chain analysis before adopting a dependency.
         | This is, among other things why ordinarily you'd pay RedHat to
         | maintain a stable Linux Release for you and why projects such
         | as FreeBSD severely limit the software they ship in the default
         | install.
         | 
         | If you are affected by this mess, I'm sorry to say, but it's
         | your fault. If you are worried about developers of software you
         | use for free, as in free beer, going rogue, either put in
         | incentives for them to not do that (i.e. pay them) or fork the
         | project and implement your own security measures on top of
         | what's already there.
         | 
         | If you're worried that you could encounter exploits from
         | dependencies in commercial software you use, you should
         | negotiate a contract that includes compensation for damages
         | from supply chain attacks.
         | 
         | If you're unwilling to do that, sorry mate, you're just
         | unprofessional.
         | 
         | Inb4: Yes, I am really trying to say that you should check the
         | supply chain of even your most basic dependencies such as SSH.
        
           | finaard wrote:
           | Unfortunately that's "industry standard" nowadays. I lost
           | count of how often I had that discussion over the last two
           | decades.
           | 
           | Just look at stuff like pip, npm or pretty much any "modern"
           | package manager in use by developers - they're all pretty
           | much designed to pull in a shitload of arbitrary unaudited
           | and in some cases unauditable dependencies.
           | 
           | And nobody wants to listen. That's why I prefer to work in
           | heavily regulated areas nowadays - that way I can shorten
           | that discussion with "yeah, but regulatory requirements don't
           | let us do that, sorry"
           | 
           | The absolute basics should be a local archive of
           | dependencies which have at least received a basic sanity
           | check, with any update or addition to it going through a
           | review of the changes being added. CI gets access to that
           | cache, but by itself does not have network access, to make
           | sure no random crap gets pulled into the build. You'd be
           | surprised how many popular build systems can't do that at
           | all, or only with a lot of workarounds.
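           | 
           | A minimal sketch of the "sanity-checked local archive" idea,
           | assuming the reviewed archives sit in one directory and their
           | SHA-256 digests are pinned in a dict. The names check_archive
           | and pinned are hypothetical, not taken from any real build
           | system:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_archive(cache_dir: Path, pinned: dict[str, str]) -> list[str]:
    """Compare a local dependency cache against pinned digests.

    `pinned` maps file name -> expected SHA-256 (an assumed format).
    Returns a list of problems; an empty list means the cache matches.
    """
    problems = []
    present = {p.name for p in cache_dir.iterdir() if p.is_file()}
    for name, expected in sorted(pinned.items()):
        if name not in present:
            problems.append(f"missing: {name}")
        elif sha256_of(cache_dir / name) != expected:
            problems.append(f"hash mismatch: {name}")
    # Anything in the cache that nobody pinned was never reviewed.
    for name in sorted(present - set(pinned)):
        problems.append(f"unreviewed: {name}")
    return problems
```

           | CI would run such a check before building and fail on a
           | non-empty result. A hash only proves the archive is the one
           | that was reviewed; it says nothing about whether the review
           | itself caught anything.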
        
             | pknopf wrote:
             | Package managers that use git are less prone to this kind
             | of attack (Go, Rust).
        
               | steveklabnik wrote:
               | Cargo does not "use git."
        
           | Dunedan wrote:
           | > Any given commercial operation that claims any credibility
           | for itself does supply chain analysis before adopting a
           | dependency. This is, among other things, why ordinarily you'd
           | pay RedHat to maintain a stable Linux Release for you and why
           | projects such as FreeBSD severely limit the software they
           | ship in the default install.
           | 
           | That sounds like you assume RedHat would've caught the
           | vulnerability in xz-utils, before shipping it in the next
           | release of RHEL. I'm not so sure about that, as there is only
           | so much you can do in terms of supply chain analysis and such
           | a sophisticated vulnerability can be pretty hard to spot.
           | Also mind that it only got discovered by accident after all.
        
             | tcmart14 wrote:
             | I don't know if RedHat would have caught it. But the
             | benefit of Red Hat is, they would be the one to fall on the
             | sword. Your product is built on RHEL. This happens. You get
             | to shift blame to RHEL and RedHat would eat it. The
             | positive is, after the dust has settled Red Hat could
             | choose to sort of adopt the compromised piece (invest
             | engineering effort and take it over) or take some
             | stewardship (keeping an eye on it and maybe give a hand to
             | whoever is maintaining it after).
        
           | rlpb wrote:
           | I think it's more than an individual or an organisation. The
           | industry as a whole has favoured not caring about
           | dependencies or where they come from in order to increase
           | velocity. Concerns about supply chain (before we even had the
           | term) were dismissed as "unlikely, and we won't be blamed
           | because everyone's doing it".
           | 
           | The organisations that did have some measures were complained
           | about loudly, and they diluted their requirements over time
           | in order to avoid stagnation. Example: Debian used to have a
           | "key must be signed by three other Debian developers"
           | requirement. They had to relax the requirement in part
           | because, from the perspective of the wider ecosystem, nobody
           | else had these onerous requirements and so they seemed
           | unreasonable (although Covid was the final straw). If we'd
           | had an ecosystem-wide culture of "know your upstream
           | maintainer", then this kind of expectation as a condition of
           | maintainership would be normal, we'd have much better tooling
           | to do it, and such requirements would not have seemed onerous
           | to anyone. It's like there's an Overton Window of what is
           | acceptable, that has perhaps shifted too far in favour of
           | velocity and at the cost of security, and this kind of
           | incident is needed to get people to sit up and take notice.
           | 
           | This incident provides the ecosystem as a whole the
           | opportunity to consider slowing down in order to improve
           | supply chain security. There's no silver bullet, but there
           | are a variety of measures available to mitigate, such as
           | trying to know the real world identity of maintainers, more
           | cautious code review, banning practices such as binary blobs
           | in source trees, better tooling to roll back, etc. All of
           | these require slowing down velocity in some way. Change can
           | only realistically happen by shifting the Overton Window
           | across the ecosystem as a whole, with everyone accepting the
           | hit to velocity. I think that an individual or organisation
           | within the ecosystem isn't really in a position to stray too
           | far from this Overton Window without becoming ineffective,
           | because of the way that ecosystem elements all depend on each
           | other.
           | 
           | > If you're unwilling to do that, sorry mate, you're just
           | unprofessional.
           | 
           | There are no professionals doing what you suggest today,
           | because if they did, they'd be out-competed on price
           | immediately. It's too expensive and customers do not care.
        
         | bipson wrote:
         | That's why I don't see e.g. TP-Link basing their router
         | firmware on OpenWRT as a win, and why I want the "vanilla"
         | upstream project (or something that tracks upstream by design)
         | running on my devices.
         | 
         | Applies to all of my devices btw. I don't like Android having
         | to use an old kernel, I didn't like MacOS running some ancient
         | Darwin/BSD thing, etc. The required effort for backporting
         | worries me.
         | 
         | Don't get me wrong, I'm not saying OSS has no vulns.
        
           | doubled112 wrote:
           | More orgs directly contributing to upstream is best in my
           | eyes too. I'm not against forking, but there are usually real
           | benefits to running the latest version of the most popular
           | one.
           | 
           | One opposite of this I've seen is Mikrotik's RouterOS. My
           | understanding is that they usually reimplement software and
           | protocols rather than depending on an upstream.
           | 
           | I'd imagine that is what leads to issues such as missing UDP
           | support in OpenVPN for 10 years, and I'm not sure it gives me
           | the warmest fuzzy feeling about security. Pros and cons, I
           | suppose. More secure because it's not the same target as
           | everybody else. Less secure because there are fewer users and
           | eyes looking at this thing.
        
         | 0xdeadbeefbabe wrote:
         | > there will be hundreds of cases like it uncovered in the next
         | few months, I am certain of it.
         | 
         | "Given the activity over several weeks, the committer is either
         | directly involved or there was some quite severe compromise of
         | their system. Unfortunately the latter looks like the less
         | likely explanation, given they communicated on various lists
         | about the "fixes" mentioned above."
         | (https://www.openwall.com/lists/oss-security/2024/03/29/4)
         | 
         | So, it's like the story of those security researchers injecting
         | bugs into the kernel
         | https://thehackernews.com/2021/04/minnesota-university-apolo...
         | 
         | I'm saying this isn't that easy to pull off, and it's unlikely
         | we'll see hundreds of similar cases.
        
       | kelseydh wrote:
       | The lesson I take away from this incident is that we probably
       | shouldn't be allowing anonymity for core contributors in critical
       | open source projects. This attack worked and the attacker will
       | likely get away with it free of consequence, because they were
       | anonymous.
        
         | hk__2 wrote:
         | > The lesson I take away from this incident is that we probably
         | shouldn't be allowing anonymity for core contributors in
         | critical open source projects. This attack worked and the
         | attacker will likely get away with it free of consequence,
         | because they were anonymous.
         | 
         | This would be impossible to enforce, and might not be a good
         | idea because it enables other ranges of attacks: if you know
         | the identities of the maintainers of critical open source
         | projects, it's easier to put pressure on them.
        
         | damsalor wrote:
         | The attack almost worked because of too few eyes
        
         | Jonnax wrote:
         | Who designates it as critical?
         | 
         | If someone makes a library and other people start using it, are
         | they forced to reveal their identity?
         | 
         | Do the maintainers get paid?
        
         | ulrikrasmussen wrote:
         | No thanks.
         | 
         | That's not going to help, and will be fairly easy to circumvent
         | for nation state actors or similar advanced persistent threats
         | who will not have a problem adding an extra step of identity
         | theft to their attack chain, or simply use an agent who can be
         | protected if the backdoor is ever discovered.
         | 
         | On the other hand, the technical hoops required for something
         | like that will likely cause a lot of damage to the whole open
         | source community.
         | 
         | The solution here is to learn from this attack and change
         | practices to make a similar one more difficult to pull off:
         | 
         | 1. Never allow files in release tarballs which are not present
         | in the repo.
         | 
         | 2. As a consequence, all generated code should be checked in.
         | Build scripts should re-generate all derived code and fail if
         | the checked-in code deviates from the generated output.
         | 
         | 3. No inscrutable data should be accessible by the release
         | build process. This means that tests relying on binary data
         | should be built completely separately from the release
         | binaries.
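         | 
         | A minimal sketch of rule 1, assuming the set of tracked paths
         | has already been obtained from the repo (for example from the
         | output of `git ls-files`) and that the tarball wraps everything
         | in one top-level directory. The function name and its
         | parameters are hypothetical:

```python
import io
import tarfile


def files_not_in_repo(tarball: bytes, tracked: set[str], prefix: str) -> list[str]:
    """List regular files in a release tarball that the repo does not track.

    `tracked` holds repo-relative paths; `prefix` is the top-level
    directory inside the tarball (e.g. "pkg-1.0/"), stripped before
    comparing. Both are assumptions of this sketch.
    """
    extra = []
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks, etc.
            name = member.name
            if name.startswith(prefix):
                name = name[len(prefix):]
            if name not in tracked:
                extra.append(name)
    return sorted(extra)
```

         | A release job could refuse to publish when the returned list
         | is non-empty. This only checks that the names exist in the
         | repo; comparing contents as well would need the file hashes
         | from both sides.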
        
           | peteradio wrote:
           | Stop trying to support such a variety of images too? Maybe?
        
           | lenerdenator wrote:
           | It's easy to steal or craft an identity. Having a person
           | adopt that identity and use it over multiple in-person
           | meetings around the world over an extended period of time is
           | not.
           | 
           | Part of the appeal of cyber operations for intelligence
           | agencies is that there's basically no tradecraft involved.
           | You park some hacker in front of a laptop within your
           | territory (which also happens to have a constitution
           | forbidding the extradition of citizens) and the hacker
           | strikes at targets through obfuscated digital vectors of
           | attack. They never go in public, they never get a photo taken
           | of them, they never get trailed by counterintelligence.
           | 
           | If you start telling people who want to be FLOSS repo
           | maintainers that they'll need to be at a few in-person
           | meetings over a span of two or three years if they want the
           | keys to the project, that hacker has a _much_ harder job,
           | because in-person social engineering is hard. It has to be
           | the same person showing up, time after time, and that person
           | has to be able to talk the language of someone intimately
           | familiar with the technology while being someone they're
           | not.
           | 
           | It's not a cure-all, but for supply chain attacks it makes
           | the operation a lot riskier, more resource-intensive, and
           | more time-consuming.
        
             | jpc0 wrote:
             | Many OSS contributors likely don't have "fly to distant
             | country for mandatory meeting" money.
             | 
             | You are excluding a ton of contributors based on geography
             | and income.
             | 
             | I don't often find this line apt, but check your privilege
             | with this kind of comment.
             | 
             | This is really a small step away from segregation.
        
         | tgv wrote:
         | It might prevent attacks under different aliases, but a
         | determined organization will be able to create a verified
         | account, if only because nobody, certainly not GitHub, has the
         | will and means to verify each account themselves.
        
         | rsc wrote:
         | Two problems with this:
         | 
         | 1. Many important contributors, especially in security, prefer
         | to be pseudonymous for good reasons. Insisting on identity
         | drives them away.
         | 
         | 2. If a spy agency was behind this, as many people have
         | speculated, those can all manufacture "real" identities anyway.
         | 
         | So you'd be excluding helpful people and not excluding the
         | attackers.
        
         | akdev1l wrote:
         | If this was a state actor (which it definitely looks like)
         | then what validation are you going to do? They can probably
         | manufacture legitimate papers for anything.
         | 
         | Driver's license, SSN, national ID, passport, etc. If the
         | government is in on it then there are no limits.
         | 
         | The only way would be to require physical presence in a trusted
         | location. (Hopefully in a jurisdiction that doesn't belong to
         | the attacker...)
        
       | say_it_as_it_is wrote:
       | Imagine paying for a security scanning service such as Snyk and
       | finding that it never scanned source code for injection attacks.
       | How many millions of dollars went down the drain?
        
       | xurukefi wrote:
       | Since I'm a bit late to the party and feeling somewhat
       | overwhelmed by the multitude of articles floating around, I
       | wonder: Has there been any detailed analysis of the actual
       | injected object file? Thus far, I haven't come across any, which
       | strikes me as rather peculiar given that it's been a few days.
        
         | lenerdenator wrote:
         | I agree, I haven't seen anything about decompiling the object
         | file.
         | 
         | If I had a project to develop a backdoor to keep persistent
         | access to whatever machine I wanted, it would make sense that I
         | would have a plug-in executable that I would use for multiple
         | backdoors. That's just decent engineering.
        
         | tithe wrote:
         | Your best bet may be in the chat (from
         | https://www.openwall.com/lists/oss-security/2024/03/30/26 ):
         | 
         | Matrix: #xz-backdoor-reversing:nil.im
         | 
         | IRC: #xz-backdoor-reversing on irc.oftc.net
         | 
         | Discord: https://discord.gg/XqTshWbR5F
        
       | klabb3 wrote:
       | As a naive bystander, the thing that stands out most to me:
       | 
       | > "Many of the files have been created by hand with a hex editor,
       | thus there is no better "source code" than the files themselves."
       | This is a fact of life for parsing libraries like liblzma. The
       | attacker looked like they were just adding a few new test files.
       | 
       | Yes, these files are scary, but I can see the reason. But at
       | least can we keep them away from the build?
       | 
       | > Usually, the configure script and its support libraries are
       | only added to the tarball distributions, not the source
       | repository. The xz distribution works this way too.
       | 
       | Obligatory autotools wtf aside, why on earth should the tarballs
       | contain the test files at all? I mean, a malicious test could
       | infect a developer machine, but if the tars are for building
       | final artifacts _for everyone else_, then shouldn't the policy
       | be to only include what's necessary? _Especially_ if the test
       | files are unauditable blobs.
        
         | wiml wrote:
         | Because, despite containing some amount of generated autoconf
         | code, they are still source tarballs. You want to be able to
         | run the tests after compiling the code on the destination
         | machine.
        
         | finaard wrote:
         | It's pretty common to run tests on CI after building to verify
         | your particular setup doesn't break stuff.
         | 
         | Last time we did that we preferred git upstream, though, and
         | generated the autocrap as needed - I never liked the idea of
         | release tarballs containing stuff not in git.
        
           | klabb3 wrote:
           | This strengthens the argument I'm making, no? You bring in
           | the source repo when doing development and debugging. In
           | either case - tarball or not - it doesn't seem that difficult
           | to nuke the test dir before building a release for
           | distribution. Again, only really necessary if you have opaque
           | blobs where fishy things can hide.
        
             | lathiat wrote:
             | The distributions often run the same tests after it's built
             | to make sure it's working correctly as built in the
             | distribution environment. This can and does find real
             | problems.
        
       | LunicLynx wrote:
       | Imagine this inside GitHub copilot, just because it has seen it
       | enough times.
        
       | PHGamer wrote:
       | If you have worked in any kind of development (closed or open),
       | you know half the time developers are lazy and just approve PRs.
       | Linus Torvalds is like the gleaming exception where he will call
       | out shit all day long.
        
         | dijit wrote:
         | Second this.
         | 
         | And in the event someone is pedantic enough to actually care:
         | that person will be considered a pariah who stifles all
         | development.
         | 
         | Tensions with the team over nitpicking, etc.
         | 
         | FD: I have a situation like this now. I am not the one being
         | picky; one of the developers I hired is. I had to move him out
         | of the team because unfortunately his nitpicky behaviour was
         | not well regarded. (He also comes from Eastern Europe and has a
         | very matter-of-fact way of giving feedback, which does not aid
         | things.)
        
       | jcarrano wrote:
       | What's the advantage of IFUNCs over putting a function in the
       | library that selects the implementation, either via function
       | pointers or a switch/if? In particular given that they seem to be
       | quite fragile and exploitable too.
       | 
       | I don't have much experience in low-level optimization, but would
       | a modern CPU not be able to predict the path taken by a branch
       | that tests the CPU features?
        
         | jcalvinowens wrote:
         | > either via function pointers or a switch/if?
         | 
         | > but would a modern CPU not be able to predict the path taken
         | by a branch that tests the CPU features.
         | 
         | That's true, but the CPU has finite branch predictor state, and
         | now you've wasted some of it. Indirect calls hurt too,
         | especially if you need retpolines.
         | 
         | This is a great read:
         | https://www.agner.org/optimize/microarchitecture.pdf
         | 
         | The Linux kernel has interfaces for doing the same thing, more
         | explicitly than ifunc:
         | 
         | https://docs.kernel.org/next/staging/static-keys.html
         | 
         | https://lwn.net/Articles/815908/
        
       | k3vinw wrote:
       | Ha. This backdoor belongs in the same museum as automake!
        
       ___________________________________________________________________
       (page generated 2024-04-03 23:01 UTC)