[HN Gopher] Sensenmann: Code Deletion at Scale
       ___________________________________________________________________
        
       Sensenmann: Code Deletion at Scale
        
       Author : gslin
       Score  : 127 points
       Date   : 2023-04-29 18:47 UTC (4 hours ago)
        
 (HTM) web link (testing.googleblog.com)
 (TXT) w3m dump (testing.googleblog.com)
        
       | jawns wrote:
       | The most difficult part about code deletion is practicing the
       | Chesterton's Fence principle:
       | 
       | > In the matter of reforming things, as distinct from deforming
       | them, there is one plain and simple principle; a principle which
       | will probably be called a paradox. There exists in such a case a
       | certain institution or law; let us say, for the sake of
       | simplicity, a fence or gate erected across a road. The more
       | modern type of reformer goes gaily up to it and says, "I don't
       | see the use of this; let us clear it away." To which the more
       | intelligent type of reformer will do well to answer: "If you
       | don't see the use of it, I certainly won't let you clear it away.
       | Go away and think. Then, when you can come back and tell me that
       | you do see the use of it, I may allow you to destroy it.
       | 
       | https://wiki.lesswrong.com/wiki/Chesterton%27s_Fence
       | 
       | While this tool certainly does the job of _proposing_ code
       | deletions, that's the easier part. The harder part is knowing
       | why the code exists in the first place, which is necessary to
       | know whether it's truly a good idea to remove it. Google,
       | smartly, is leaving that part up to a human (for now).
        
         | proper_elb wrote:
         | You raise a good point, and I would answer it with agree and
         | disagree:
         | 
         | Agree: Yes, you are correct, merely observing that a code path
         | was never executed in the last 6 months is not the same as
         | understanding why the code path was created in the first place.
         | There might be the quite real possibility of an infrequent
         | event that appears just once in every two years or so (of
         | course, this should also be documented somewhere!).
         | 
         | Disagree: Pragmatically, we _have_ an answer if the code path
         | was not executed after 6 months use in production and test: We
         | know that, with a very high probability, the code path was
         | created either by mistake (human factor) or intentionally for
         | some behavior that is no longer expected from our software. To
         | continue the Fence metaphor, regarding Sensenmann: After 6
         | months, we know about the Fence that 1) it has no role to play
         | in keeping the stuff out that we want out (that was all done by
         | other fences that had contact with an animal at least
         | once) and 2) that it _might_ have been used to keep out flying
         | elephants or whatever, but no such being was observed in the
         | last 6 months (at least the fence made no contact with it,
         | which it then should have!) and probably went away.
         | 
         | That said, having a human in the loop is probably a good idea.
        
         | anonymousiam wrote:
         | It should also be clear that this article is about "deleting"
         | code from an active project, not about "deleting" it entirely
         | from the version control system. Thus, any code "deleted"
         | through the described process could still easily be restored if
         | necessary.
        
         | breck wrote:
         | As a counter to Chesterton's Fence: sometimes the fastest way
         | to understand what something does is to remove it and see what
         | complains. You might get only 1 complainer for every 10 fences
         | you take down. Putting that one fence back up takes much longer
         | than taking it down, but the time saved from removing the other
         | 9 unnecessary ones makes it a net win. And this time you can
         | add Documentation to the rebuilt fence.
        
           | macNchz wrote:
           | Also known as the scream test in IT: unplug that old server
           | and see who screams!
        
             | shagie wrote:
             | Microsoft uses a scream test to silence its unused servers
             | - https://www.microsoft.com/insidetrack/blog/microsoft-
             | uses-a-...
        
           | IshKebab wrote:
           | A further counterpoint: if you follow the Fence proponents'
           | logic to its conclusion you can _never remove any code_ which
           | is clearly an absurd situation.
           | 
           | I think the real logical flaw is that Fencers (as I will now
           | call them) put the blame on the person who removes an
           | apparently useless fence. But they're wrong. The real blame
           | lies with the person who built the apparently useless fence
           | and didn't put a sign on it explaining why it shouldn't be
           | removed.
        
             | proper_elb wrote:
             | > A further counterpoint: if you follow the Fence
             | proponents' logic to its conclusion you can never remove
             | any code which is clearly an absurd situation.
             | 
             | No, that would only be the case if one would never
             | understand any code. Chesterton's Fence consists of two
             | parts ("understanding some code" as a precondition to
             | "removing some code"), and leaving one or the other part
             | out makes it some other thing than what Chesterton's Fence
             | means.
             | 
             | > The real blame lies with the person who built the
             | apparently useless fence and didn't put a sign on it
             | explaining why it shouldn't be removed.
             | 
             | Chesterton's Fence is not about blame, or the past in
             | general - it is about how to deal with things that are in
             | the present. (Although I agree that the original fence-
             | builder should have left a note or two!)
        
             | kube-system wrote:
             | The principle says you can't remove it _until you
             | understand why it was there_. It's more about doing due
             | diligence.
             | 
             | I follow the principle when I remove code, and it's a
             | reason why good code comments are important. "Oh yeah, this
             | was written for [x] which is no longer a thing, we can
             | remove it now"
        
             | hgsgm wrote:
             | It's not about blame, it's about making good decisions.
        
               | xboxnolifes wrote:
               | And not putting up a sign explaining why it's a necessary
               | fence is a bad decision. Avoiding removing all unlabeled
               | fences because they _might, maybe, potentially_ be
               | useful, is also likely a bad decision if taken to its
               | conclusion.
        
               | TeMPOraL wrote:
               | Chesterton's Fence is a reminder to actually get up, walk
               | over to the fence and skim the multiple labels and notes
               | the builders left there - because the big problem is
               | usually someone looking at a thing that came before them,
               | and _assuming_ they understand what it was for, without
               | bothering to actually check it.
        
         | einpoklum wrote:
         | The article is about the removal of _dead_ code. So, not a
         | "fence across the road" - it's a fence that was moved to the
         | side of the road, already cleared. The question is just whether
         | to dismantle the fence or keep it there just in case.
        
           | Xorlev wrote:
           | +1. And, it's in version control forever. It's not as if it
           | entirely disappears. Like one of the sibling comments
           | mentioned, I only rarely reject Sensenmann CLs.
           | 
           | That's worth explaining: it's automated code deletion, but
           | the owner of the code (a committer to that directory
           | hierarchy) must approve it, so it's rare there's ever a false
           | deletion.
        
         | opportune wrote:
         | I don't think you understand Sensenmann fully based on this
         | post. At Google basically everything in use has a Bazel-like
         | build target. This means the code base is effectively a
         | directed "forest"/tree-like data structure with recognizable
         | sources and sinks. If you can trace through the tree and find
         | included-but-not-used code by analyzing build targets, you can
         | safely delete it. There are even systems (though not covering
         | everything) that sample binaries' function usage you could
         | double check against.
         | 
         | > why the code exists in the first place
         | 
         | If the code is unreachable it's at best a "possibly will be
         | used in the future" and most likely simply something that was
         | used but not deleted when its last use was removed (or a YAGNI
         | liability).
         | 
         | If you can find a piece of code included in build targets but
         | unreachable in all of them, it's typically safe to delete. And
         | it's not done without permission generally, automation will
         | send the change to a team member to double check it's ok to
         | delete/nobody is going to start using it soon.
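
The reachability idea in the comment above can be sketched as a graph walk. This is only a toy illustration under stated assumptions: the dependency graph is a plain dict, and the target names and the `unreachable_targets` helper are hypothetical, not Bazel's or Sensenmann's actual API.

```python
from collections import deque

def unreachable_targets(deps, roots):
    """Return targets that no root (e.g. a binary) transitively depends on.

    deps maps each target to the targets it depends on directly.
    """
    seen = set(roots)
    queue = deque(roots)
    while queue:
        target = queue.popleft()
        for dep in deps.get(target, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    # Every target mentioned anywhere in the graph, as a key or as a dependency.
    all_targets = set(deps) | {d for ds in deps.values() for d in ds}
    return all_targets - seen

# Hypothetical toy build graph.
DEPS = {
    "//app:server": ["//lib:http", "//lib:auth"],
    "//lib:http": ["//lib:strings"],
    "//lib:auth": [],
    "//lib:legacy_export": ["//lib:strings"],
}

dead = unreachable_targets(DEPS, roots=["//app:server"])
print(sorted(dead))
```

Here only `//lib:legacy_export` is reported: `//lib:strings` survives because a live target still depends on it. A real system would presumably still send the result to an owner for review, as the comment describes.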
        
         | UncleMeat wrote:
         | "This code has been dead for six months" is a _very_ good
         | heuristic that the code is not relevant. I do occasionally
         | reject the sensenmann CLs, but only very very rarely. This isn't
         | weird code that nobody knows why it exists but it is
         | currently doing something. This is code that cannot execute.
        
           | ninjanomnom wrote:
           | Code that only triggers from a yearly holiday, disaster
           | alerts, leap years, or the like, would have longer periods of
           | going unused and likely be very problematic if removed.
           | Unless by dead code you mean unreachable code in which case
           | it shouldn't exist in the first place and I agree should be
           | removed.
        
             | joshuamorton wrote:
             | Yes, the nice thing about blaze/bazel + sensenmann is that
             | you can very accurately say "this code was not built into a
             | binary that has run in the past 6 months".
             | 
             | Sometimes you still want it (e.g. python scripts that are
             | used every once in a while for ad-hoc things and might go
             | months between uses), but _usually_ the right thing to do
             | is productionize stuff like that slightly more (and also
             | test it semi-regularly to make sure it hasn't broken).
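
A rough sketch of the "not built into a binary that has run in the past 6 months" check, assuming run timestamps per binary are available from some log. The data shapes, target names, and the `deletion_candidates` helper are all hypothetical and are not taken from Blaze, Bazel, or Sensenmann.

```python
from datetime import datetime, timedelta

def deletion_candidates(deps, last_run, now, window=timedelta(days=180)):
    """Return targets not linked into any binary that ran within `window`.

    deps: target -> direct dependencies.
    last_run: binary target -> datetime of its most recent recorded run.
    """
    live = set()
    # Start from binaries that ran recently and walk their dependencies.
    stack = [b for b, ts in last_run.items() if now - ts <= window]
    while stack:
        target = stack.pop()
        if target in live:
            continue
        live.add(target)
        stack.extend(deps.get(target, ()))
    all_targets = set(deps) | {d for ds in deps.values() for d in ds}
    return all_targets - live

NOW = datetime(2023, 4, 29)
DEPS = {
    "//ads:server": ["//lib:rpc"],
    "//tools:yearly_report": ["//lib:rpc", "//lib:report_fmt"],
}
LAST_RUN = {
    "//ads:server": NOW - timedelta(days=1),
    "//tools:yearly_report": NOW - timedelta(days=300),
}

candidates = deletion_candidates(DEPS, LAST_RUN, NOW)
print(sorted(candidates))
```

In this toy data the yearly report tool and its formatter are flagged even though someone may still want them. That is exactly the case where a human reviewer rejects the proposed deletion, or the script gets exercised more regularly.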
        
               | CyberDildonics wrote:
               | You can probably get most of that by just looking at the
               | atime attribute on the file system.
        
               | joshuamorton wrote:
               | Nah, there's stuff that scans the entire repo regularly
               | for all kinds of interesting purposes, and that's
               | ignoring the fact that `atime` isn't available or a
               | source of truth in piper.
               | 
               | Like conceptually I believe this could be wrong in both
               | directions, since there's heavy caching of build
               | artifacts, you can totally build a transitive dependency
               | of some file without actually reading the file (and
               | potentially do this for a relatively long period of time,
               | though I don't think that will happen in practice), and
               | stuff will regularly look through large swaths of files
               | that aren't necessarily run.
        
           | taspeotis wrote:
           | Different industries I guess. The new financial year comes
           | around every 12mo. Good luck explaining to the accountants
           | that you deleted their end-of-year reconciliation reports
           | because they didn't run them every 6mo.
        
             | jeffbee wrote:
             | Wouldn't it be mostly their fault for approving the
             | removal?
        
         | dekhn wrote:
         | Google's response to Chesterton's Fence is: "if you liked it,
         | then put a test on it".
         | 
         | I used to update the internal version of numpy for Google and
         | if people asked me to rollback after I made my update (having
         | fixed all the test failures I could detect), and they didn't
         | have a test, well, that's their problem. The one situation
         | where that rule wouldn't apply is if I somehow managed to break
         | production and we needed to do an emergency rollback.
         | 
         | I shed a tear when some of my old, unused code was autodeleted
         | at Google, but nowadays my attitude is: HEAD of your version
         | control should only contain things which are absolutely
         | necessary from a functional selection perspective.
        
           | Forge36 wrote:
           | I like that philosophy. In a similar vein: if it's important
           | why aren't we testing it.
           | 
           | How do you encourage testing?
        
       | sitkack wrote:
       | I hope everyone involved gets L+1!
        
       | [deleted]
        
       | joebiden2 wrote:
       | Sincere question: what is interesting or novel about this? Is it
       | just the scale or did I miss some subtle aspect?
       | 
       | This is more (or less?) the same as industry best practices, just
       | scaled up. There is a challenge in scaling up, as there is more
       | potential for someone to mess it up. But it's the same technique.
       | 
       | So what am I missing?
        
         | summerlight wrote:
         | In the perspective of software engineering economics, the scale
         | is important. Everyone knows it's good to clean up unused code
         | but they just don't care because they think it doesn't yield a
         | short term ROI for themselves. Then why don't we bring the cost
         | down and see what happens? Automation changes this equation.
        
         | codemac wrote:
         | > the same as industry best practices, just scaled up.
         | 
         | That's like saying S3 is the same as ext4, they're the same, just
         | scaled up! This is a poor argument, you'll note that S3 and
         | ext4 are entirely different things, not "challenges",
         | fundamentally different implementations.
         | 
         | Google is the only company I've ever worked for that
         | automatically deleted dead code, let alone across a company of
         | 100k+ SWE.
        
           | joebiden2 wrote:
           | Fair enough and thanks for the reply. Still, for anything
           | bothering engineers more than a bit repeatedly, anyone will
           | write tools to remove the manual burden.
           | 
           | Our internal practice is to delete code if you suspect it's
           | unused, run tests, and if it doesn't affect any tests, go for
           | it. This could be automated, but it is not pressing enough,
           | so we didn't automate it yet.
           | 
           | We could though, and it may even be a good idea, but I still
           | don't get the novelty. But I appreciate your point of view.
        
             | kccqzy wrote:
             | I think the takeaway is that at Google's scale, even if you
             | think some minor problem is not pressing enough to be
             | automated, it will become pressing soon enough.
        
             | jsnell wrote:
             | Your proposed process is exactly the wrong way around.
             | You'll end up keeping dead code just because it has tests,
             | and delete code that's still used in prod just because it
             | happened to be untested.
             | 
             | This is one of the details that the blog post goes into.
             | Sounds like it's not as trivial and obvious a problem as
             | you think it is, and you would have benefited from just not
             | dismissing the post because of that.
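
The objection above can be made concrete with a small comparison. This is a sketch only: the symbol names and the `classify` helper are invented for illustration, and "used in prod" stands in for whatever usage signal a real tool would collect.

```python
def classify(symbols, used_in_prod, covered_by_tests):
    """Compare a 'delete it if tests still pass' rule against real usage."""
    report = {}
    for name in symbols:
        # Under the test-based rule, code is kept only if a test breaks without it.
        tests_would_fail = name in covered_by_tests
        if name in used_in_prod and not tests_would_fail:
            report[name] = "wrongly deleted (live but untested)"
        elif name not in used_in_prod and tests_would_fail:
            report[name] = "wrongly kept (dead but tested)"
        elif name in used_in_prod:
            report[name] = "correctly kept"
        else:
            report[name] = "correctly deleted"
    return report

report = classify(
    symbols=["parse_v1", "parse_v2", "billing_hook", "old_debug_dump"],
    used_in_prod={"parse_v2", "billing_hook"},
    covered_by_tests={"parse_v1", "parse_v2"},
)
for name, verdict in report.items():
    print(f"{name}: {verdict}")
```

Both failure modes show up: `parse_v1` is dead but survives because it has tests, and `billing_hook` is live but would be removed because nothing tests it.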
        
             | brunooliv wrote:
             | Not to criticize your POV and argument directly, but, in
             | the end, a lot of things, especially like these, are always
             | easily subjected to the "we could do it, but just didn't
             | bother to yet" kind of argument, and, when it comes down to
             | the real work, things are much harder than they
             | superficially appear to be. So yeah this isn't new...
             | But... You know eheh
        
               | joebiden2 wrote:
               | Well, I'd politely agree to disagree. Google scale is
               | defined by novel, radical approaches, like for example
               | inventing map/reduce, writing papers on LLMs others then
               | implement successfully, or creating something like
               | Kubernetes.
               | 
               | The specific topic here is not one of those google
               | problems to me, as I can compare it to other problems we
               | already solved. But yes, we could miss that critical
               | point where a totally different problem domain emerges
               | just from one order of magnitude more, so fair game.
        
       | btilly wrote:
       | I liked most of the blog, but it bothers me to see stuff like,
       | "... just as with the introduction of unit testing 20 years
       | ago...".
       | 
       | No, unit testing was NOT introduced 20 years ago. As an example,
       | Perl 1 was released about 35 years ago with a unit test suite
       | that got run on every install. Every version of Perl has done so
       | since, and since CPAN came along, most Perl modules have followed
       | suit. This was the secret sauce behind Perl's reputation for
       | being so portable.
       | 
       | Nor was Perl a pioneer. In fact unit testing was used in the
       | 1960s on the Apollo program, and was even called unit testing. I
       | believe that the concept can be dated back to a 1950s textbook
       | but I can't find the reference.
       | 
       | So unit testing is over 60 years old.
        
         | allanrbo wrote:
         | Maybe they just mean that Google started investing seriously in
         | unit testing 20 years ago?
        
         | UncleMeat wrote:
         | This refers to Google, I'm pretty sure. Early Google didn't
         | believe in unit testing and it took a few particularly stubborn
         | engineers to demonstrate the value of the practice and convert
         | the culture to promoting unit testing.
        
       | charcircuit wrote:
       | Does this only get rid of unused binaries? Or is the system smart
       | enough to use the profiling infrastructure to identify dead code
       | in general?
        
         | kccqzy wrote:
         | Profiling is inherently probabilistic and I don't think it
         | should be used. Anything that inspects the runtime (dynamic)
         | behavior of code isn't good enough for a code deletion tool.
         | Only static analysis will do.
        
         | Jolter wrote:
         | Just read the article. It's not that long.
        
           | charcircuit wrote:
           | I was hoping to spark a conversation about this approach as
           | no such thing was mentioned in the article even though it
           | should be possible to do.
           | 
           | The whole point of these comment sections is to have a
           | discussion. If the point of this site was just to read
           | articles there wouldn't be a comment section.
        
       | croes wrote:
       | FYI: Sensenmann is the German word for Grim Reaper
        
         | Zetobal wrote:
         | I still don't get the tech industry's fascination with random
         | German words; at least here it's sort of fitting.
        
           | aardvarkr wrote:
           | Citation needed ^ Maybe there are just some great German devs
           | and you're using a lot of their software?
        
             | mflendrich wrote:
             | Google's Zurich office has had a tradition of creating
             | codenames in German (regardless of the backgrounds of any
             | engineers involved).
             | 
             | Source: I worked on Sensenmann.
        
           | meibo wrote:
           | Looks like this was made by a team in Zurich, which is mostly
           | German, so I imagine it came to them fairly naturally, and
           | who doesn't want to pick cool names for hackathon projects.
        
             | evmar wrote:
             | It was kind of an internal joke at Google for German-
             | speaking teams to make German-named projects. (It's maybe
             | only a joke that makes sense to the infamous German sense
             | of humor.)
        
       | mkoubaa wrote:
       | Sounds like a useful system from which almost nothing is usable
       | outside of Google.
        
         | vamega wrote:
         | Sure; but working at another very large company, I can say I
         | wish we had this.
         | 
         | Old unused code is a huge problem for us. The coordination
         | costs of trying to update company wide problems are made much
         | more severe by old code.
         | 
         | I wish we had something like this. We're large enough we'd need
         | our own system anyway. We don't have a monorepo, and we don't
         | use tools so many others do.
        
       | DamonHD wrote:
       | I worry about archival and enough history for diagnosing long-
       | standing subtle issues that take a long time to surface as bugs.
       | This is not theoretical: apparently a TeX bug picked up after
       | many years had been there from the start.
        
         | kragen wrote:
         | i think piper saves the full history of the whole monorepo; if
         | that's correct it's not 'deletion' in that sense
        
           | gravypod wrote:
           | (opinions are my own)
           | 
           | > Its goal is simple (at least, in principle): automatically
           | identify dead code, and send code review requests
           | ('changelists') to delete it.
           | 
           | It sends CLs (pull requests) and shows up as a commit. You
           | get a chance to approve or deny the deletion
        
             | kragen wrote:
             | yeah but even if you approve it the code is still there in
             | the code history, right?
        
               | bradfitz wrote:
               | Yes.
        
               | DamonHD wrote:
               | Good.
               | 
               | But relatively hard to find and work with.
        
               | dekhn wrote:
               | I used to do archaeology on Google's monorepo (i.e.,
               | looking far into the past of mapreduce, search engine,
               | ads, and other products) and it wasn't really that hard.
               | Heck, there was even a synthetic filesystem where you
               | could just cd to a historical commit # and see a view of
               | the repo at that timepoint (Google's version control is
               | based on always-increasing, globally shared commit
               | numbers).
        
               | kevinoconnor7 wrote:
               | Not really. If you're in the very rare situation where
               | you need to diagnose a bug in long-since dead code, you
               | can just view the repository synced to where the version was
               | cut.
        
               | pradn wrote:
               | No you just go to the folder and select "show deleted".
        
               | dmoy wrote:
               | Don't even need to do that. Codesearch with a `from:0`
               | qualifier does full regex search on the history iirc
        
               | tonfa wrote:
               | Still fairly easy to find IMO. You can search deleted
               | code easily, and blame layers allows finding how code
               | evolved fairly quickly.
        
               | kragen wrote:
               | thank you
        
       | jmyeet wrote:
       | The most important part of this is that the build units are
       | hermetic and all dependencies are explicit. This is why you need
       | to use something like Bazel/Blaze vs older build systems like
       | make where identifying what's used, particularly when you get
       | into meta-rules, becomes all but impossible.
       | 
       | As the article points out, you also have to look at what's
       | actually run. This is the real advantage of Google
       | infrastructure: the vertical integration so if a binary is run on
       | Borg, or even on the command line, that can be tracked.
        
       | lifeisstillgood wrote:
       | There is a meta-meta situation surrounding really good software
       | management.
       | 
       | You can knock up some code that say solves a specific business
       | problem right now. (meta:0)
       | 
       | But you need an environment that can take a new piece of code and
       | deploy it and test it (meta:1)
       | 
       | how is that code running - this is shading from production
       | monitoring into QA and performance (meta:2)
       | 
       | Compare all the running code and its performance against the
       | benefits of replacing code or going back to level 0 and just
       | fixing a business problem (meta:3)
       | 
       | Then this death eater - meta 4 I think.
       | 
       | And to me this is why comments like "software needs to solve
       | business problems" are naive - once you start using software you
       | need more software to manage the software - it's going to grow
       | till it consumes the business.
        
       | falcor84 wrote:
       | >For example, if an engineer is unsure how to use a library, they
       | can find examples just by searching
       | 
       | Isn't that the case with all libraries? How does the monorepo
       | help here?
        
         | er4hn wrote:
           | Discoverability. It's a lot easier to search one repo than it
         | is to search a set of repos. For the latter you need to have
         | all the repos listed somewhere, and have them be accessible.
        
           | speedgoose wrote:
           | SourceGraph is great to search in many repos.
        
           | knutzui wrote:
           | It's just as easy to index multiple repos as it is to index
           | one, which means that the same goes for searching. Why would
           | it be any different for one as opposed to many?
        
             | codetrotter wrote:
             | We use GitLab in the company I work for. There may be
             | repositories created by others in the company, that depend
             | on repos I work on, where I don't have access to said other
             | repos. So to me these are invisible. If everything was in
             | one monorepo, it'd all be visible to me easily.
        
               | Kwpolska wrote:
               | That's more of a culture thing. Your company chose to
               | enforce more granular permissions. Perhaps there are good
               | reasons behind it, e.g. code for different clients being
               | under different contracts and NDAs.
        
         | speedgoose wrote:
         | I'm not sure. I also thought that these big repos have to use
         | sparse checkouts to not use too much space on the developers'
         | machines. So you would have to use an external code search
         | index anyway.
        
         | kpw94 wrote:
         | "searching" how a method is used is as simple as clicking on
         | the symbol.
         | 
         | Think Visual Studio "find all references", but working around
         | the entire company's codebase, not just your current project.
        
       | oneplane wrote:
       | On a non-google level just being aware of code sitting around
       | costing resources is pretty important. Often, tests and
       | maintenance are just ignored or not calculated in as cost (be it
       | time, money, effort or otherwise). It is almost in the same realm
       | as "I don't know why it works" which is as dangerous as "I don't
       | know why it doesn't work".
        
       ___________________________________________________________________
       (page generated 2023-04-29 23:00 UTC)