hngopher.com

       [HN Gopher] Reformatting 100k Files at Google in 2011
       ___________________________________________________________________
        
       Reformatting 100k Files at Google in 2011
        
       Author : laurentlb
       Score  : 136 points
       Date   : 2024-06-15 22:46 UTC (1 days ago)
        
 (HTM) web link (laurent.le-brun.eu)
 (TXT) w3m dump (laurent.le-brun.eu)
        
       | wonger_ wrote:
       | Autoformatting is so nice. Crazy to think that formatters only
       | became popular after `gofmt`.
       | 
       | I also found this related quote from Russ Cox intriguing: "Most
       | people think that we format Go code with gofmt to make code look
       | nicer or to end debates among team members about program layout.
       | But the most important reason for gofmt is that if an algorithm
       | defines how Go source code is formatted, then programs, like
       | goimports or gorename or go fix, can edit the source code more
       | easily, without introducing spurious formatting changes when
       | writing the code back. This helps you maintain code over time."
        
         | DangitBobby wrote:
         | That doesn't really seem like a good reason. They remove all
         | formatting control with an automated tool so it doesn't matter
         | when (if, big if) an automated tool later rewrites some of your
         | source code formatted in a way you don't control?
        
           | Nzen wrote:
           | Mu. Russ Cox quote indicates that _the maintainers_ of
           | goimports and gorename do not need to handle formatting at
           | all (and hence, benefit). The go team put that wholly in the
           | gofmt team 's court. Otherwise, the go team would likely have
           | to handle tickets about how the formatting of goimports and
           | gorename differ over time for edge cases (if only by paying
           | the social capital of ignoring the tickets).
        
             | DangitBobby wrote:
             | That would fall under what I already assumed to be true,
             | and also under an umbrella they specifically said it wasnt
             | about
             | 
             | > end debates among team members about program layout
             | 
             | (Except in this case of course it's not "team members")
             | 
             | I'm definitely not getting this interpretation from the
             | quote.
        
           | striking wrote:
           | It's more common than you think! Tools like
           | https://github.com/facebook/jscodeshift, https://ts-
           | morph.com/, and https://ast-grep.github.io/ are the ones I'm
           | more familiar with (since my day-to-day is TypeScript).
           | 
           | When I led the conversion of a decently-sized codebase from
           | Flow-typed JS to TypeScript, I ensured that a code formatting
           | tool that we were already using on pre-commit and CI called
           | `prettier` was executed after each step. We took a git
           | snapshot of each step of our automated conversion pipeline,
           | and the diff was much clearer with `prettier` in place at
           | each of those steps.
           | 
           | We've since used codemods frequently to make huge changes to
           | the codebase in an automatic, reproducible, iterable way.
           | They're very comfortable, very fun, and (thanks to the use of
           | a formatter on all code) rarely produce incomprehensible
           | diffs.
        
         | delichon wrote:
         | It reduces cognitive load on integrations, both software and
         | wetware. So many brain hertz are wasted on internally
         | bikeshedding format details. Autoformat fixes a hole where the
         | rain gets in and stops my mind from coding.
        
         | autarch wrote:
         | The Perl world had perltidy (first release in 2002) many years
         | before Go was even a thing. It's funny that Perl, a language
         | notorious for it's "There's More Than One Way To Do It"
         | (TMTOWTDI) philosophy had a tidier so early. Of course,
         | perltidy is _ridiculously_ configurable.
         | 
         | One thing I really love about gofmt is that it has no
         | configuration at all. I think that was a major "innovation" and
         | I'd love to see more languages adopt this approach.
        
         | afavour wrote:
         | I've wondered before whether the world would be well served by
         | a programming language (or source control system, I suppose)
         | that just stores ASTs in files rather than text code. When
         | users open the file the editor formats to whatever their
         | personal preference is, then saves edits back to the AST.
         | 
         | It really is dumb to be arguing over tabs vs spaces, after all.
        
           | rsc wrote:
           | At that point you could even use different languages. Maybe
           | you like programs that look like Lisp and I don't. There was
           | a project at Microsoft Research in the late 1990s/early 2000s
           | that did exactly this - storing ASTs in source control
           | instead of code - but the name escapes me at the moment.
        
           | turtleyacht wrote:
           | One (naive) approach I keep thinking on is using ed(1) to
           | write _transforms_ of code, so chunks of Java are later sewn
           | together to create the app.
           | 
           | This always sounds more difficult on paper than just
           | wrestling dependencies till dawn, upgrading from JDK 11 to
           | JDK 17, for example. So I usually give up the mental exercise
           | there.
           | 
           | Plus, following a file of transforms is mind-bending: someone
           | may follow a method definition with a pattern seek, and then
           | start appending some more code. Context is lost. It would be
           | literate programming only with enough empathy for comments.
           | 
           | Which is all to say, would it be easier to move between
           | Spring versions if the app's commit history were a series of
           | transforms instead of changes to static files?
           | 
           | Suppose a commit establishes a framework version, and then
           | follow a bunch of commits for domain objects, a skeleton
           | controller, and so on. If we could play those decisions
           | forward, but edit the _transform_ instead of the source,
           | would it be easier to dissect which next dependency to
           | manage?
           | 
           | This loops back to ASTs: we would still edit and change
           | files, but the history would be ed(1) macros (or something
           | better, like ASTs). Somehow, it feels like there could be
           | reconciliation between source control and "manipulating a
           | timeline of changes."
           | 
           | Git may already have this, or a simple while loop with some
           | decisions about how far to play the changes, like editing a
           | cassette tape. A list of patches to apply, with pre- and
           | post- hooks for rules scripts.
        
           | striking wrote:
           | A sufficiently-motivated engineer could do this today by
           | setting a different formatter in their editor than in their
           | pre-commit hook. Then they need only activate the formatter
           | in their editor as they edit, and ensure pre-commit hooks run
           | as they commit.
           | 
           | This saves you the trouble of authoring a separate
           | programming language, or finding a way to preserve all of the
           | niceties of the original syntax and formatting that wouldn't
           | directly translate to an AST (like how many newlines are
           | after a particular stanza or function).
           | 
           | Case in point: Recast (https://github.com/benjamn/recast) is
           | of particular interest with regards to JS/TS in this vein,
           | because it does preserve a lot of the spirit of the source in
           | its conception of the AST. But also last time I used it
           | (couple years ago now) it would explode on any code with an
           | emoji in it. It's genuinely not an easy problem.
        
           | shawn_w wrote:
           | I think that's what InterLisp did.
        
           | eru wrote:
           | > It really is dumb to be arguing over tabs vs spaces, after
           | all.
           | 
           | In an in-house dialect of Haskell I used to work with, we
           | solved this problem by just making tabs a syntax error. Never
           | had any problems.
           | 
           | (I think tabs might be have been allowed inside strings.)
        
             | kccqzy wrote:
             | So I'm guessing you mean Mu, the dialect by Standard
             | Chartered.
        
               | eru wrote:
               | Yes, indeed. In it's circa 2014 incarnation, when I last
               | worked there.
               | 
               | (I don't really know what happened to it since.)
        
             | afavour wrote:
             | Arguably it would be a problem for a coworker that wanted
             | to use tabs.
             | 
             | "We'll just force everyone to do things one way" kind of
             | ignores the point I was making. It shouldn't be necessary
             | for you to _care_ how anyone else formats their code, same
             | as you don't care what font their code displays in or what
             | text editor they use. It feels like a vestigial aspect of
             | programming that we have to concern ourselves with it in
             | 2024.
        
           | python_pele wrote:
           | Unison has something similar to what you're talking about:
           | https://www.unison-lang.org/
        
           | coolcoder613 wrote:
           | Many BASIC dialects did this.
        
           | tengbretson wrote:
           | Does that mean that a file with a syntax error is not able to
           | be saved?
        
             | SR2Z wrote:
             | Well, not as an AST at least. Presumably it would still be
             | ok as text.
        
             | __s wrote:
             | language servers have pushed things to inspecting source
             | code while programmer has partially written code, so having
             | a kind of (invalid-span content="garbage") node in AST
             | helps
        
               | duped wrote:
               | Language servers usually are designed around
               | full/concrete syntax trees instead of ASTs for exactly
               | this reason. Adding error nodes to the AST is a hack that
               | hurts more than helps.
               | 
               | More technically, language servers usually have a CST
               | that they use to build the AST incrementally, and the AST
               | contains references back to the CST that generated it.
               | This is what allows you to handle incremental text edits
               | and compile small deltas to the AST instead of the
               | typical batch compiler design that attempts to parse
               | everything all at once.
        
               | jenadine wrote:
               | What's wrong with error node in the ast?
               | 
               | I've seen language server that completely ignores the
               | parts with error, and I much prefer error nodes because
               | then I still know there is something and these error node
               | can still have children
        
           | duped wrote:
           | One very nice property of an AST is that it's an IR that is
           | allowed be instable even if the syntax it represents is
           | stabilized. If the AST becomes the source of truth on the
           | file system you lose that property.
           | 
           | On top of that, now you need to write a parser and compiler
           | for your AST file. It's probably very simple and does
           | rudimentary validation, but that defeats the point of the AST
           | - it's a valid, canonical representation of a program by
           | construction.
           | 
           | All in all, it seems like a good idea, and people have done
           | it. But there's also good reasons to be apprehensive.
           | 
           | And at the end of the day, you need to ingest text as input,
           | and you need to do it as fast as possible. There's not a ton
           | of benefit to keeping an AST around on disk and in sync with
           | the text that generated it when you are already able to
           | compute it faster than you can read and deserialize it.
        
         | omoikane wrote:
         | Autoformatters got popular was because a lot people don't care
         | about formatting, and those who do care can't win against the
         | _auto_ part of autoformatters.
         | 
         | It works for go because gofmt was there from the start, so even
         | if you are returning a multi-dimension array and elements come
         | out unaligned, that's just accepted as how it is and nobody
         | cares. For other languages, people will have to either accept
         | "not caring" as becoming the norm, or actively fight the
         | autoformatter from steamrolling over their code.
         | 
         | For people who would give more thought to how their code would
         | be read, autoformatters were often more frustrating than
         | "nice".
        
         | rowanG077 wrote:
         | I never thought it only got popular with go. I remember using
         | formatters before go. What go did do is that it shows how good
         | it is if it is ubiquitous.
        
           | plorkyeran wrote:
           | I think _mandatory_ code formatting was popularized by Go?
           | Plenty of code formatting tools had been around for quite a
           | while by the time Go came out, but they were the sort of
           | thing you ran on a codebase once to clean up a mess, not
           | something you checked on each commit. A common sentiment was
           | that it was a good idea but was only viable because Go had it
           | from the beginning, which fortunately turned out to be wrong.
        
         | fweimer wrote:
         | Java IDEs have had optional reformat on safe for a long time.
         | 
         | https://help.eclipse.org/latest/index.jsp?topic=%2Forg.eclip...
        
           | jakjak123 wrote:
           | They have, but the different IDEs do not format equivalent
           | and are way too easy to reconfigure to personal preference.
        
       | vitiral wrote:
       | I'm going to take a contrarian view here. Code formatting is
       | amazing in a corporate environment where nobody truly cares about
       | their code -- it's just a means to get a paycheck. It's also
       | great for beginners to a language who are still trying to get a
       | handle of the syntax.
       | 
       | But where you are nearly the sole owner of a small library and
       | you are crafting that library to be beautiful and
       | understandable... there is something pleasurable about
       | structuring concepts so you have each on a single line, or
       | creating similar functions so the concepts are structured by
       | column.
       | 
       | I know not everyone will hold this view and that is fine, but
       | when you are writing your own hobby library in your favorite
       | language for your own purposes I recommend you try it out.
        
         | JZL003 wrote:
         | I do more research, one-off ,code and being able to align
         | equals signs or successive lines which do similar things both
         | makes me happy and makes it easier to skim. I think the Linux
         | kernel does it too. Emacs has a way of aligning by common
         | separators (=/, as well as quick regex)
         | 
         | I guess I've never gotten into an actual serious debate with
         | someone over formatting so I don't know what I'm avoiding, but
         | sometimes auto formatted code makes it harder to skim
        
         | lokar wrote:
         | Everywhere I've worked, including Google, many (probably most)
         | people care a lot about the code and the format. Often too
         | much. That was the point rsc was making.
        
         | neilv wrote:
         | Also, sometimes the formatting simply makes reading something
         | messy merely tractable, not aesthetically pleasing.
         | 
         | Once, a product launch depended on me urgently kludging a
         | device driver in Python (long story). And this involved a large
         | hand-maintained mapping table. I wrote it quickly but
         | carefully, and found some formatting that made the table
         | readable enough, without implementing a minilanguage in Python.
         | 
         | But the Black formatter had been rigged to run automatically on
         | commit, so... poof! :)
        
         | markrages wrote:
         | Yes! Code is for people to read, not computers. Beautiful code
         | uses the full expressiveness that the language allows.
        
         | striking wrote:
         | I think you're doing yourself a disservice by having to play
         | with the formatting itself instead of being able to express
         | what you mean with the tools the language already provides you.
         | 
         | As soon as anyone else looks at your code or wants to
         | participate in the development process, what is meaningful to
         | you about the arrangement simply won't translate into their
         | view; what is the use of a programming language if not to use
         | its existing definitions and concepts to communicate the
         | concepts of programming?
         | 
         | You can use fancy syntax tricks, but domain-specific languages
         | are a much better way to handle the same problem. You can
         | express things with them that other humans can understand while
         | still retaining access to your existing formatting tools.
        
           | vitiral wrote:
           | I personally dislike ad-hoc DSLs that aren't composed of the
           | language's existing syntax.
           | 
           | Considering I get pleasure from formatting the code
           | beautifully I'm not sure how it's even possible to be doing
           | _myself_ a disservice. It's like saying I'm doing a
           | disservice learning guitar instead of listening to Coldplay
           | on my speakers.
           | 
           | As far as others are concerned... I agree somewhat, but I
           | don't think it's by any means proven. I think we are always
           | better off then cobbled together code where formatting hasn't
           | been thought about. By well-designed hand-crafted formatting
           | can be expressive in it's own way IMO
        
         | rowanG077 wrote:
         | I mean you can do whatever you want in a hobby project you work
         | on solo.
         | 
         | I do want to say that I have the opposite view. People who use
         | formatters want their code to be consistent and go the extra
         | mile it ensure it does. It's like manual testing vs automated
         | testing to me. Sure with manual testing you can test many more
         | corner cases as they come up as an intelligent person is in the
         | loop. But there will be mistakes made, tests forgotten etc.
         | Just like there will always be inconsistencies when you
         | manually format the code.
        
           | vitiral wrote:
           | Im not sure how it's even remotely related to testing. I do
           | agree that it depends on the person, just like every medium
           | depends on both the artist creating the work and the patrons
           | viewing it.
           | 
           | Many people who use formatters (myself included) just want
           | consistent code and don't want to bicker with others about
           | it. When it is soley owned by me and I'm doing it for fun,
           | these reasons fall away for me.
        
             | rowanG077 wrote:
             | It meant to illustrate the same human weakness in
             | consistency and repeatability disadvantages both manual
             | testing and manual formatting. In essence manually
             | formatting is a baker making every bread by hand. Some
             | bread will be better then others. In contrast to an
             | automated factory turning out the same quality bread every
             | day.
        
               | vitiral wrote:
               | Ah, makes sense. I think there are use-cases for both,
               | even personal hobby projects. There's some languages with
               | so much damn syntax that I don't think I could even begin
               | to write code without a formatter (I'm looking at you
               | rust and Java) and others where it stays out of your way
               | (python, Lua, etc)
        
         | erik_seaberg wrote:
         | This. Automatically doing it well (clarifying the intent of the
         | code) is an AI-complete problem. That's not an excuse for
         | automatically doing it poorly and forcing everyone else to live
         | with it.
        
         | paulddraper wrote:
         | I don't find the mental burden of formatting code to have a
         | positive ROI.
         | 
         | I want to write syntacially-valid code, without worrying about
         | the visual presentation of it. (I want a good presentation, but
         | I don't want to put forth the effort to create it.)
        
       | flymasterv wrote:
       | I don't understand why they had to format 100k files. You enforce
       | the format with a presubmit and let the code get formatted in the
       | next change.
       | 
       | I have long felt that Google's strength has always been making a
       | bad architectural choice and then executing on it flawlessly. So
       | many systems are designed in ways that require incredible
       | technical execution to make them workable, and they do it.
        
         | ChrisAntaki wrote:
         | Ex: Angular
        
         | valicord wrote:
         | Because then every commit from now on has a ton of formatting
         | changes that make it harder to see what was actually changed.
        
           | flymasterv wrote:
           | And this is a flaw of Perforce: in a Git/Mercurial system,
           | the presubmit can stack the changes into two commits. In P4,
           | one CL has to contain both changes.
           | 
           | And Google uses P4(ish) because monorepo, so they build
           | further abstractions over P4 to enable git and hg in user
           | space which erases most of the potential benefits of either
           | which is also all really, really good software, but it's all
           | effort necessitated by monorepo. CitC is a work of art, but
           | it is also something necessitated by a stack of other choices
           | that forced their hand into inventing something miraculous to
           | keep hacking around a previous limitation that nobody else
           | has.
        
             | valicord wrote:
             | >the presubmit can stack the changes into two commits
             | 
             | That seems like way more complexity than just doing it once
             | and for all. Now the commit log is littered by a bunch of
             | automatic commits that format one file at a time.
        
               | lesuorac wrote:
               | The commit history for untouched files is mostly just
               | cleanup CLs for reformatting or changing an import for a
               | decade+. A lot of the history is rather useless for
               | finding a bug but at least its generally well tagged with
               | 'CLEANUP=TRUE' so you know to ignore them.
        
             | lesuorac wrote:
             | You can effectively submit two CLs at the same time where
             | the first CL is just a formatting change and the second has
             | just your changes. Although you would need approval for
             | both CLs which really is no different than if you used
             | Git/Mercurial.
        
             | fragmede wrote:
             | git5's been depreciated. fig/hg is well supported though.
        
         | rsc wrote:
         | If you do it that way, then what should be a 1-line BUILD file
         | change turns into something that changes every line. It
         | distracts from the actual purpose of the future change. Many
         | directories aren't touched for long periods of time. A few
         | months from now someone tries to make a 1-line change and is
         | unpleasantly surprised they have to deal with tons of seemingly
         | spurious formatting changes. Not good.
         | 
         | Putting the time of submitting the changes on a small team
         | (mostly me, with approvals from Rob and help from Laurent) was
         | absolutely the right tradeoff. It avoided the "unfunded
         | mandate" and tech debt of making everyone else deal with it.
         | 
         | Update: I found the FAQ we wrote back then. It was very short.
         | These were the last two questions:
         | 
         | Q: Who will update all the existing BUILD files?
         | 
         | A: We will. There are nearly 200,000 of them, and we'll take
         | care of that. We're sending CLs out now. If you want to do it
         | yourself, that's fine: see go/buildifiernow for a tool that can
         | help.
         | 
         | Q: You're creating a lot more work for me.
         | 
         | A: We are creating significant amounts of work for ourselves,
         | including reformatting all 193,000 BUILD files in google3. For
         | the rest of the engineers in the company, we intend to make the
         | transition as smooth as possible, with integration in Eclipse,
         | Emacs, and Vim, as well as tools like Rosie and GenJsDeps. It
         | is an explicit goal not to create significant work for other
         | engineers. If, as we roll this out, you find that we've created
         | noticeable work in your workflow, please let us know so that we
         | can address that.
        
           | diegocg wrote:
           | Make the one line change be a commit, then the reformatting
           | be another one, review only the first one. It shouldn't be a
           | problem with a proper review system.
        
             | dilyevsky wrote:
             | Googles source control system at the time (Perforce) didn't
             | allow for this at least easily. Not sure about now
        
             | tantalor wrote:
             | All changes need to be reviewed. That's the point of code
             | reviews.
             | 
             | Your suggestion would allow people to bypass the code
             | review by just saying "oh it's just cleanup don't worry".
        
             | laurentlb wrote:
             | If every engineer needs to make two commits when they
             | change the build file, that's a higher cost compared to
             | having people dedicated to the migration.
        
         | lopkeny12ko wrote:
         | Did you even read the article? This is literally what it said.
         | 
         | > Rob and Russ shared their ambitious plan: to reformat every
         | single Bazel BUILD file in Google's codebase and enforce this
         | formatting with a presubmit script.
        
           | refulgentis wrote:
           | It's saturday night, in spring, it's really beautiful out.
           | Another year alive. It gives me joy and reminds me to ease
           | off a bit.
           | 
           | In unrelated news, OP was suggesting _no_ 100K file CL, and a
           | _presubmit_. They were _not_ disputing what the article said.
           | They were suggesting sharding out the initial formatting
           | change to 100K individual CLs.
        
         | y2mango wrote:
         | Foremost: formatting these BUILD files was the correct decision
         | as rsc already explained.
         | 
         | Then, there are also other formatters that support "incremental
         | formatting", meaning it only formats lines that are changed in
         | your commit.
         | 
         | Disclaimer: I authored https://github.com/google/pyink and
         | replaced Google Python's YAPF formatter with this Black fork
         | and also implemented the "incremental formatting" feature in
         | Pyink and upstreamed to Black.
         | 
         | When we were rolling out the formatter change, we chose to NOT
         | format the Python files mainly because 1) not all teams at
         | Google enforce Python formatting at presubmit time; 2) the
         | formatter supports "incremental formatting" to minimize the
         | diffs introduced by the formatter.
         | 
         | There are of course less ideal cases where even incremental
         | formatting has to touch not-changed-lines, such as a large
         | Python dictionary/list/set literal that spans across dozens or
         | even hundreds of lines. It's a tradeoff in the end.
        
       | rsc wrote:
       | My notes say it was 193k at the start. The final dashboard when
       | we stopped said "216,626 / 216,890 = 99.8%; 264 to go".
       | 
       | The other correction I would make is that this post does not
       | mention Nilton Volpato, who had written an earlier Buildifier and
       | graciously accepted replacing his implementation with a new one
       | and then taking over ownership for that new implementation as
       | well. (Eventually ownership moved to Laurent's team.)
       | 
       | It looks like it was just under 2,000 commits. We did pretty
       | extensive testing, by having Blaze load a BUILD file and its
       | transitive closure and then dump that parsed form back out to a
       | binary format. Any automated commit had to preserve that parsed-
       | and-dumped binary format bit for bit. The slowest part of the
       | testing was waiting for Blaze to do all the loads.
       | 
       | Every day I would prepare and test as many files as I could,
       | break them into CLs (think PRs), mail Rob a shell script he could
       | run to approve them all, and go to bed. Then I'd get up early in
       | the morning (5am ET) to submit the changes, because there were
       | various cached indexes that got updated when BUILD files got
       | submitted, and it seemed better to send them when not many people
       | would be working.
       | 
       | That scheme worked until a system did fall over and someone got
       | paged, and then after that I agreed to only submit the large
       | changes during business hours. :-)
        
         | laurentlb wrote:
         | Thanks for the precisions! I've added some updates at the
         | bottom of the post.
        
         | haburka wrote:
         | You didn't have Rosie to automatically split up your changes
         | and send them out yet?? That must have been rough. LSCs are way
         | easier now
        
           | ammar2 wrote:
           | Yeah I'm surprised by that as well. As far as I remember,
           | Rosie started out in 2010 and people were using in 2012.
           | Maybe the clustering/splitting didn't support this use-case
           | or it wasn't well-known enough?
        
       | troad wrote:
       | The use of light grey text on a darker grey background strains my
       | eyes and makes this unnecessarily unpleasant to read. I'd
       | respectfully suggest increasing the contrast dramatically.
       | 
       | I keep a quick little scriptlet in my bookmark bar for cases like
       | this:                 javascript:(function(){
       | $('head').append('<style>*{color:#101010 !important;
       | background:#f0f0f0 !important;}</style>'); }());
       | 
       | (A ten second hack job; suggested improvements from front-end
       | friends are welcome.)
        
         | laurentlb wrote:
         | Thanks, I've increased the contrast (not as much as your
         | suggestion, but I hope it's better).
        
       | bsmith0 wrote:
       | I'd be curious what role "global approvers" at Google/Google's
       | scale typically have/how many are there/what's the process?
        
         | dmazzoni wrote:
         | I was at Google until a few years ago.
         | 
         | The purpose of global approvers was exactly things like this.
         | If you want to do a mechanical change to an insanely huge
         | number of files, they can potentially approve it.
         | 
         | In my experience, global approvers were used extremely rarely,
         | only in cases like this where the transformation was purely
         | mechanical and it was possible to verify that there were no
         | logic changes.
         | 
         | Most of the time rather than global approvers, you were
         | encouraged to use a system that would automatically split your
         | change into a bunch of smaller CLs (PRs), automatically send
         | those to owners of each module, then automatically merge the
         | changes if approved. It would even nag owners daily to please
         | review. If you had trouble getting approval for some files you
         | could escalate to owners of a parent directory, but it'd rarely
         | be necessary to go all the way up to global approvers.
         | 
         | Basically if there was even the slightest chance that your
         | change could break something, it's always safer to ask
         | individual code owners to approve the change.
        
           | tylerhou wrote:
           | Even if you get global approval it is still good to split CLs
           | to avoid e.g. merge conflicts.
        
         | y2mango wrote:
         | I was one of the global approvers (and also on the Python team
         | until my role at Google was eliminated recently). I no longer
         | have access to the stats, but from my memory, there are
         | currently 50+ global approvers depending on how many are still
         | considered active.
         | 
         | Typically, code authors would create a proposal by filling out
         | a doc template. It's usually light weight and also accompanied
         | with examples or full set of the pending code changes. Then 1-3
         | of us will review and LGTM the proposal. As part of the review,
         | we also determine whether the changes should be sent to local
         | code owners, or "globally approved" by one of us. The default
         | option is to use "global approval", unless the changes need
         | local code owner's knowledge during the code review. Said in
         | another way, when sent to local code owners, their role is not
         | gate keeping the changes, but to provide necessary local
         | knowledge where we as global approvers don't have.
         | 
         | Refactoring changes, such as formatting or API migrations,
         | shouldn't bother local code owners because 1) it would just be
         | a waste of their time to review and approve; 2) in practice, we
         | find a central code reviewer for the same large set of code
         | changes is more likely to catch bugs (with review automation
         | tooling) than local reviewers.
         | 
         | We consider ourselves as facilitators rather than approvers or
         | gatekeepers of the code changes. Our goal is to make these
         | changes done more efficiently and save engineering time when
         | possible.
         | 
         | If you like stats: over the past 5 years, I have reviewed ~300
         | such proposals and ~40K changelists (equivalent to PRs). One
         | changelist/PR typically contains 10s to 100s of files depending
         | on the nature of the change. When I was most active, I was
         | about ~5th-ish when ranking the number of changes we were
         | approving. There are many global approvers who have approved
         | more than 100K changelists, which is a milestone we celebrate
         | with a cake. Too bad I didn't have the chance to have my cake.
        
           | mudkipdev wrote:
           | Just curious, what kind of global changes do you usually
           | make? And what is the process of becoming an approver?
        
       | yegle wrote:
       | There are 3 tools that that makes maintaining BUILD files
       | enjoyable: buildifier, buildozer and build_cleaner (internal only
       | unfortunately).
        
         | lopkeny12ko wrote:
         | What is build_cleaner?
        
           | randomifcpfan wrote:
           | A tool for updating bazel build target dependencies. It
           | inspects build files and source code, then adds/removes
           | dependencies from build targets as needed. It requires using
           | global include paths in C/C++ sources. It is not perfect, but
           | it is pretty nice!
        
             | rsc wrote:
             | If you're using Go with Bazel, gazelle is available outside
             | Google: https://github.com/bazelbuild/bazel-gazelle
             | 
             | Enabling tools like these was exactly the point of the
             | enforced formatting. It worked extremely well.
        
       | rapfaria wrote:
       | After reformatting, did git blame always pointed to those commits
       | and the most recent author, or were they added to --ignore-rev?
        
         | Arainach wrote:
         | Google doesn't use Git internally, and its code search and
         | source control tools expose the concept of "show blame before
         | this change", so in practice reformats like this aren't
         | troublesome with regard to blame.
        
           | montag wrote:
           | This is true but I think it would still be nice for VCS to
           | have a first class concept of "peek through" changes
           | (whitespace, formatting, etc.) for the purpose of blame.
        
         | tantalor wrote:
         | Blame can be configured to ignore cleanups like this.
        
           | mckn1ght wrote:
           | This has always been something I use to argue against just
           | reformatting code in isolation. TIL git provides a way to
           | ignore that. Thanks!
        
       | ragall wrote:
       | The real lesson here should be that source code shouldn't be
       | stored in a text format, but in a well-defined strict binary
       | format that stores the parse tree directly, which completely
       | eliminates the need for formatters.
        
         | mmastrac wrote:
         | A well-defined, unambiguous formatting standard for a text file
         | is philosophically identically to a well-defined strict binary
         | format. :)
        
       ___________________________________________________________________
       (page generated 2024-06-16 23:00 UTC)