[HN Gopher] A monorepo misconception - atomic cross-project commits
___________________________________________________________________
A monorepo misconception - atomic cross-project commits
Author : askl
Score : 92 points
Date : 2021-07-21 07:16 UTC (15 hours ago)
(HTM) web link (www.snellman.net)
(TXT) w3m dump (www.snellman.net)
| bla3 wrote:
| The advantage of a monorepo in this particular case is that it
| makes easy things easy: if you want to remove a parameter of a
| function in some library and that function has just a few callers
| in dependent executables, you can just do that in a single
| commit. Without a monorepo, you have to do the full-blown
| iterative rollout described in the OP even for small changes, if
| they cross VCS boundaries.
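bla3's small-change scenario can be sketched concretely. A hypothetical single-commit change in a monorepo (file paths and names invented for illustration): the library function and its few callers are updated together, so the tree never builds broken.

```python
# libs/textutil.py -- after the change; the old signature was
# shout(s, encoding="utf-8"), and the unused parameter is now gone.
def shout(s: str) -> str:
    return s.upper() + "!"

# services/greeter/main.py -- caller updated in the same commit,
# dropping the encoding argument it used to pass.
def greet(name: str) -> str:
    return shout("hello " + name)

print(greet("monorepo"))  # HELLO MONOREPO!
```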
| draw_down wrote:
| Exactly, I don't understand what is so difficult to get here.
| lkschubert8 wrote:
| Wouldn't you still have to do the iterative thing to avoid
| temporary outages during deployments? Or do you try to
| synchronize all deployments?
| goodpoint wrote:
| Most likely bla3 is referring to statically linked
| applications.
|
| This is the big downside of monorepos: they strongly
| encourage tight coupling and poor modularity.
| tantalor wrote:
| > they strongly encourage tight coupling and poor
| modularity
|
| No, that's not true. Why would you say that?
| majormajor wrote:
| They make it easy, and then human nature and dev laziness
| does the rest. If you can reach across the repo and
| import any random piece of code, you end up with devs
| doing just that. It's a huge huge pain to try to untangle
| later.
|
| That's why tools like Bazel are strict about visibility
| and put more friction and explicitness on those sorts of
| things. But this tends to not be the first thing at the
| top of people's minds when starting a new project... so
| in the monorepos I've worked on, it's never been noticed
| until it's too late to easily fix.
| bluGill wrote:
| Because they do nothing to make it hard to add a coupling
| or break modularity.
|
| You should of course use good discipline to ensure that
 | doesn't happen. Compared to multi-repo it is a lot easier
 | to violate coupling and modularity without being detected.
| Anyone who is using a monorepo needs to be aware of this
| downside and deal with it. There are other downsides of
| multi-repo, and those dealing with them need to be aware
| of those and mitigate them. There is no perfect answer,
| just compromises.
| sparsely wrote:
 | I don't think synchronized deployments are really possible -
 | you'd have to either still do the iterative thing, or
 | possibly have some versioning system in place.
| bluGill wrote:
| It is possible for trivial cases. What I do in my basement
| for example - though even there I have come to prefer
| keeping things intentionally unsynchronized: it ensures
| that after updates I have some system that still works.
| sam0x17 wrote:
| If you design with a "no deprecations" mentality and deploy
| backend before frontend, in _most cases_ this isn't an issue
| -- the frontend code that needs the new table or column or
| endpoint that doesn't exist yet won't run until those things
| are deployed, and the new backend endpoints will be fully
| backwards compatible with the old frontend, so no issues.
|
| You don't even need to be that dogmatic to make this work
| either -- simply stipulating backwards compatibility between
| the two previous deploys should be sufficient.
|
| The better version of this is simply versioning your backend
| and frontend but I've never been that fancy.
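A minimal sketch of sam0x17's rule, with hypothetical field names: the new backend accepts both the old and the new request shape, so it can be deployed before the frontend without breaking clients from the previous deploy.

```python
def create_user(payload: dict) -> dict:
    # New frontends send "full_name"; frontends from the previous
    # deploy still send "name". Accept either shape.
    full_name = payload.get("full_name", payload.get("name"))
    if full_name is None:
        raise ValueError("missing name field")
    return {"id": 1, "full_name": full_name}

# Both the old and the new frontend payloads work:
print(create_user({"name": "Ada"}))       # {'id': 1, 'full_name': 'Ada'}
print(create_user({"full_name": "Ada"}))  # {'id': 1, 'full_name': 'Ada'}
```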
| bpicolo wrote:
| It takes the guesswork out of library migrations. API
| migrations still need forwards/backwards compat hygiene,
| unless you blue/green your entire infrastructure to
| compatible versions, which is possible but not necessarily
| practical
| rohitpaulk wrote:
| If the change is in a shared library (and not a service
| shared over the network), it's fine to change all usages at
| once. Deploying a service wouldn't affect the others.
|
| If the change affects the public interface of a service, then
| there's no option but to make your changes backward-
| compatible.
| tantalor wrote:
| > no option but to...
|
| Not necessarily; you can accept downtime/breakage instead.
| That is always an option!
| bluGill wrote:
| You must not work on safety critical code where
| downtime/breakage mean people die.
| marcosdumay wrote:
| I hope nobody's life depends on the uptime of a web-based
| distributed system.
|
| But, well, I also expect nobody's life to depend on it.
| There would be a short window between people getting into
 | that situation and them not having any life to depend on
| anything.
| ralmeida wrote:
| Most people don't, to be fair. The case where zero-
| downtime strategies are adopted for (at least) a
| questionable ROI is far more common.
| closeparen wrote:
| I hope you're not doing anything "safety critical" with
| general purpose computers, let alone microservices.
| bluGill wrote:
| Well the arm CPUs we use are in general purpose computers
| as well. Though you are correct, we don't follow the same
| practices as general purpose computers.
| tantalor wrote:
| I was thinking more like bank/govt websites that go down
| for "scheduled maintenance".
| fwip wrote:
| Depends on how you deploy and your scaling strategy.
|
| e.g: If your smallest deployable unit is a Kubernetes pod,
| and all your affected applications live in that pod, you can
| treat it as a private change.
| echelon wrote:
| _This_ is the reason for monorepos.
|
| It's not about migrating APIs or coordinating deployments.
 | That's not a problem your repo can solve. It's about updating
 | _libraries and shared code_ uniformly and _patching
 | dependencies_ (e.g. for vulns) all in one go.
|
| Imagine updating Guava 1.0 -> 2.0. Either you require each team
| to do this independently over the course of several months with
| no coordination, or in a monorepo, one person can update every
| single project and service with relative ease.
|
| Let's say there's an npm vuln in leftpad 5.0. You can update
| everything to leftpad 5.0.1 at once and know that everything
| has been updated. Then you just tell teams to deploy. (Caveat:
| this doesn't really work as cleanly for a dynamically typed
| language like javascript, but it's a world wonder in a language
| like Java.)
|
| I can't fathom how hard it would be to coordinate all of these
| changes with polyrepos. You'd have to burden every team with a
| required change and force them to accommodate. Someone not
| familiar with the problem has to take time out of their day or
| week to learn the new context. Then search and apply changes.
| And there's no auditability or guarantee everyone did it. Some
| isolated or unknown repos somewhere won't ever see the upgrade.
| But in a monorepo, you're done in a day.
|
| Now, here's a key win: you're really at an advantage when
| updating "big things". Like getting all apps on gRPC. Or
| changing the metrics system wholesale. These would be year long
| projects in a polyrepo world. With monorepos, they're almost an
| afterthought.
|
| Monorepos are magical at scale. Until you experience one, it's
| really hard to see how easy this makes life for big scale
| problems.
| Denvercoder9 wrote:
| None of the things you state are related to the technical act
| of having a single repository, but they are all results of
| the organizational structure. It's entirely possible to have
| a monorepo where one person doesn't have the organizational
| or technical ability to update everything in it, and you can
| also have split repositories where a single person does.
| bluGill wrote:
| I update our polyrepo code all the time. I just have to go
| into each repo and make the change. It isn't much more work
| than you have, the only difference I need to run more "git
| pull/git commit/git push" steps, and my CI dashboard draws
| from more builds.
|
| I sometimes leave some repos at older versions of tools.
| Sometimes the upgrade is compelling for some parts of our
| code and of no value to others.
| marcosdumay wrote:
| Hum... Ok, there has been a "mostly nonbreaking" change on
| leftpad that corrects some vulnerability. Are you proposing
| that a single developer/team clones the work of 100s or 1000s
 | of different people, updates it to use the new leftpad,
 | runs the tests, and pushes?
|
| The only way this could ever work is if the change is really
 | nonbreaking (do those exist in Javascript?), in which case you
 | could script the update on as many repositories as you want.
| Otherwise, living with the library vulnerability is probably
| safer than blindly updating it on code you know nothing
| about.
|
| Anyway, burdening all the teams with a required change is the
| way to go. It doesn't matter how you organize your code.
| Anything else is a recipe for disaster.
| tylerhou wrote:
| > with the library vulnerability is probably safer than
| blindly updating it on code you know nothing about.
|
| This is what tests are for.
|
| > Are you proposing that a single developer/team clones the
 | work of 100s or 1000s of different people, updates it
 | to use the new leftpad, runs the tests and pushes? ... Anyway,
| burdening all the teams with a required change is the way
| to go.
|
| No, and speaking from personal experience, it's much more
| difficult to ask ~500 individuals to understand how and why
| they need to make a change than to have a few people just
| make the change and send out CLs. Writing a change,
| especially one that you have to read a document to
| understand, has a fixed amount of overhead.
|
| (Also, you don't have to clone all the repositories if
| you're in a monorepo :) ).
| dogleash wrote:
| As I read your post, you're attributing a lot of properties
| to a monorepo.
|
 | That's fine, but I think you should be careful whether you're
 | pointing to the properties of using a single repository in
 | general; or the properties of tooling certain monorepo-using
 | companies have built with no requirements other than
 | supporting their own source control; or how uniform it can
 | feel to jump into multiple projects when every project has
 | been forced to use a lot of the same base tooling beyond just
 | source control; and/or a work culture that happened to have
 | grown up around a certain monorepo - but for which a monorepo
 | is neither necessary nor sufficient to reproduce.
|
| I've worked jobs where the entire company is in a unified
| repository, and companies where a repository represents
| everything related to a product family, and places where each
| product was multiple gitlab groups with tons of projects.
|
| The most I can say is that monorepos solve package management
| by avoiding package management. The rest comes down to
| tooling, workflow and culture.
|
| I would be interested in hearing why it would be
| hypothetically worse if google had gone the other direction.
| Where they still spent the same amount of money and time from
| highly talented people on the problem of unifying their
| tooling and improving workflow, but done it to support a
| polyrepo environment instead. How would it have been
| fundamentally worse than what they got when they happened to
| do the same with a monorepo?
| Groxx wrote:
| Forcing each team to do (or approve!) the update has nothing
| to do with a shared repository, it's just what limits and
| requirements you've added on top of your repo(s). A for loop
| over N repos and an automated commit in each one is perfectly
| achievable.
|
| If you want consistency so you can automate stuff, require
| consistency.
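A dry-run sketch of the "for loop over N repos" approach (repo names and the commit message are placeholders): generate the git commands an automated rollout would run in each repository, rather than hand-editing each one.

```python
import shlex

REPOS = ["service-a", "service-b", "shared-lib"]  # hypothetical repo list

def rollout_commands(repo: str, message: str) -> list:
    """Commands an automated per-repo rollout would run."""
    quoted = shlex.quote(message)
    return [
        f"git -C {repo} pull",
        # the actual edit (e.g. a scripted dependency bump) happens here
        f"git -C {repo} commit -am {quoted}",
        f"git -C {repo} push",
    ]

for repo in REPOS:
    for cmd in rollout_commands(repo, "Bump leftpad to 5.0.1"):
        print(cmd)
```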
| thelopa wrote:
| Do you really feel that confident working in another team's
| code base? I work in a multi-repo company, and almost every
| time I've gotten a patch from outside my team it's been wrong
| in some way. Why would I want to make it easier for people
| who don't understand (and aren't interested in understanding)
| my project to land code in it?
| z0r wrote:
| Because any company reaching monorepo scale will have
| integration tests that cut across the boundaries of your
| projects. It's possible for an outside contribution to
| break your corner of the repo, but the flip side is that
| you will know much more quickly if your own changes break
| another part of the repo.
| majormajor wrote:
| >Because any company reaching monorepo scale will have
| integration tests that cut across the boundaries of your
| projects.
|
| Heh. This makes a couple assumptions that I only can wish
| were true: (a) that people won't go to monorepo until
| they hit some huge scale, and (b) that people will at
| that point have good test coverage.
| thelopa wrote:
| I completely disagree. My company is absolutely "monorepo
| scale", but I also know we're nowhere close to having the
| test coverage to allow people unfamiliar with a project
| to freely land changes in it.
| tylerhou wrote:
| Another advantage of a mono-repo is that it encourages
| everyone to use the same tooling & coding libraries. So (at
| least at Google) I can open another team's codebase (as
| long as it's in the main mono-repo) and understand it
| within ~10 minutes.
|
| I fix bugs that bother me in random projects (both internal
| and external) maybe once a month (most recently, in the
| large scale change tool!). For context, I've been at Google
| for ~3 years. I've only had a changelist rejected once, and
| that was because the maintainer disagreed with the
| technical direction of the change and not the change
| itself.
| mumblemumble wrote:
| So I'm kind of stuck thinking, if you want to have cross-cutting
| changes happening in many small steps rather than one big one,
| why not choose a repository layout that makes it the path of
| least resistance, rather than one that makes it require extra
| effort?
| bluGill wrote:
| There is no perfect answer. Both mono-repo and multi-repo have
| pros and cons. Once you make a choice you have to deal with the
| cons that come with your choice (you are allowed to change, but
| then you get a different set of cons to deal with).
| Cthulhu_ wrote:
| The API changes are mainly a problem (I think) if you have a
| monorepo, but not a mono-deploy; you can only get the full
| benefits of a monorepo if all your changes are deployed
| simultaneously and atomically.
|
| Changing a model that is shared between different services (or a
| client / server) should be atomic. In practice, you need to think
| about backwards compatibility, and work in the three-step process
| outlined in the article (build and deprecate, switch over, remove
| deprecated code across a number of deployments / time).
|
| If you don't have atomic deployments, that's one less argument in
| favor of monorepos.
| jyounker wrote:
| You're using one specific definition of an API. Another
| definition of an API is the contract between a library and the
| calling code. The latter sort of API changes happen frequently
| (at least in places with monorepos), and they don't require
| phased deployments.
| [deleted]
| kerblang wrote:
| Sort of off-topic, but before building a shared library, I
| strongly recommend reading up on Jeff Atwood's "Rule of Threes".
| It's harder than most people think, and I've seen them do far
| more harm than good because authors aren't disciplined & lack
| domain expertise. The biggest mistake is thinking "Something is
| always better than nothing" when in fact that something can
| easily become your biggest liability. If you must do it, keep it
| as minimalist and focused as possible, and don't hand the task to
| junior-level people; hand it to someone who recognizes that it's
| 10x harder than it looks.
|
| https://blog.codinghorror.com/the-delusion-of-reuse
|
| https://blog.codinghorror.com/rule-of-three/
| marceloabsousa wrote:
| Mostly agree with the article. To me the problem is about
 | dependency management. I see all the time codebases hugely
 | fragmented at the level of git, which is totally ad hoc. After a
 | while, teams face a lot of issues, the most annoying being one
 | change in the product involving N PRs with N code reviews and N
 | CIs. This fragmentation of knowledge also pops up in weird
 | integration bugs that could be solved with better tools enforcing
 | clearer processes.
|
| The ultimate goal of Reviewpad (https://reviewpad.com) is to
| allow the benefits of the monorepo approach independently of how
| the codebase is fragmented at the git level. We are starting from
| the perspective of code review and debugging (e.g. code reviews
| with multiple PRs, or code reviews across multiple projects). For
| people doing microservices like us, the ability to have a single
| code review for both library and clients has been quite positive
| so far.
| nhoughto wrote:
 | Don't agree with the conclusion or reasoning; sure, you might need
 | to take a multi-stage approach for many reasons. But if you've
 | got your shit together atomic change is possible; in many other
 | approaches it's not even nearly possible. Whether your org can
 | put together the maturity / investment / whatever to make it a
 | reality is up to you. But the possibility is the sell, not the
| guarantee. If you can make it work (and google do on a crazy
| scale) it's an incredible power, incredible, like change your
| entire approach to tech debt incredible. Also has costs at scale
| like the theory about how google keeps shutting down products
| because keeping them in tree is expensive.
| jyounker wrote:
| Google keeps shutting down projects because of internal
| politics and reward structures.
| shawnz wrote:
| Single "big bang" atomic commits that update both the client and
| server with new features usually aren't practical, I can agree
| with that.
|
| I think the real intention of the "atomic commits" idea is a
| total ordering of commits between both the client and server
| code. Both a "big bang" strategy as well as the author's
| incremental change strategy can benefit from that arrangement.
|
| The key is that at any given point in the repository's history,
| you can be sure that the client and server both work together.
| The point is not that each commit atomically completes an entire
| new feature, only that each commit atomically moves the
| repository to a state where the client and server still work
| together.
|
| In that sense, the author's incremental commits actually do have
| that kind of atomicity.
| joatmon-snoo wrote:
| ITT: people who have never dealt with things going _wrong_ in a
| large codebase.
| xcambar wrote:
| The process described in the article (allow old+new behavior on
| server, migrate clients, deprecate old behavior on server) works
 | spectacularly well, especially when used with a decent level of
 | observability and information in the form of logging messages.
|
| We are using variations on this theme extensively at my company,
| in a large spectrum of projects, with great satisfaction.
|
| Crucially, this method is orthogonal to using monorepos. It is
| simply a safety net for and good stewardship of your APIs.
|
| Definitely a best practice in my tool belt.
| kohlerm wrote:
 | That is just a best practice for published interfaces (ones
 | allowed to be used outside of your repositories, e.g. consumed
 | via a different build, the main example being APIs available to
 | customers). Not all interfaces need (really, "must not") to be
 | published. For "private" interfaces it is an advantage to be
| able to refactor them easily and also being able to rollback
| those changes easily if needed. Without a mono-repo that
| becomes difficult. You basically need an additional layer to
| "stitch" commits together. You could question why that makes
| sense because it would just emulate mono-repo behavior.
| bluGill wrote:
| If an interface spans repos it isn't private.
|
 | I assume you are not creating some strawman where every source
 | code file is in a separate repository. That would be insane and
 | nobody does that. If you are in a multi-repo world, then how
 | you break your repos up is an important decision. There are
 | interfaces that are allowed to be used within one repo that you
 | cannot use in others, which allows things that should be
 | coupled to be coupled, while forcing things that should be
 | more isolated to be isolated. This is the power of the multi-
 | repo: the separation of what is private vs public is
 | enforced. (Which isn't to say you should go to multi-repo -
 | there are pros and cons of both approaches; in the area of
 | where an interface can be used multi-repo gives you more
 | control, but there are other ways to achieve the same goal.)
| KaiserPro wrote:
 | Monorepos offer one thing, and precisely one advantage:
 | "visibility".
 |
 | Because everything is in the same place, you can, in theory,
 | find stuff.
 |
 | However in practice it's also a great way to hide things.
|
| But the major issue is that people confuse monorepos for a
| release system. Monorepos do not replace the need for an artifact
| cache/store, or indeed versioned libraries. They also don't
| dictate that you can't have them, it just makes it easier _not
| to_.
|
| You can do what facebook do, which is essentially have an
| unknowable npm like dependency graph and just yolo it. They sorta
| make it work by having lots of unit tests. However that only
| works on systems that have a continuous update path (ie not user
| hardware.)
|
 | It is possible to have "atomic changes" across many libraries.
 | It makes it easier to regex stuff, but also impossible to test
 | or predict what would happen. It's very rare that you'd want to
 | alter >0.1% of the files in your monorepo at one time. But
 | that's not the fault of a monorepo, that's a product of having
 | millions of files.
| lhnz wrote:
 | > It's particularly easy to see that the "atomic changes
 | > across the whole repo" story is rubbish when you move
 | > away from libraries, and also consider code that has
 | > any kind of more complicated deployment lifecycle,
 | > for example the interactions between services and
 | > client binaries that communicate over an RPC interface.
|
| This seems exactly wrong to me. Getting rid of complicated
| deployment lifecycles is exactly the job that people use
| monorepos to solve. Wasn't that one of the reasons they are used
| at Google, Facebook, etc? As well as being able to do large-scale
| refactors in one place, of course.
|
| You should be able to merge a PR to cause a whole system to
| deploy: clients, backend services, database schemas, libraries,
| etc. This doesn't preclude wanting to break up commits into
| meaningful chunks or to add non-breaking-change migration
| patterns into libraries -- but consider this: is it meaningful
| for a change to be broken into separate commits just because it
| is being done to independent services? What benefit does cutting
| commits this way give you?
|
| What you want to avoid is needing to do separate PRs into many
| backend service and client repos, since: (1) when the review is
| split into 10+ places it's easier for reviewers to miss problems
| arising due to integration, (2) needing to avoid breaking changes
| can sometimes require developers to follow multi-stage upgrade
| processes that are so difficult that they cause mistakes, and (3)
| when there are separate PRs into different repositories these
| tend to start independent CI processes that will not test the
| system as-it-will-be (unless you test in production or have a
| very good E2E suite -- which would be a good idea in this
| situation).
|
| I will say that, even in a monorepo, a big change might still
| happen gradually behind feature flags. But I think that generally
| it's nice to be able to deploy breaking changes in a more atomic
| fashion.
| jmillikin wrote:
| I can only attest to how Google does (did) it, but they use the
| monorepo as a serialized consistent history. There is no
| concept of deploying "the whole system" -- even deploying a
| single service requires, mechanically, multiple commits spread
| across time.
|
| In fact, when I was last there in 2017, making a backwards-
| incompatible atomic change to multiple unrelated areas of the
| codebase was forbidden by policy and technical controls (the
| "components" system). You _had_ to chop that thing up, and wait
| a day or two for the lower-level parts of it to work their way
| through to HEAD.
|
| I would generalize this to say that the idea of deploying
| clients, schemas, backends, etc all at once is an inherently
| "small scale" approach.
| lhnz wrote:
| Interesting, RE: Google. Though, even if no other large-scale
| company is doing this, it seems on face value to be an easier
| and safer way to develop software up to perhaps a medium
| scale system/problem (and assuming that you're not needing to
| make database changes). I've yet to see a benefit to
| straddling changes across multiple repositories...
| andyferris wrote:
| The benefit of changes straddling repositories is having
| separation of control.
|
| For example different npm (cargo, etc) packages are
| controlled by different entities. Semver is used to
| (loosely) account for compatibility issues and allow for
| rolling updates.
|
| A single company requiring multiple repositories for
| control reasons might be an antipattern and might indicate
| issues with alignment/etc.
| lhnz wrote:
 | > The benefit of changes straddling repositories
 | > is having separation of control.
|
| Good point, although a monorepo with a `CODEOWNERS` file
| could be used to give control of different areas of a
| codebase to different people/teams.
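A sketch of what that could look like, in GitHub's `CODEOWNERS` syntax (paths and team names hypothetical):

```
# Reviews for each subtree are routed to the owning team.
/libs/common/         @acme/platform-team
/services/payments/   @acme/payments-team
/services/search/     @acme/search-team
```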
| jyounker wrote:
 | Monorepo tooling (as opposed to a big repo with a bunch
 | of stuff tossed into it) generally provides access
 | controls.
 |
 | Also remember that non-DVCS repos generally have fine-
 | grained access controls.
| jsnell wrote:
| > This seems exactly wrong to me. Getting rid of complicated
| deployment lifecycles is exactly the job that people use
| monorepos to solve. Wasn't that one of the reasons they are
| used at Google, Facebook, etc? As well as being able to do
| large-scale refactors in one place, of course.
|
| No, neither of those is why big companies use monorepos.
| Clearly the kinds of things you wrote are why the general
| public _thinks_ big companies use monorepos, which is why this
| argument keeps popping up. But given making atomic changes to
| tens, hundreds, or thousands of projects does not actually
| match the normal workflows used by those companies, it cannot
| be the real reason.
|
| Monorepos are nice due to trunk based development, and a shared
| view of the current code base. Not due to the capability of
| making cross-cutting changes in one go.
| kohlerm wrote:
 | You do not need monorepos for trunk based development. In
 | principle you "only" need a common build system. E.g. you can
 | just force the consumption of head via the build system. I
 | think the rollback is still a factor. E.g. in a monorepo
 | rolling back a change is straightforward, whereas with
 | multiple repos you would have to somehow track that commits
 | from several repos belong together. This would create an
 | additional layer which is avoided by using a monorepo.
| ec109685 wrote:
| Without a mono repo and the ability to build from HEAD all
| the components, it's much harder to be sure a change to a
| library _is_ actually backwards compatible (think a
| complicated refactoring).
|
| Otherwise, there is much more fear that a change to an
| important library will have downstream impact and when the
| impact does arise, you've moved on from the change that
| caused it.
| nicative wrote:
| Even within the same repo, it is very likely that the old
| version of your code will coexist with the new version during
 | the deploy rollout. Often having different commits and deploys
| is a requirement. For instance, imagine that you add a column
 | in the database and also use it in the backend service.
 | You probably have to add the column first and then commit and
| deploy the usage of the column later, because you can't easily
| guarantee that the new code won't be used before the column is
| added. Same would apply for a new field in the API contract.
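A sketch of the two-deploy sequence described above (table and column names hypothetical); sqlite3 stands in for the production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Deploy 1: ship only the schema migration. Running code still ignores
# the new column, so old and new instances can coexist during rollout.
conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT")

# Deploy 2 (a later commit): application code may now assume the column
# exists on every instance.
conn.execute("INSERT INTO users (name, nickname) VALUES ('Ada', 'ada')")
row = conn.execute("SELECT nickname FROM users WHERE name = 'Ada'").fetchone()
print(row[0])  # ada
```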
| magicalhippo wrote:
| We commit the column schema change and the code that uses it
| in a single commit.
|
| This is handled by our deployment tool. It won't allow the
| new executable to run until the database has been updated to
| the new schema.
| thundergolfer wrote:
| The old version won't coexist when the change is contained
| within a single binary, which seems like it would be true in
| a bunch of cases.
|
| In our monorepo we have to treat database changes with care,
| like you mention, as well as HTTP client/server API changes,
| but a bunch of stuff can be refactored cleanly without
| concern for backwards compatibility.
| rutthenut wrote:
| That is only likely to apply in very small-scale
| environments or companies.
|
| And if only a single binary is produced, quite likely a
| single source code repo would be used as well - sounds like
| 'single developer mode', well 'small team' at most.
| withinboredom wrote:
| Do you only have a single instance of the binary running
 | across the whole org? And during deployment do you stop the
| running instance before starting the new one?
| jvolkman wrote:
| Any change that won't cross deployable binary boundaries
| (think docker container) can be made atomically without
| care about subsequent deployment schedules. So this
| doesn't work for DB changes or client/server API changes
| as mentioned by OP, but does work for changes to shared
| libraries that get packaged into the deployment
| artifacts. For example, changing the interface in an
| internal shared library, or updating the version of a
| shared external library.
| kohlerm wrote:
 | It seems like a common misconception that you can never
 | change an interface. You
| actually can as long as it is not published to be used
| outside of your repositories.
| tsss wrote:
| It's simply unfeasible to do that. The best you can do is blue-
| green deployments with a monorepo but you will still need at
| least data backwards compatibility to be able to roll back any
| change. The only thing you gain with blue-green deployments is
| slightly easier API evolution.
| taeric wrote:
| Nit: you need backward compatibility to roll out the new
| change. You need forward compatibility to roll it back.
|
| Right?
| tsss wrote:
| Yes, but usually you need both.
| taeric wrote:
| Oh, agreed. My point was it was easy for folks to think
| they are safe because they have backwards compatibility,
| when they actually need both if they are concerned with
| running a rollback.
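The distinction can be sketched with hypothetical record fields: backward compatibility means NEW code reads records written by OLD code (safe rollout); forward compatibility means OLD code tolerates records written by NEW code (safe rollback).

```python
def read_v2(record: dict) -> dict:
    # v2 added "priority"; defaulting it keeps v2 readers backward
    # compatible with v1 records, which makes the rollout safe.
    return {"id": record["id"], "priority": record.get("priority", 0)}

def read_v1(record: dict) -> dict:
    # v1 readers ignore unknown keys, so they stay forward compatible
    # with v2 records -- which is what makes a rollback safe.
    return {"id": record["id"]}

print(read_v2({"id": 7}))                 # {'id': 7, 'priority': 0}
print(read_v1({"id": 7, "priority": 3}))  # {'id': 7}
```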
| kohlerm wrote:
| I do not really agree with the conclusion. Google does large
| scale automatic refactorings. Those clearly benefit from atomic
| commits, because it is easy to roll them back in that case. As
 | others have mentioned, in smaller projects you might want to be
 | able to do some (internal) refactorings easily, and being able to
 | roll them back easily is a big advantage.
| tylerhou wrote:
| As someone who has worked on large-scale refactorings at Google
| they usually do happen as the author describes:
|
| 1. Add the new interface (& deprecate the old one)
|
| 2. Migrate callers over (this can take a long time)
|
| 3. Remove the old interface.
|
| Even then, you risk breakages because in some cases the
| deprecation relies on creating a list of approved existing
| callers, and new callers might be checked in while you're
| generating that list. (In that case you would ask the new
| callers to fix-forward by adding themselves to the list.)
|
| This three step process has to happen because step 2 takes a
| long time for widely-used interfaces, and automatic
| refactorings cannot handle all cases.
|
| The only time you can consolidate all three into one commit is
| if the refactoring is so trivial that automatic tooling can
| handle every case (in which case, does the cost of code churn &
| code review really justify the change?) or the number of usages
| is small enough that a person can manually make all the
| changes the automatic fixer can't handle before much churn
| happens.
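The three-step process described above can be sketched in a few lines. The function names here are hypothetical, chosen only for illustration; step 1 is the state shown in code, and steps 2 and 3 happen as later commits.

```python
import warnings

# Step 1: introduce the new interface and deprecate the old one.
# (fetch_user / fetch_user_by_id are made-up names for illustration.)
def fetch_user_by_id(user_id: int) -> dict:
    """New interface: callers should migrate to this."""
    return {"id": user_id}

def fetch_user(user_id: int) -> dict:
    """Old interface, kept as a thin forwarding shim during step 2."""
    warnings.warn(
        "fetch_user is deprecated; use fetch_user_by_id",
        DeprecationWarning,
        stacklevel=2,
    )
    return fetch_user_by_id(user_id)

# Step 2 happens over many independent commits: callers switch to
# fetch_user_by_id. Step 3 is one final commit deleting fetch_user.
```

Because the shim forwards to the new function, both interfaces stay correct throughout step 2, so each migration commit can land (and be rolled back) on its own.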
| vlovich123 wrote:
| Hmm... I didn't work on large scale refactors but (IIRC -
| it's been a few years) I definitely had to approve commits
| touching components I had ownership over, batched into the same
| commit as many other changes. Given
| that I recall a wiki describing what your responsibility was
| as the downstream user of the API in these scenarios, it
| seemed like a common thing to me & I recall there were also
| automated tools to help do this.
|
| Whether or not that's the most common workflow is another
| story. Works great at the C++/Java API layer or trivial
| reorganizations, may not work as well when modifying runtime
| behavior since you have to do the 3 phase commit anyway.
| skybrian wrote:
| I'm not sure what you consider the conclusion to be, but large-
| scale refactorings at Google do happen in the way described in
| the article, with the bulk of the migration done using many
| commits done in parallel.
|
| Being able to rollback is indeed important and reducing the
| sizes of commits makes it less painful. If one project has a
| problem, unrelated projects won't see the churn.
|
| Or at least that's how it was done when I was there. It's been
| a while.
| joatmon-snoo wrote:
| It still is done that way, but even individual LSC (large
| scale change) CLs touch easily >100 files and rely very
| strongly on the atomicity guarantee of the monorepo.
|
| Plus, when most people think LSCs, they think about the kind
| of stuff that the C++ or TS or <insert lang> team do, not
| someone refactoring some library used by a handful of teams,
| which themselves usually impact anywhere between ten to a few
| hundred files.
| kohlerm wrote:
| I am saying large-scale refactorings can and should (in some
| cases) happen (and are doable). The author claims you should
| always do it in steps. I disagree with this opinion. If the
| refactoring is automatic, why would I do it in steps?
| bluGill wrote:
| Is your automatic refactoring completely 100% bug free in
| all edge conditions?
|
| I personally don't have that much confidence.
| skybrian wrote:
| The more projects you touch, the more approvals you need
| and the more tests you need to run. Some tests take a long
| time to run and some are flaky. And the longer it takes to
| do all this, the more likely there is to be a merge
| conflict due to other people's commits.
|
| If you divide it up, most commits will land quickly and
| without issue, and then you can deal with the rest.
| sam0x17 wrote:
| The article seems to be written from a perspective where
| integration and unit testing don't exist. IMO the main advantage
| of monorepos is that you can have end-to-end tests within the
| same project structure. In that case you wouldn't need to do
| these incremental changes hoping things don't break in
| production, because your test suite would prove to you that it
| works.
| ec109685 wrote:
| And even if you do the changes incrementally, it lets you
| validate you are doing each step correctly and not breaking
| backwards compatibility along the way.
| corpMaverick wrote:
| On the subject of libraries in large code bases: I think one
| should be careful when deciding what goes into a library and
| what goes into an API/service. Both allow you to share code,
| but there is a difference in who controls the deployment in
| prod. For example, if you need to change a business rule in a
| service, you can change the code, and the rule change takes
| effect when the API/service is deployed. With a library,
| however, you make the change and publish the library, but it is
| up to the apps that include the library to decide when the
| change is deployed in prod. As the owner of the library, you no
| longer have control over when it is deployed in prod.
| majikandy wrote:
| Interesting idea and it certainly makes sense to ensure working
| software at all times.
|
| However it doesn't have to be one or the other.
|
| Sometimes a single commit (or PR if you prefer multiple commits)
| can update the API and _all_ clients at the same time.
|
| Sometimes client callers are external to the repo/company, in
| which case a backwards-compatibility strategy is needed.
|
| There is no need to abandon a concept of a single repo just
| because you might not use one of the main benefits all the time.
| argonaut wrote:
| There are plenty of smaller changes that are just way easier with
| a single commit. A small API call with just two or three
| consumers, a documentation change, etc. The multistage approach
| is certainly good for big changes.
| stkdump wrote:
| Also, if you do all changes at once, you can run the whole
| test suite and find problems quicker. Sure, you could still
| later split the work up into many commits.
|
| But even when problems come up that you didn't see in the
| tests, it is easier to revert the work if it indeed is a single
| commit.
|
| The real reason why I still sometimes prefer many small commits
| is to reduce the chance of merge conflicts.
| kohlerm wrote:
| You can do automatic bisecting
| (https://git-scm.com/docs/git-bisect), which you cannot easily
| do with multiple repos.
| cjfd wrote:
| The arguments seem exceedingly weak here. Having a monorepo does
| not prevent a workflow like (1) add new function signature; (2)
| let all calls use the new function signature; (3) remove old
| function signature. However, in some cases a multirepo means
| you can only do it this way. If a function is called all over
| the place, the three-step process is the only somewhat safe way
| to do it, but if a function has only a modest number of callers
| one can do the atomic refactor, which is an easier process.
| That one sometimes needs the three-step process and sometimes
| can do an atomic refactor is in no way contradictory, as this
| article seems to claim. What you do in a particular case
| depends on the size of the change and the risk involved.
|
| Also, a monorepo forcing trunk-based development is great. Long-
| living feature branches are hell. I would even say to avoid
| feature branches entirely and use feature switches instead
| whenever one can get away with it. Every branch always runs the
| risk of making refactoring more difficult.
| Cthulhu_ wrote:
| Re: branches, I agree in principle - we've had feature branches
| at my current job that were open for a year, reasoning being
| that it impacts a lot of our application core and we cannot
| permit issues impacting existing customers, etc etc.
|
| But I do like short-lived (max 1 sprint) branches for my own
| smaller scale projects because they group changes together. I
| name my branches after an issue number, rebase freely on that
| branch, and merge with a `--no-ff` flag when I'm done. My
| history is a neat branch / work / merge construct.
|
| Not sure if this is just fear, but I believe trunk- and feature
| switch based history will end up an incoherent mess of multiple
| projects being worked on at the same time, commits passing
| through each other. I'm sure they can be filtered out, but
| still.
| kohlerm wrote:
| It is not the monorepo that forces trunk-based development; it
| is the fact that you use one build for the monorepo. E.g. you
| can do trunk-based development with multiple repos by forcing
| everyone to use a common build. You could also use a monorepo
| while everyone does their own build, consuming libraries via
| mechanisms such as Maven repositories or published npm
| packages.
| thundergolfer wrote:
| > The example of renaming a function with thousands of callers,
| for example, is probably better handled by just temporarily
| aliasing the function, or by temporarily defining the new
| function in terms of the old.
|
| Why exactly? The best I can think of is that you may annoy people
| because they'd need to rebase/merge after your change lands and
| takes away a function they were using.
|
| If you're literally just doing a rename, and you're using
| something like Java, why not just go ahead and do a global
| rename?
| jsnell wrote:
| Because actually getting that commit pushed won't be just
| clicking "rename method". You'll need to run and pass the
| presubmits of every single project you made code changes in -
| and the more projects you're changing at once, the less likely
| it is that the build is green for all of them at the same time.
| Then you'll need to get code review approvals from the code
| owners of each of the clients in one go. Hopefully no new users
| pop up during this process. If some do, you'll need to iterate
| again.
|
| Then once some trivial refactoring inevitably causes some kind
| of breakage ("oh, somebody was using reflection on this
| class"), you'll need to revert the change. That'll be another
| set of full code reviews from every single owner. Let's hope
| that in the meanwhile, nobody pushed any commits that depend on
| your new code.
|
| None of this is a problem if you have a library and two
| clients. But the story being told is not "we can safely make
| changes in three projects at once", it's "we can safely make
| changes in hundreds or thousands of projects". The former is
| kind of uninteresting. The latter is a fairy tale.
| jvolkman wrote:
| The latter does happen at Google at least. But it requires
| tooling to reliably make the change (monorepo-aware
| refactoring tools, basically), and tooling to run all
| affected tests. Such large changes at Google are often
| accompanied by a doc outlining the specific process and
| possible impacts of the change, and many times are given
| "global approval" by someone who has that ability rather
| than requiring individual approvals from each affected team.
| jsnell wrote:
| That's not true. The process for the large scale changes at
| Google is described in the "Software Engineering at Google"
| book[0]. Chapter 22 is all about it. There is tooling, yes,
| but the goal is exactly the opposite of trying to make a
| single commit across the whole codebase:
|
| > At Google, we've long ago abandoned the idea of making
| sweeping changes across our codebase in these types of
| large atomic changes.
|
| [0] https://abseil.io/resources/swe_at_google.2.pdf
| squiggleblaz wrote:
| The article is talking about mitigating risk. Unless I missed
| something, it doesn't restrict itself to Java, so that's an
| unreasonable restriction when questioning the author's
| reasoning. But I don't think it destroys the argument
| completely.
|
| I suppose a single-developer code base in a fully checked and
| compiled language that doesn't support any kind of reflection
| has no particular added risk from renaming vs introducing a new
| name. Each time you remove one of those constraints, you add a
| little bit of risk.
|
| If you have a giant company, it might be possible that someone
| is copying a jar file and calling it in a weird way that you
| don't expect.
|
| If your language isn't fully checked, you might correctly
| rename all the Typescript uses and miss a Javascript use.
|
| If the language supports dynamic calling, it might turn out
| that somewhere it says "if the value of the string is one of
| these string values, call the method whose name is equal to the
| value of the string". There are various IPC systems that work
| this way, and it will certainly be hard to atomically upgrade
| them. I hate that kind of code, but someone else doesn't.
|
| If your language supports you doing these things, you can
| create as many conventions as you like to eliminate it. But
| someone will have an emergency and they need to fix it right
| now.
|
| Some people view the correct way of dealing with that problem
| is to insist on the development conventions, because we need to
| have some kind of conventions for a large team to feasibly work
| together.
|
| But I guess the author leans towards the side that says "if
| it's valid according to the language/coding environment, it
| might be better or worse, but it's still valid and we need to
| expect and accommodate it". It isn't my preference but it's a
| viable position - technical debt is just value if you can
| accommodate it without some unreasonable burden.
| magicalhippo wrote:
| > If you're literally just doing a rename, and you're using
| something like Java, why not just go ahead and do a global
| rename?
|
| It introduces changes in places that really don't need
| changes. We've done both at work, but I mostly prefer just
| making the old function(s) simply call the new one directly.
|
| Then you won't "pollute" source control annotation (blame) and
| similar.
|
| Not a 100% thing though.
| bob1029 wrote:
| It looks like more effective change control is a major concern of
| the author.
|
| We use a monorepo for our organization and have found that
| feature flags are the best way to manage the problem of exposing
| new functionality in ways that won't piss off our customers. We
| can let them tell us when they are ready and we flip the switch.
|
| Once a flag goes true for every customer, make a note to drop it.
| This is important because these things will accumulate if you
| embrace this ideology.
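A minimal sketch of the per-customer flag approach described above. The flag store, flag names, and customer names are all hypothetical; a real deployment would back this with a config service or database so flags can be flipped without a deploy.

```python
# Hypothetical in-memory flag store mapping flag name -> customers
# opted in so far. Real systems use a feature-flag service instead.
FLAGS: dict[str, set[str]] = {
    "new-invoice-ui": {"acme-corp"},
}

def is_enabled(flag: str, customer: str) -> bool:
    """True if `flag` has been switched on for `customer`."""
    return customer in FLAGS.get(flag, set())

def render_invoice(customer: str) -> str:
    # Old and new code paths coexist until every customer is
    # migrated; then the flag (and the old path) should be deleted,
    # per the accumulation warning above.
    if is_enabled("new-invoice-ui", customer):
        return "new invoice UI"
    return "old invoice UI"
```

Once the flag is true for every customer, the conditional collapses to the new path and the flag entry can be dropped, which is exactly the cleanup step bob1029 recommends noting down.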
| draw_down wrote:
| This is still easier to do with a monorepo. One reason is easily,
| _reliably_ finding all the usage sites.
| w_t_payne wrote:
| Either way, you've got to do the tooling work to make your chosen
| approach work. There's no free lunch here.
___________________________________________________________________
(page generated 2021-07-21 23:01 UTC)