[HN Gopher] A monorepo misconception - atomic cross-project commits
___________________________________________________________________
A monorepo misconception - atomic cross-project commits
Author : askl
Score : 92 points
Date : 2021-07-21 07:16 UTC (15 hours ago)
(HTM) web link (www.snellman.net)
(TXT) w3m dump (www.snellman.net)
| bla3 wrote:
| The advantage of a monorepo in this particular case is that it
| makes easy things easy: if you want to remove a parameter of a
| function in some library and that function has just a few callers
| in dependent executables, you can just do that in a single
| commit. Without a monorepo, you have to do the full-blown
| iterative rollout described in the OP even for small changes, if
| they cross VCS boundaries.
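bla3's small-change scenario can be sketched concretely. A hypothetical single-commit change in a monorepo (file paths and names invented for illustration): the library function and its few callers are updated together, so the tree never builds broken.

```python
# libs/textutil.py -- after the change; the old signature was
# shout(s, encoding="utf-8"), and the unused parameter is now gone.
def shout(s: str) -> str:
    return s.upper() + "!"

# services/greeter/main.py -- caller updated in the same commit,
# dropping the encoding argument it used to pass.
def greet(name: str) -> str:
    return shout("hello " + name)

print(greet("monorepo"))  # HELLO MONOREPO!
```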
| draw_down wrote:
| Exactly, I don't understand what is so difficult to get here.
| lkschubert8 wrote:
| Wouldn't you still have to do the iterative thing to avoid
| temporary outages during deployments? Or do you try to
| synchronize all deployments?
| goodpoint wrote:
| Most likely bla3 is referring to statically linked
| applications.
|
| This is the big downside of monorepos: they strongly
| encourage tight coupling and poor modularity.
| tantalor wrote:
| > they strongly encourage tight coupling and poor
| modularity
|
| No, that's not true. Why would you say that?
| majormajor wrote:
| They make it easy, and then human nature and dev laziness
| does the rest. If you can reach across the repo and
| import any random piece of code, you end up with devs
| doing just that. It's a huge huge pain to try to untangle
| later.
|
| That's why tools like Bazel are strict about visibility
| and put more friction and explicitness on those sorts of
| things. But this tends to not be the first thing at the
| top of people's minds when starting a new project... so
| in the monorepos I've worked on, it's never been noticed
| until it's too late to easily fix.
| bluGill wrote:
| Because they do nothing to make it hard to add a coupling
| or break modularity.
|
| You should of course use good discipline to ensure that
 | doesn't happen. Compared to multi-repo it is a lot easier
 | to violate coupling and modularity without being detected.
| Anyone who is using a monorepo needs to be aware of this
| downside and deal with it. There are other downsides of
| multi-repo, and those dealing with them need to be aware
| of those and mitigate them. There is no perfect answer,
| just compromises.
| sparsely wrote:
 | I don't think synchronized deployments are really possible -
 | you'd have to either still do the iterative thing, or
 | possibly have some versioning system in place.
| bluGill wrote:
| It is possible for trivial cases. What I do in my basement
| for example - though even there I have come to prefer
| keeping things intentionally unsynchronized: it ensures
| that after updates I have some system that still works.
| sam0x17 wrote:
| If you design with a "no deprecations" mentality and deploy
| backend before frontend, in _most cases_ this isn't an issue
| -- the frontend code that needs the new table or column or
| endpoint that doesn't exist yet won't run until those things
| are deployed, and the new backend endpoints will be fully
| backwards compatible with the old frontend, so no issues.
|
| You don't even need to be that dogmatic to make this work
| either -- simply stipulating backwards compatibility between
| the two previous deploys should be sufficient.
|
| The better version of this is simply versioning your backend
| and frontend but I've never been that fancy.
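A minimal sketch of sam0x17's rule, with hypothetical field names: the new backend accepts both the old and the new request shape, so it can be deployed before the frontend without breaking clients from the previous deploy.

```python
def create_user(payload: dict) -> dict:
    # New frontends send "full_name"; frontends from the previous
    # deploy still send "name". Accept either shape.
    full_name = payload.get("full_name", payload.get("name"))
    if full_name is None:
        raise ValueError("missing name field")
    return {"id": 1, "full_name": full_name}

# Both the old and the new frontend payloads work:
print(create_user({"name": "Ada"}))       # {'id': 1, 'full_name': 'Ada'}
print(create_user({"full_name": "Ada"}))  # {'id': 1, 'full_name': 'Ada'}
```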
| bpicolo wrote:
| It takes the guesswork out of library migrations. API
| migrations still need forwards/backwards compat hygiene,
| unless you blue/green your entire infrastructure to
| compatible versions, which is possible but not necessarily
| practical
| rohitpaulk wrote:
| If the change is in a shared library (and not a service
| shared over the network), it's fine to change all usages at
| once. Deploying a service wouldn't affect the others.
|
| If the change affects the public interface of a service, then
| there's no option but to make your changes backward-
| compatible.
| tantalor wrote:
| > no option but to...
|
| Not necessarily; you can accept downtime/breakage instead.
| That is always an option!
| bluGill wrote:
| You must not work on safety critical code where
| downtime/breakage mean people die.
| marcosdumay wrote:
| I hope nobody's life depends on the uptime of a web-based
| distributed system.
|
| But, well, I also expect nobody's life to depend on it.
| There would be a short window between people getting into
 | that situation and them not having any life to depend on
| anything.
| ralmeida wrote:
| Most people don't, to be fair. The case where zero-
| downtime strategies are adopted for (at least) a
| questionable ROI is far more common.
| closeparen wrote:
| I hope you're not doing anything "safety critical" with
| general purpose computers, let alone microservices.
| bluGill wrote:
| Well the arm CPUs we use are in general purpose computers
| as well. Though you are correct, we don't follow the same
| practices as general purpose computers.
| tantalor wrote:
| I was thinking more like bank/govt websites that go down
| for "scheduled maintenance".
| fwip wrote:
| Depends on how you deploy and your scaling strategy.
|
| e.g: If your smallest deployable unit is a Kubernetes pod,
| and all your affected applications live in that pod, you can
| treat it as a private change.
| echelon wrote:
| _This_ is the reason for monorepos.
|
| It's not about migrating APIs or coordinating deployments.
 | That's not a problem your repo can solve. It's about updating
 | _libraries and shared code_ uniformly and _patching
 | dependencies_ (e.g. for vulns) all in one go.
|
| Imagine updating Guava 1.0 -> 2.0. Either you require each team
| to do this independently over the course of several months with
| no coordination, or in a monorepo, one person can update every
| single project and service with relative ease.
|
| Let's say there's an npm vuln in leftpad 5.0. You can update
| everything to leftpad 5.0.1 at once and know that everything
| has been updated. Then you just tell teams to deploy. (Caveat:
| this doesn't really work as cleanly for a dynamically typed
| language like javascript, but it's a world wonder in a language
| like Java.)
|
| I can't fathom how hard it would be to coordinate all of these
| changes with polyrepos. You'd have to burden every team with a
| required change and force them to accommodate. Someone not
| familiar with the problem has to take time out of their day or
| week to learn the new context. Then search and apply changes.
| And there's no auditability or guarantee everyone did it. Some
| isolated or unknown repos somewhere won't ever see the upgrade.
| But in a monorepo, you're done in a day.
|
| Now, here's a key win: you're really at an advantage when
| updating "big things". Like getting all apps on gRPC. Or
| changing the metrics system wholesale. These would be year long
| projects in a polyrepo world. With monorepos, they're almost an
| afterthought.
|
| Monorepos are magical at scale. Until you experience one, it's
| really hard to see how easy this makes life for big scale
| problems.
| Denvercoder9 wrote:
| None of the things you state are related to the technical act
| of having a single repository, but they are all results of
| the organizational structure. It's entirely possible to have
| a monorepo where one person doesn't have the organizational
| or technical ability to update everything in it, and you can
| also have split repositories where a single person does.
| bluGill wrote:
| I update our polyrepo code all the time. I just have to go
| into each repo and make the change. It isn't much more work
| than you have, the only difference I need to run more "git
| pull/git commit/git push" steps, and my CI dashboard draws
| from more builds.
|
| I sometimes leave some repos at older versions of tools.
| Sometimes the upgrade is compelling for some parts of our
| code and of no value to others.
| marcosdumay wrote:
| Hum... Ok, there has been a "mostly nonbreaking" change on
| leftpad that corrects some vulnerability. Are you proposing
| that a single developer/team clones the work of 100s or 1000s
 | of different people, updates it to use the new leftpad,
 | runs the tests, and pushes?
|
| The only way this could ever work is if the change is really
 | nonbreaking (do those exist in Javascript?), in which case you
 | could script the update on as many repositories as you want.
| Otherwise, living with the library vulnerability is probably
| safer than blindly updating it on code you know nothing
| about.
|
| Anyway, burdening all the teams with a required change is the
| way to go. It doesn't matter how you organize your code.
| Anything else is a recipe for disaster.
| tylerhou wrote:
| > with the library vulnerability is probably safer than
| blindly updating it on code you know nothing about.
|
| This is what tests are for.
|
| > Are you proposing that a single developer/team clones the
 | work of 100s or 1000s of different people, updates it
 | to use the new leftpad, runs the tests and pushes? ... Anyway,
| burdening all the teams with a required change is the way
| to go.
|
| No, and speaking from personal experience, it's much more
| difficult to ask ~500 individuals to understand how and why
| they need to make a change than to have a few people just
| make the change and send out CLs. Writing a change,
| especially one that you have to read a document to
| understand, has a fixed amount of overhead.
|
| (Also, you don't have to clone all the repositories if
| you're in a monorepo :) ).
| dogleash wrote:
| As I read your post, you're attributing a lot of properties
| to a monorepo.
|
 | That's fine, but I think you should be careful whether you're
 | pointing to the properties of using a single repository in
 | general; or the properties of tooling certain monorepo-using
 | companies have built with no requirements other than
 | supporting their own source control; or how uniform it can
 | feel to jump into multiple projects when every project has
 | been forced to use a lot of the same base tooling beyond just
 | source control; and/or a work culture that happened to have
 | grown up around a certain monorepo - but for which a monorepo
 | is neither necessary nor sufficient to reproduce.
|
| I've worked jobs where the entire company is in a unified
| repository, and companies where a repository represents
| everything related to a product family, and places where each
| product was multiple gitlab groups with tons of projects.
|
| The most I can say is that monorepos solve package management
| by avoiding package management. The rest comes down to
| tooling, workflow and culture.
|
| I would be interested in hearing why it would be
| hypothetically worse if google had gone the other direction.
| Where they still spent the same amount of money and time from
| highly talented people on the problem of unifying their
| tooling and improving workflow, but done it to support a
| polyrepo environment instead. How would it have been
| fundamentally worse than what they got when they happened to
| do the same with a monorepo?
| Groxx wrote:
| Forcing each team to do (or approve!) the update has nothing
| to do with a shared repository, it's just what limits and
| requirements you've added on top of your repo(s). A for loop
| over N repos and an automated commit in each one is perfectly
| achievable.
|
| If you want consistency so you can automate stuff, require
| consistency.
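A dry-run sketch of the "for loop over N repos" approach (repo names and the commit message are placeholders): generate the git commands an automated rollout would run in each repository, rather than hand-editing each one.

```python
import shlex

REPOS = ["service-a", "service-b", "shared-lib"]  # hypothetical repo list

def rollout_commands(repo: str, message: str) -> list:
    """Commands an automated per-repo rollout would run."""
    quoted = shlex.quote(message)
    return [
        f"git -C {repo} pull",
        # the actual edit (e.g. a scripted dependency bump) happens here
        f"git -C {repo} commit -am {quoted}",
        f"git -C {repo} push",
    ]

for repo in REPOS:
    for cmd in rollout_commands(repo, "Bump leftpad to 5.0.1"):
        print(cmd)
```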
| thelopa wrote:
| Do you really feel that confident working in another team's
| code base? I work in a multi-repo company, and almost every
| time I've gotten a patch from outside my team it's been wrong
| in some way. Why would I want to make it easier for people
| who don't understand (and aren't interested in understanding)
| my project to land code in it?
| z0r wrote:
| Because any company reaching monorepo scale will have
| integration tests that cut across the boundaries of your
| projects. It's possible for an outside contribution to
| break your corner of the repo, but the flip side is that
| you will know much more quickly if your own changes break
| another part of the repo.
| majormajor wrote:
| >Because any company reaching monorepo scale will have
| integration tests that cut across the boundaries of your
| projects.
|
| Heh. This makes a couple assumptions that I only can wish
| were true: (a) that people won't go to monorepo until
| they hit some huge scale, and (b) that people will at
| that point have good test coverage.
| thelopa wrote:
| I completely disagree. My company is absolutely "monorepo
| scale", but I also know we're nowhere close to having the
| test coverage to allow people unfamiliar with a project
| to freely land changes in it.
| tylerhou wrote:
| Another advantage of a mono-repo is that it encourages
| everyone to use the same tooling & coding libraries. So (at
| least at Google) I can open another team's codebase (as
| long as it's in the main mono-repo) and understand it
| within ~10 minutes.
|
| I fix bugs that bother me in random projects (both internal
| and external) maybe once a month (most recently, in the
| large scale change tool!). For context, I've been at Google
| for ~3 years. I've only had a changelist rejected once, and
| that was because the maintainer disagreed with the
| technical direction of the change and not the change
| itself.
| mumblemumble wrote:
| So I'm kind of stuck thinking, if you want to have cross-cutting
| changes happening in many small steps rather than one big one,
| why not choose a repository layout that makes it the path of
| least resistance, rather than one that makes it require extra
| effort?
| bluGill wrote:
| There is no perfect answer. Both mono-repo and multi-repo have
| pros and cons. Once you make a choice you have to deal with the
| cons that come with your choice (you are allowed to change, but
| then you get a different set of cons to deal with).
| Cthulhu_ wrote:
| The API changes are mainly a problem (I think) if you have a
| monorepo, but not a mono-deploy; you can only get the full
| benefits of a monorepo if all your changes are deployed
| simultaneously and atomically.
|
| Changing a model that is shared between different services (or a
| client / server) should be atomic. In practice, you need to think
| about backwards compatibility, and work in the three-step process
| outlined in the article (build and deprecate, switch over, remove
| deprecated code across a number of deployments / time).
|
| If you don't have atomic deployments, that's one less argument in
| favor of monorepos.
| jyounker wrote:
| You're using one specific definition of an API. Another
| definition of an API is the contract between a library and the
| calling code. The latter sort of API changes happen frequently
| (at least in places with monorepos), and they don't require
| phased deployments.
| [deleted]
| kerblang wrote:
| Sort of off-topic, but before building a shared library, I
| strongly recommend reading up on Jeff Atwood's "Rule of Threes".
| It's harder than most people think, and I've seen them do far
| more harm than good because authors aren't disciplined & lack
| domain expertise. The biggest mistake is thinking "Something is
| always better than nothing" when in fact that something can
| easily become your biggest liability. If you must do it, keep it
| as minimalist and focused as possible, and don't hand the task to
| junior-level people; hand it to someone who recognizes that it's
| 10x harder than it looks.
|
| https://blog.codinghorror.com/the-delusion-of-reuse
|
| https://blog.codinghorror.com/rule-of-three/
| marceloabsousa wrote:
| Mostly agree with the article. To me the problem is about
 | dependency management. I see all the time codebases hugely
 | fragmented at the level of git, which is totally ad hoc. After a
 | while, teams face a lot of issues, the most annoying being one
 | change in the product involving N PRs with N code reviews and N
 | CIs. This fragmentation of knowledge also pops up in weird
 | integration bugs that could be solved with better tools enforcing
 | clearer processes.
|
| The ultimate goal of Reviewpad (https://reviewpad.com) is to
| allow the benefits of the monorepo approach independently of how
| the codebase is fragmented at the git level. We are starting from
| the perspective of code review and debugging (e.g. code reviews
| with multiple PRs, or code reviews across multiple projects). For
| people doing microservices like us, the ability to have a single
| code review for both library and clients has been quite positive
| so far.
| nhoughto wrote:
 | Don't agree with the conclusion or reasoning; sure, you might need
 | to take a multi-stage approach for many reasons. But if you've
 | got your shit together atomic change is possible; in many other
 | approaches it's not even nearly possible. Whether your org can
 | put together the maturity / investment / whatever to make it a
 | reality is up to you. But the possibility is the sell, not the
| guarantee. If you can make it work (and google do on a crazy
| scale) it's an incredible power, incredible, like change your
| entire approach to tech debt incredible. Also has costs at scale
| like the theory about how google keeps shutting down products
| because keeping them in tree is expensive.
| jyounker wrote:
| Google keeps shutting down projects because of internal
| politics and reward structures.
| shawnz wrote:
| Single "big bang" atomic commits that update both the client and
| server with new features usually aren't practical, I can agree
| with that.
|
| I think the real intention of the "atomic commits" idea is a
| total ordering of commits between both the client and server
| code. Both a "big bang" strategy as well as the author's
| incremental change strategy can benefit from that arrangement.
|
| The key is that at any given point in the repository's history,
| you can be sure that the client and server both work together.
| The point is not that each commit atomically completes an entire
| new feature, only that each commit atomically moves the
| repository to a state where the client and server still work
| together.
|
| In that sense, the author's incremental commits actually do have
| that kind of atomicity.
| joatmon-snoo wrote:
| ITT: people who have never dealt with things going _wrong_ in a
| large codebase.
| xcambar wrote:
| The process described in the article (allow old+new behavior on
| server, migrate clients, deprecate old behavior on server) works
 | spectacularly well, especially when used with a decent level of
 | observability and information in the form of logging messages.
|
| We are using variations on this theme extensively at my company,
| in a large spectrum of projects, with great satisfaction.
|
| Crucially, this method is orthogonal to using monorepos. It is
| simply a safety net for and good stewardship of your APIs.
|
| Definitely a best practice in my tool belt.
| kohlerm wrote:
 | That is just a best practice for published interfaces (ones
 | allowed to be used outside of your repositories, e.g. consumed
 | via a different build, the main example being APIs available to
 | customers). Not all interfaces need (really, "must not") to be
 | published. For "private" interfaces it is an advantage to be
| able to refactor them easily and also being able to rollback
| those changes easily if needed. Without a mono-repo that
| becomes difficult. You basically need an additional layer to
| "stitch" commits together. You could question why that makes
| sense because it would just emulate mono-repo behavior.
| bluGill wrote:
| If an interface spans repos it isn't private.
|
 | I assume you are not creating some strawman where every source
 | code file is in a separate repository. That would be insane and
 | nobody does that. If you are in a multi-repo world, then how
 | you break your repos up is an important decision. There are
 | interfaces that are allowed to be used within one repo that you
 | cannot use in others, which allows things that should be
 | coupled to be coupled, while forcing things that should be
 | more isolated to be isolated. This is the power of the multi-
 | repo: the separation of what is private vs public is
 | enforced. (Which isn't to say you should go to multi-repo -
 | there are pros and cons of both approaches; in the area of
 | where an interface can be used multi-repo gives you more
 | control, but there are other ways to achieve the same goal.)
| KaiserPro wrote:
 | Monorepos offer one thing, and precisely one advantage:
 | "visibility".
 |
 | Because everything is in the same place, you can, in theory,
 | find stuff.
 |
 | However in practice it's also a great way to hide things.
|
| But the major issue is that people confuse monorepos for a
| release system. Monorepos do not replace the need for an artifact
| cache/store, or indeed versioned libraries. They also don't
| dictate that you can't have them, it just makes it easier _not
| to_.
|
| You can do what facebook do, which is essentially have an
| unknowable npm like dependency graph and just yolo it. They sorta
| make it work by having lots of unit tests. However that only
| works on systems that have a continuous update path (ie not user
| hardware.)
|
 | It is possible to have "atomic changes" across many libraries.
 | It makes it easier to regex stuff, but also impossible to test
 | or predict what would happen. It's very rare that you'd want to
 | alter >0.1% of the files in your monorepo at one time. But
 | that's not the fault of a monorepo, that's a product of having
 | millions of files.
| lhnz wrote:
 | > It's particularly easy to see that the "atomic changes
 | > across the whole repo" story is rubbish when you move
 | > away from libraries, and also consider code that has
 | > any kind of more complicated deployment lifecycle,
 | > for example the interactions between services and
 | > client binaries that communicate over an RPC interface.
|
| This seems exactly wrong to me. Getting rid of complicated
| deployment lifecycles is exactly the job that people use
| monorepos to solve. Wasn't that one of the reasons they are used
| at Google, Facebook, etc? As well as being able to do large-scale
| refactors in one place, of course.
|
| You should be able to merge a PR to cause a whole system to
| deploy: clients, backend services, database schemas, libraries,
| etc. This doesn't preclude wanting to break up commits into
| meaningful chunks or to add non-breaking-change migration
| patterns into libraries -- but consider this: is it meaningful
| for a change to be broken into separate commits just because it
| is being done to independent services? What benefit does cutting
| commits this way give you?
|
| What you want to avoid is needing to do separate PRs into many
| backend service and client repos, since: (1) when the review is
| split into 10+ places it's easier for reviewers to miss problems
| arising due to integration, (2) needing to avoid breaking changes
| can sometimes require developers to follow multi-stage upgrade
| processes that are so difficult that they cause mistakes, and (3)
| when there are separate PRs into different repositories these
| tend to start independent CI processes that will not test the
| system as-it-will-be (unless you test in production or have a
| very good E2E suite -- which would be a good idea in this
| situation).
|
| I will say that, even in a monorepo, a big change might still
| happen gradually behind feature flags. But I think that generally
| it's nice to be able to deploy breaking changes in a more atomic
| fashion.
| jmillikin wrote:
| I can only attest to how Google does (did) it, but they use the
| monorepo as a serialized consistent history. There is no
| concept of deploying "the whole system" -- even deploying a
| single service requires, mechanically, multiple commits spread
| across time.
|
| In fact, when I was last there in 2017, making a backwards-
| incompatible atomic change to multiple unrelated areas of the
| codebase was forbidden by policy and technical controls (the
| "components" system). You _had_ to chop that thing up, and wait
| a day or two for the lower-level parts of it to work their way
| through to HEAD.
|
| I would generalize this to say that the idea of deploying
| clients, schemas, backends, etc all at once is an inherently
| "small scale" approach.
| lhnz wrote:
| Interesting, RE: Google. Though, even if no other large-scale
| company is doing this, it seems on face value to be an easier
| and safer way to develop software up to perhaps a medium
| scale system/problem (and assuming that you're not needing to
| make database changes). I've yet to see a benefit to
| straddling changes across multiple repositories...
| andyferris wrote:
| The benefit of changes straddling repositories is having
| separation of control.
|
| For example different npm (cargo, etc) packages are
| controlled by different entities. Semver is used to
| (loosely) account for compatibility issues and allow for
| rolling updates.
|
| A single company requiring multiple repositories for
| control reasons might be an antipattern and might indicate
| issues with alignment/etc.
| lhnz wrote:
 | > The benefit of changes straddling repositories
 | > is having separation of control.
|
| Good point, although a monorepo with a `CODEOWNERS` file
| could be used to give control of different areas of a
| codebase to different people/teams.
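A sketch of what that could look like, in GitHub's `CODEOWNERS` syntax (paths and team names hypothetical):

```
# Reviews for each subtree are routed to the owning team.
/libs/common/         @acme/platform-team
/services/payments/   @acme/payments-team
/services/search/     @acme/search-team
```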
| jyounker wrote:
 | Monorepo tooling (as opposed to a big repo with a bunch
 | of stuff tossed into it) generally provides access
 | controls.
 |
 | Also remember that non-DVCS repos generally have fine-
 | grained access controls.
| jsnell wrote:
| > This seems exactly wrong to me. Getting rid of complicated
| deployment lifecycles is exactly the job that people use
| monorepos to solve. Wasn't that one of the reasons they are
| used at Google, Facebook, etc? As well as being able to do
| large-scale refactors in one place, of course.
|
| No, neither of those is why big companies use monorepos.
| Clearly the kinds of things you wrote are why the general
| public _thinks_ big companies use monorepos, which is why this
| argument keeps popping up. But given making atomic changes to
| tens, hundreds, or thousands of projects does not actually
| match the normal workflows used by those companies, it cannot
| be the real reason.
|
| Monorepos are nice due to trunk based development, and a shared
| view of the current code base. Not due to the capability of
| making cross-cutting changes in one go.
| kohlerm wrote:
 | You do not need monorepos for trunk based development. In
 | principle you "only" need a common build system. E.g. you can
 | just force the consumption of head via the build system. I
 | think the rollback is still a factor. E.g. in a monorepo
 | rolling back a change is straightforward, whereas with
 | multiple repos you would have to somehow track that commits
 | from several repos belong together. This would create an
 | additional layer which is avoided by using a monorepo.
| ec109685 wrote:
| Without a mono repo and the ability to build from HEAD all
| the components, it's much harder to be sure a change to a
| library _is_ actually backwards compatible (think a
| complicated refactoring).
|
| Otherwise, there is much more fear that a change to an
| important library will have downstream impact and when the
| impact does arise, you've moved on from the change that
| caused it.
| nicative wrote:
| Even within the same repo, it is very likely that the old
| version of your code will coexist with the new version during
 | the deploy rollout. Often having different commits and deploys
| is a requirement. For instance, imagine that you add a column
 | in the database and also use it in the backend service.
 | You probably have to add the column first and then commit and
| deploy the usage of the column later, because you can't easily
| guarantee that the new code won't be used before the column is
| added. Same would apply for a new field in the API contract.
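A sketch of the two-deploy sequence described above (table and column names hypothetical); sqlite3 stands in for the production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Deploy 1: ship only the schema migration. Running code still ignores
# the new column, so old and new instances can coexist during rollout.
conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT")

# Deploy 2 (a later commit): application code may now assume the column
# exists on every instance.
conn.execute("INSERT INTO users (name, nickname) VALUES ('Ada', 'ada')")
row = conn.execute("SELECT nickname FROM users WHERE name = 'Ada'").fetchone()
print(row[0])  # ada
```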
| magicalhippo wrote:
| We commit the column schema change and the code that uses it
| in a single commit.
|
| This is handled by our deployment tool. It won't allow the
| new executable to run until the database has been updated to
| the new schema.
| thundergolfer wrote:
| The old version won't coexist when the change is contained
| within a single binary, which seems like it would be true in
| a bunch of cases.
|
| In our monorepo we have to treat database changes with care,
| like you mention, as well as HTTP client/server API changes,
| but a bunch of stuff can be refactored cleanly without
| concern for backwards compatibility.
| rutthenut wrote:
| That is only likely to apply in very small-scale
| environments or companies.
|
| And if only a single binary is produced, quite likely a
| single source code repo would be used as well - sounds like
| 'single developer mode', well 'small team' at most.
| withinboredom wrote:
| Do you only have a single instance of the binary running
 | across the whole org? And during deployment do you stop the
| running instance before starting the new one?
| jvolkman wrote:
| Any change that won't cross deployable binary boundaries
| (think docker container) can be made atomically without
| care about subsequent deployment schedules. So this
| doesn't work for DB changes or client/server API changes
| as mentioned by OP, but does work for changes to shared
| libraries that get packaged into the deployment
| artifacts. For example, changing the interface in an
| internal shared library, or updating the version of a
| shared external library.
| kohlerm wrote:
 | It seems like a common misconception that you can never
 | change an interface. You
| actually can as long as it is not published to be used
| outside of your repositories.
| tsss wrote:
| It's simply unfeasible to do that. The best you can do is blue-
| green deployments with a monorepo but you will still need at
| least data backwards compatibility to be able to roll back any
| change. The only thing you gain with blue-green deployments is
| slightly easier API evolution.
| taeric wrote:
| Nit: you need backward compatibility to roll out the new
| change. You need forward compatibility to roll it back.
|
| Right?
| tsss wrote:
| Yes, but usually you need both.
| taeric wrote:
| Oh, agreed. My point was it was easy for folks to think
| they are safe because they have backwards compatibility,
| when they actually need both if they are concerned with
| running a rollback.
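The distinction can be sketched with hypothetical record fields: backward compatibility means NEW code reads records written by OLD code (safe rollout); forward compatibility means OLD code tolerates records written by NEW code (safe rollback).

```python
def read_v2(record: dict) -> dict:
    # v2 added "priority"; defaulting it keeps v2 readers backward
    # compatible with v1 records, which makes the rollout safe.
    return {"id": record["id"], "priority": record.get("priority", 0)}

def read_v1(record: dict) -> dict:
    # v1 readers ignore unknown keys, so they stay forward compatible
    # with v2 records -- which is what makes a rollback safe.
    return {"id": record["id"]}

print(read_v2({"id": 7}))                 # {'id': 7, 'priority': 0}
print(read_v1({"id": 7, "priority": 3}))  # {'id': 7}
```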
| kohlerm wrote:
| I do not really agree with the conclusion. Google does large
| scale automatic refactorings. Those clearly benefit from atomic
| commits, because it is easy to roll them back in that case. As
 | others have mentioned, in smaller projects you might want to be
 | able to do some (internal) refactorings easily, and being able to
 | roll them back easily is a big advantage.
| tylerhou wrote:
| As someone who has worked on large-scale refactorings at Google
| they usually do happen as the author describes:
|
| 1. Add the new interface (& deprecate the old one)
|
| 2. Migrate callers over (this can take a long time)
|
| 3. Remove the old interface.
|
| Even then, you risk breakages because in some cases the
| deprecation relies on creating a list of approved existing
| callers, and new callers might be checked in while you're
| generating that list. (In that case you would ask the new
| callers to fix-forward by adding themselves to the list.)
|
| This three step process has to happen because step 2 takes a
| long time for widely-used interfaces, and automatic
| refactorings cannot handle all cases.
|
| The only time you can consolidate all three into one commit is
| if the refactoring is so trivial that automatic tooling can
| handle every case (in which case, does the cost of code churn &
| code review really justify the change?) or the number of usages
| is small enough that a person can manually make all the
| changes the automatic fixer can't handle before much churn
| happens.
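The three-step process described above can be sketched in a few lines. The function names here are hypothetical, chosen only for illustration; step 1 is the state shown in code, and steps 2 and 3 happen as later commits.

```python
import warnings

# Step 1: introduce the new interface and deprecate the old one.
# (fetch_user / fetch_user_by_id are made-up names for illustration.)
def fetch_user_by_id(user_id: int) -> dict:
    """New interface: callers should migrate to this."""
    return {"id": user_id}

def fetch_user(user_id: int) -> dict:
    """Old interface, kept as a thin forwarding shim during step 2."""
    warnings.warn(
        "fetch_user is deprecated; use fetch_user_by_id",
        DeprecationWarning,
        stacklevel=2,
    )
    return fetch_user_by_id(user_id)

# Step 2 happens over many independent commits: callers switch to
# fetch_user_by_id. Step 3 is one final commit deleting fetch_user.
```

Because the shim forwards to the new function, both interfaces stay correct throughout step 2, so each migration commit can land (and be rolled back) on its own.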
| vlovich123 wrote:
| Hmm... I didn't work on large scale refactors but (IIRC -
| it's been a few years) I definitely had to approve commits
| touching components I had ownership over, batched into the same
| commit as many other changes. Given
| that I recall a wiki describing what your responsibility was
| as the downstream user of the API in these scenarios, it
| seemed like a common thing to me & I recall there were also
| automated tools to help do this.
|
| Whether or not that's the most common workflow is another
| story. Works great at the C++/Java API layer or trivial
| reorganizations, may not work as well when modifying runtime
| behavior since you have to do the 3 phase commit anyway.
| skybrian wrote:
| I'm not sure what you consider the conclusion to be, but large-
| scale refactorings at Google do happen in the way described in
| the article, with the bulk of the migration done using many
| commits done in parallel.
|
| Being able to rollback is indeed important and reducing the
| sizes of commits makes it less painful. If one project has a
| problem, unrelated projects won't see the churn.
|
| Or at least that's how it was done when I was there. It's been
| a while.
| joatmon-snoo wrote:
| It still is done that way, but even individual LSC (large
| scale change) CLs touch easily >100 files and rely very
| strongly on the atomicity guarantee of the monorepo.
|
| Plus, when most people think LSCs, they think about the kind
| of stuff that the C++ or TS or <insert lang> team do, not
| someone refactoring some library used by a handful of teams,
| which themselves usually impact anywhere between ten to a few
| hundred files.
| kohlerm wrote:
| I am saying large-scale refactorings can and should (in some
| cases) happen (and are doable). The author claims you should
| always do it in steps. I disagree with this opinion. If the
| refactoring is automatic, why would I do it in steps?
| bluGill wrote:
| Is your automatic refactoring completely 100% bug free in
| all edge conditions?
|
| I personally don't have that much confidence.
| skybrian wrote:
| The more projects you touch, the more approvals you need
| and the more tests you need to run. Some tests take a long
| time to run and some are flaky. And the longer it takes to
| do all this, the more likely there is to be a merge
| conflict due to other people's commits.
|
| If you divide it up, most commits will land quickly and
| without issue, and then you can deal with the rest.
| sam0x17 wrote:
| The article seems to be written from a perspective where
| integration and unit testing don't exist. IMO the main advantage
| of monorepos is that you can have end-to-end tests within the
| same project structure. In that case you wouldn't need to do
| these incremental changes hoping things don't break in
| production, because your test suite would prove to you that it
| works.
| ec109685 wrote:
| And even if you do the changes incrementally, it lets you
| validate you are doing each step correctly and not breaking
| backwards compatibility along the way.
| corpMaverick wrote:
| On the subject of libraries in large code bases: I think one
| should be careful when deciding what goes into a library and
| what goes into an API/service. Both allow you to share code,
| but there is a difference in who controls the deployment in
| prod. For example, if you need to change a business rule in a
| service, you can change the code, and the rule change takes
| effect when the API/service is deployed. With a library,
| however, you make the change and publish the library, but it is
| up to the apps that include the library to decide when the
| change is deployed in prod. As the owner of the library, you no
| longer have control over when it is deployed in prod.
| majikandy wrote:
| Interesting idea and it certainly makes sense to ensure working
| software at all times.
|
| However it doesn't have to be one or the other.
|
| Sometimes a single commit (or PR if you prefer multiple commits)
| can update the API and _all_ clients at the same time.
|
| Sometimes client callers are external to the repo/company, in
| which case a backwards-compatibility strategy is needed.
|
| There is no need to abandon a concept of a single repo just
| because you might not use one of the main benefits all the time.
| argonaut wrote:
| There are plenty of smaller changes that are just way easier with
| a single commit. A small API call with just two or three
| consumers, a documentation change, etc. The multistage approach
| is certainly good for big changes.
| stkdump wrote:
| Also, if you do all changes at once, you can run the whole
| test suite and find problems quicker. Sure, you could still
| later split the work up into many commits.
|
| But even when problems come up that you didn't see in the
| tests, it is easier to revert the work if it indeed is a single
| commit.
|
| The real reason why I still sometimes prefer many small commits
| is to reduce the chance of merge conflicts.
| kohlerm wrote:
| You can do automatic bisecting
| (https://git-scm.com/docs/git-bisect), which you cannot easily
| do with multiple repos.
| cjfd wrote:
| The arguments seem exceedingly weak here. Having a monorepo does
| not prevent a workflow like (1) add new function signature; (2)
| let all calls use the new function signature; (3) remove old
| function signature. However, in some cases a multirepo means
| you can only do it this way. If a function is called all over
| the place, the three-step process is the only somewhat safe way
| to do it, but if a function has only a modest number of callers
| one can do the atomic refactor, which is an easier process.
| That one sometimes needs the three-step process and sometimes
| can do an atomic refactor is in no way contradictory, as this
| article seems to claim. What you do in a particular case
| depends on the size of the change and the risk involved.
|
| Also, a monorepo forcing trunk-based development is great. Long-
| living feature branches are hell. I would even say to avoid
| feature branches entirely and use feature switches instead
| whenever one can get away with it. Every branch always runs the
| risk of making refactoring more difficult.
| Cthulhu_ wrote:
| Re: branches, I agree in principle - we've had feature branches
| at my current job that were open for a year, reasoning being
| that it impacts a lot of our application core and we cannot
| permit issues impacting existing customers, etc etc.
|
| But I do like short-lived (max 1 sprint) branches for my own
| smaller scale projects because they group changes together. I
| name my branches after an issue number, rebase freely on that
| branch, and merge with a `--no-ff` flag when I'm done. My
| history is a neat branch / work / merge construct.
|
| Not sure if this is just fear, but I believe trunk- and feature
| switch based history will end up an incoherent mess of multiple
| projects being worked on at the same time, commits passing
| through each other. I'm sure they can be filtered out, but
| still.
| kohlerm wrote:
| It is not the monorepo that forces trunk-based development; it
| is the fact that you use one build for the monorepo. E.g. you
| can do trunk-based development with multiple repos by forcing
| everyone to use a common build. You could also use a monorepo
| while everyone does their own build, consuming libraries via
| mechanisms such as Maven repositories or published npm
| packages.
| thundergolfer wrote:
| > The example of renaming a function with thousands of callers,
| for example, is probably better handled by just temporarily
| aliasing the function, or by temporarily defining the new
| function in terms of the old.
|
| Why exactly? The best I can think of is that you may annoy people
| because they'd need to rebase/merge after your change lands and
| takes away a function they were using.
|
| If you're literally just doing a rename, and you're using
| something like Java, why not just go ahead and do a global
| rename?
| jsnell wrote:
| Because actually getting that commit pushed won't be just
| clicking "rename method". You'll need to run and pass the
| presubmits of every single project you made code changes in -
| and the more projects you're changing at once, the less likely
| it is that the build is green for all of them at the same time.
| Then you'll need to get code review approvals from the code
| owners of each of the clients in one go. Hopefully no new users
| pop up during this process. If some do, you'll need to iterate
| again.
|
| Then once some trivial refactoring inevitably causes some kind
| of breakage ("oh, somebody was using reflection on this
| class"), you'll need to revert the change. That'll be another
| set of full code reviews from every single owner. Let's hope
| that in the meanwhile, nobody pushed any commits that depend on
| your new code.
|
| None of this is a problem if you have a library and two
| clients. But the story being told is not "we can safely make
| changes in three projects at once", it's "we can safely make
| changes in hundreds or thousands of projects". The former is
| kind of uninteresting. The latter is a fairy tale.
| jvolkman wrote:
| The latter does happen at Google at least. But it requires
| tooling to reliably make the change (monorepo-aware
| refactoring tools, basically), and tooling to run all
| affected tests. Such large changes at Google are often
| accompanied by a doc outlining the specific process and
| possible impacts of the change, and many times are given
| "global approval" by someone who has that ability rather
| than requiring individual approvals from each affected team.
| jsnell wrote:
| That's not true. The process for the large scale changes at
| Google is described in the "Software Engineering at Google"
| book[0]. Chapter 22 is all about it. There is tooling, yes,
| but the goal is exactly the opposite of trying to make a
| single commit across the whole codebase:
|
| > At Google, we've long ago abandoned the idea of making
| sweeping changes across our codebase in these types of
| large atomic changes.
|
| [0] https://abseil.io/resources/swe_at_google.2.pdf
| squiggleblaz wrote:
| The article is talking about mitigating risk. Unless I missed
| something, it doesn't restrict itself to Java, so that's an
| unreasonable restriction when questioning the author's
| reasoning. But I don't think it destroys the argument
| completely.
|
| I suppose a single-developer code base in a fully checked and
| compiled language that doesn't support any kind of reflection
| has no particular added risk from renaming vs introducing a new
| name. Each time you remove one of those constraints, you add a
| little bit of risk.
|
| If you have a giant company, it might be possible that someone
| is copying a jar file and calling it in a weird way that you
| don't expect.
|
| If your language isn't fully checked, you might correctly
| rename all the Typescript uses and miss a Javascript use.
|
| If the language supports dynamic calling, it might turn out
| that somewhere it says "if the value of the string is one of
| these string values, call the method whose name is equal to the
| value of the string". There are various IPC systems that work
| this way, and it will certainly be hard to atomically upgrade
| them. I hate that kind of code, but someone else doesn't.
|
| If your language supports you doing these things, you can
| create as many conventions as you like to eliminate it. But
| someone will have an emergency and they need to fix it right
| now.
|
| Some people view the correct way of dealing with that problem
| is to insist on the development conventions, because we need to
| have some kind of conventions for a large team to feasibly work
| together.
|
| But I guess the author leans towards the side that says "if
| it's valid according to the language/coding environment, it
| might be better or worse, but it's still valid and we need to
| expect and accommodate it". It isn't my preference but it's a
| viable position - technical debt is just value if you can
| accommodate it without some unreasonable burden.
| magicalhippo wrote:
| > If you're literally just doing a rename, and you're using
| something like Java, why not just go ahead and do a global
| rename?
|
| It introduces changes in places that really don't need
| changes. We've done both at work, but I mostly prefer just
| making the old function(s) simply call the new one directly.
|
| Then you won't "pollute" source control annotation (blame) and
| similar.
|
| Not a 100% thing though.
| bob1029 wrote:
| It looks like more effective change control is a major concern of
| the author.
|
| We use a monorepo for our organization and have found that
| feature flags are the best way to manage the problem of exposing
| new functionality in ways that won't piss off our customers. We
| can let them tell us when they are ready and we flip the switch.
|
| Once a flag goes true for every customer, make a note to drop it.
| This is important because these things will accumulate if you
| embrace this ideology.
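A minimal sketch of the per-customer flag approach described above. The flag store, flag names, and customer names are all hypothetical; a real deployment would back this with a config service or database so flags can be flipped without a deploy.

```python
# Hypothetical in-memory flag store mapping flag name -> customers
# opted in so far. Real systems use a feature-flag service instead.
FLAGS: dict[str, set[str]] = {
    "new-invoice-ui": {"acme-corp"},
}

def is_enabled(flag: str, customer: str) -> bool:
    """True if `flag` has been switched on for `customer`."""
    return customer in FLAGS.get(flag, set())

def render_invoice(customer: str) -> str:
    # Old and new code paths coexist until every customer is
    # migrated; then the flag (and the old path) should be deleted,
    # per the accumulation warning above.
    if is_enabled("new-invoice-ui", customer):
        return "new invoice UI"
    return "old invoice UI"
```

Once the flag is true for every customer, the conditional collapses to the new path and the flag entry can be dropped, which is exactly the cleanup step bob1029 recommends noting down.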
| draw_down wrote:
| This is still easier to do with a monorepo. One reason is easily,
| _reliably_ finding all the usage sites.
| w_t_payne wrote:
| Either way, you've got to do the tooling work to make your chosen
| approach work. There's no free lunch here.
___________________________________________________________________
(page generated 2021-07-21 23:01 UTC)