[HN Gopher] Monorepo - Our Experience
___________________________________________________________________
Monorepo - Our Experience
Author : vishnumohandas
Score : 102 points
Date : 2024-11-06 13:37 UTC (9 hours ago)
(HTM) web link (ente.io)
(TXT) w3m dump (ente.io)
| siva7 wrote:
| Ok, but the more interesting part - how did you solve the CI/CD
| part and how does it compare to a multirepo?
| CharlieDigital wrote:
| Most CI/CD platforms will allow specification of targeted
| triggers.
|
| For example, in GitHub[0]:
|
|     name: ".NET - PR Unit Test"
|     on:
|       ## Only execute these unit tests when a file in this
|       ## directory changes.
|       pull_request:
|         branches: [main]
|         paths: [src/services/publishing/**.cs, src/tests/unit/**.cs]
|
| So we set up different workflows that kick off based on the
| sets of files that change.
|
| [0] https://docs.github.com/en/actions/writing-
| workflows/workflo...
| victorNicollet wrote:
| I'm not familiar with GitHub Actions, but we reverted our
| migration to Bitbucket Pipelines because of a nasty side-
| effect of conditional execution: if a commit triggers test
| suite T1 but not T2, and T1 is successful, Bitbucket displays
| that commit with a green "everything is fine" check mark,
| regardless of the status of T2 on any ancestors of that
| commit.
|
| That is, the green check mark means "the changes in this
| commit did not break anything that was not already broken",
| as opposed to the more useful "the repository, as of this
| commit, passes all tests".
| ants_everywhere wrote:
| isn't that generally what you want? the check mark tells
| you the commit didn't break anything. if something was
| already broken it should have either blocked the commit
| that broke it or there's a flake somewhere that you can
| only locate by periodically running tests independent of
| any PR activity.
| daelon wrote:
| Is it a side effect if it's also the primary effect?
| plorkyeran wrote:
| I would find it extremely confusing and unhelpful if tests
| from the parent commit, which weren't rerun for a PR because
| nothing relevant was touched, marked the PR as red. Why
| would you even want that? That's not something which is
| relevant to evaluating the PR, and it would make you get in
| the habit of ignoring failures.
|
| If you split something into multiple repositories then
| surely you wouldn't mark PRs on one of them as red just
| because tests are failing in a different one?
| victorNicollet wrote:
| I suppose our development process is a bit unusual.
|
| The meaning we give to "the commit is green" is not "this
| PR can be merged" but "this can be deployed to
| production", and it is used for the purpose of selecting
| a release candidate several times a week. It is a
| statement about the entire state of the project as of
| that commit, rather than just the changes introduced in
| that commit.
|
| I can understand the frustration of creating a PR from a
| red commit on the main branch, and having that PR be red
| as well as a result. I can't say this has happened very
| often, though: red commits on the main branch are very
| rare, and new branches tend to be started right after a
| deployment, so it's overwhelmingly likely that the PR
| will be rooted at a green commit. When it does happen,
| the time it takes to push a fix (or a revert) to the main
| branch is usually much shorter than the time for a review
| of the PR, which means it is possible to rebase the PR on
| top of a green commit as part of the normal PR acceptance
| timeline.
| plorkyeran wrote:
| Going off the PR status to determine if the end result is
| deployable is not reliable. A non-FF merge can have both
| the base commit and the PR be green but the merged result
| fail. You need to run your full test suite on the merged
| result at some point before deployment; either via a
| commit queue or post-merge testing.
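|
| A rough sketch of the post-merge variant as a GitHub Actions
| workflow (the job and script names here are illustrative, not
| from this thread):
|
|     name: "Post-merge full test suite"
|     on:
|       push:
|         branches: [main]   # runs against the merged result, not the PR
|     jobs:
|       full-suite:
|         runs-on: ubuntu-latest
|         steps:
|           - uses: actions/checkout@v4
|           - run: ./run-all-tests.sh   # hypothetical script that runs everything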
| victorNicollet wrote:
| I agree! We use the commit status instead of the PR
| status. A non-FF merge commit, being a commit, would have
| its own status separate from the status of its parents.
| hk1337 wrote:
| Even AWS CodeBuild (or CodePipeline) allows you to do this
| now. It didn't before but it's a fairly recent update.
| CharlieDigital wrote:
| As a prior user of AWS Code*, I can appreciate that you
| qualified that with "Even" LMAO
| devjab wrote:
| I don't think CI/CD should really be a big worry as far as
| mono-repositories go, as you can set up different pipelines and
| different flows with different configurations. That's something
| you're probably already doing if you have multiple repos.
|
| In my experience the article is right when it tells you there
| isn't that big of a difference. We have all sorts of
| repositories, some of which are basically mono-repositories for
| their business domain. We tend to separate where it "makes
| sense" which for us means that it's when what we put into
| repositories is completely separate from everything else. We
| used to have a lot of micro-repositories and it wasn't that
| different, to be honest. We grouped more of them together to
| make it easier for us to be DORA compliant, in terms of the
| bureaucracy it adds to your documentation burden. Technically I
| hardly notice the difference.
| JamesSwift wrote:
| In my limited-but-not-nothing experience working with mono vs
| multi repo of the same projects, CI/CD definitely was one of
| the harder pieces to solve. It's highly dependent on your
| frameworks and CI provider just how straightforward it is
| going to be, and most of them are "not very straightforward".
|
| The basic way most work is to run full CI on every change.
| This quickly becomes a huge speedbump to deployment velocity
| until a solution for "only run what is affected" is found.
| bluGill wrote:
| The problem with "only run what is affected" is that it is
| really easy to have something that is affected but doesn't
| seem like it should be (that is, whatever tools you have to
| detect whether it is affected say it isn't). So if you have
| such a system you must have regular rebuild-everything jobs as
| well to verify you didn't break something unexpected.
|
| I'm not against only run what is affected, it is a good
| answer. It just has failings that you need to be aware of.
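|
| As a sketch, a scheduled "rebuild everything" job in GitHub
| Actions could look like this (the names and scripts are
| illustrative):
|
|     name: "Nightly full rebuild"
|     on:
|       schedule:
|         - cron: "0 3 * * *"   # run nightly, independent of any PR activity
|     jobs:
|       rebuild-everything:
|         runs-on: ubuntu-latest
|         steps:
|           - uses: actions/checkout@v4
|           - run: ./build-all.sh && ./test-all.sh   # hypothetical scripts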
| JamesSwift wrote:
| Yeah, that's a good point. Especially for an overly dynamic
| runtime like Ruby/Rails, there's just not usually a clean
| way to cordon off sections of code. On the other hand,
| using Nx in an Angular project was pretty amazing.
| bluGill wrote:
| Even in something like C++ you often have configuration,
| startup scripts (I'm in embedded, maybe this isn't a
| thing elsewhere), database schemas, and other such things
| that the code depends on but it isn't obvious to the
| build system that the dependency exists.
| devjab wrote:
| Which CI/CD pipelines have you had issues with? Because
| that isn't my experience at all. With both GitHub (also
| Azure DevOps) and GitLab you can separate your pipelines
| with configurations like .gitlab-ci.yml. I guess it can be
| non-trivial to set up proper parallelisation when you have a
| lot of build stages if this isn't something you're familiar
| with. With a lot of other, more self-hosted tools like
| Gradle, RushJS and many others, you can set up configurations
| which do X if Y and make sure to run only the things which are
| necessary.
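|
| A minimal sketch of that kind of path-based separation in
| .gitlab-ci.yml (the paths and job names are made up for the
| example):
|
|     backend-tests:
|       stage: test
|       script: ./run-backend-tests.sh   # hypothetical script
|       rules:
|         - changes:
|             - backend/**/*
|
|     frontend-tests:
|       stage: test
|       script: npm test --prefix frontend
|       rules:
|         - changes:
|             - frontend/**/*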
|
| I don't want to be rude, but a lot of these tools have
| rather accessible documentation on how to get up and
| running as well as extensive documentation for more complex
| challenges available in their official docs. Which is
| probably the only place you'll find good ways of working
| with them, because a lot of the search engine and LLM
| "solutions" will range from horrible to outdated.
|
| It can be both slower and faster than micro-repositories in
| my experience, however, you're right that it can indeed be
| a Cthulhu level speed bump if you do it wrong.
| JamesSwift wrote:
| I implied but didn't explicitly mention that I'm talking
| from the context of moving _from_ existing polyrepo _to_
| monorepo. The tooling is out there to walk a more happy-
| path experience if you jump in on day 1 (or early in the
| product lifecycle). But it's much harder to migrate to it
| and not have to redo a bunch of CI-related tooling.
| victorNicollet wrote:
| Wouldn't CI be easier with a monorepo? Testing integration
| across multiple repositories (triggered by changes in any of
| them) seems more complex than just adding another test suite to
| a single repo.
| bluGill wrote:
| Pros and cons. Both can be used successfully, but there are
| different problems to each. If you have a large project you
| will have a tooling team to deal with the problems of your
| solution.
| xyzzy_plugh wrote:
| Without indicating my personal feelings on monorepo vs polyrepo,
| or expressing any thoughts about the experience shared here, I
| would like to point out that open-source projects have different
| and sometimes conflicting needs compared to proprietary closed-
| source projects. The best solution for one is sometimes the
| extreme opposite for the other.
|
| In particular, many build pipelines involving private sources or
| artifacts become drastically more complicated than those of
| their publicly available counterparts.
| bunderbunder wrote:
| I've also seen this with branching strategies. IMO the best
| branching strategy for open source projects is generally the
| worst one for commercial projects, and vice versa.
| magicalhippo wrote:
| We're transitioning from a SVN monorepo to Git. We've considered
| doing a kind of best-of-both-worlds approach.
|
| Some core stuff into separate libraries, consumed as nuget
| packages by other projects. Those libraries and other standalone
| projects in separate repos.
|
| Then a "monorepo" for our main product, where individual projects
| for integrations etc will reference non-nuget libraries directly.
|
| That is, tightly coupled code goes into the monorepo, the rest in
| separate repos.
|
| Haven't taken the plunge just yet tho, so not sure how well it'll
| actually work out.
| dezgeg wrote:
| In my experience this turns into a nightmare when (not if,
| when) there is a need to make changes to the libraries and the
| app at the same time. Especially with libraries it's often
| necessary to
| create a client for an API at the same time to really know that
| the interface is any good.
| magicalhippo wrote:
| The idea is that the libraries we put in nuget are really
| non-project-specific. We'll use nuget to manage library
| versions rather than git submodules, so hopefully they can
| live fine in a separate repo.
|
| So updating them at the same time shouldn't be a huge deal,
| we just make the change in the library, publish the nuget
| package, and then bump the version number in the downstream
| projects that need the change.
|
| Ideally changes to these libraries should be relatively
| limited.
|
| For things that are intertwined, like an API client alongside
| the API provider and more project-specific libraries, we'll
| keep those together in the same repo.
|
| If this is what you're thinking of, I'd be interested in
| hearing more about your negative experiences with such a
| setup.
| CharlieDigital wrote:
| > Moving to a monorepo didn't change much, and what minor changes
| it made have been positive.
|
| I'm not sure that this statement in the summary jibes with this
| statement from the next section:
|
| > In the previous, separate repository world, this would've been
| four separate pull requests in four separate repositories, and
| with comments linking them together for posterity.
|
| > Now, it is a single one. Easy to review, easy to merge, easy to
| revert.
|
| IMO, this is a huge quality of life improvement and prevents a
| lot of mistakes caused by not having the right revision synced
| down across different repos. This alone is a HUGE improvement: a
| dev no longer accidentally ends up with one repo on this branch
| while forgetting to pull the other repo at the same branch, and
| then hits weird issues due to this basic hassle.
|
| When I've encountered this, we've had to use _another repo_ to
| keep scripts that managed this. But this was also sometimes
| problematic because each developer's setup had to be identical
| on their local file system (for the script to work) or we had to
| each create a config file pointing to where each repo lived.
|
| This also impacts tracking down bugs and regression analysis;
| this is much easier to manage in a mono-repo setup because you
| can get everything at the same revision instead of managing
| synchronization of multiple repos to figure out where something
| broke.
| notwhereyouare wrote:
| ironically was gonna come and comment on that same second block
| of text.
|
| We went from monorepo to multi-repo at work and it's been a
| huge setback and disappointment with the devs, because it's
| what our contractors recommended.
|
| I've asked for a code deploy and everything, and it's failed in
| prod due to a missing check-in.
| CharlieDigital wrote:
| > ...because it's what our contractors recommended
|
| It's sad when this happens instead of taking input from the
| team on how to actually improve productivity/quality.
|
| A startup I joined started with a multi-repo because the
| senior team came from a FAANG where this was common practice
| to have multiple services and a repo for each service.
|
| Problem was that it was a startup with one team of 6 devs and
| each of the pieces was connected by REST APIs. So now any
| change to one service required deploying that service and
| pulling down the OpenAPI spec to regenerate client bindings.
| It was so clumsy and easy to make simple mistakes.
|
| I refactored the whole thing in one weekend into a monorepo,
| collapsed the handful of services into one service, and we
| never looked back.
|
| That refactoring and a later paper out of Google actually
| inspired me to write this article as a practical guide to
| building a _"modular monolith"_:
| https://chrlschn.dev/blog/2024/01/a-practical-guide-to-
| modul...
| eddd-ddde wrote:
| At least google and meta are heavy into monorepos, I'm
| really curious what company is using a _repo per service_.
| That's insane.
| dewey wrote:
| It's almost never a good idea to get inspired by what
| Google / Meta / Huge Company is doing, as most of the
| time you don't have their problems and they have custom
| tooling and teams making everything work at that scale.
| CharlieDigital wrote:
| In this case, I'd say it's the opposite: monorepo as an
| approach works amazingly well for small teams all the
| way up to huge orgs (with the right tooling to support
| it).
|
| The difference is that past a certain level of
| complexity, the org will most certainly need specialized
| tooling to support massive codebases to make CI/CD
| (build, test, deploy, etc.) times sane.
|
| On the other hand, multi-repos may work for massive orgs,
| but they are always going to add friction for small orgs.
| dewey wrote:
| In this case I wasn't even referring to mono repo or not,
| but more about the idea of taking inspiration from very
| large companies for your own not-large-company problems.
| influx wrote:
| I've used one of the Meta monorepos (yeah there's not
| just one!) and it's super painful at that scale.
| aleksiy123 wrote:
| I feel like this has been repeated so much now that
| people's takeaway is that you shouldn't adopt anything
| from large companies as a small company by default. And
| that's simply not true.
|
| The point here is to understand what problems are being
| solved, understand whether they are similar to yours, and
| make a decision based on whether the tradeoffs are a good
| fit for you.
|
| Not necessarily disagreeing with you, but I just feel the
| pendulum on this statement has swung too far to the other
| side now.
| pc86 wrote:
| It can make sense when you have a huge team of devs and
| different teams responsible for everything where you may
| be on multiple teams, and nobody is exactly responsible
| for all the same set of services you are. Depending on
| the security/access provisioning culture of the org,
| "taking half a day to manually grant access to the repos
| so-and-so needs access to" may actually be an easier sell
| than "give everyone access to all our code."
|
| If you just have 20-30 devs and everyone is pretty silo'd
| (e.g. frontend or backend, data or API, etc) having 75
| repos for your stuff is just silly.
| bobnamob wrote:
| Amazon uses "repo per service" and it is semi insane, but
| Brazil (the big ol' internal build system) and Coral (the
| internal service framework) make it "workable".
|
| As someone who worked in the dev tooling org, getting
| teams to keep their deps up to date was a nightmare.
| bluGill wrote:
| Monorepo and multi repo both have their own need for
| teams to work on dev tooling when the project gets large.
| jgtrosh wrote:
| My team implemented (and reimplemented!) a project using
| one repo per module. I think the main benefit was
| ensuring enough separation of concern due to the burden
| of changing multiple parts together. I managed to reduce
| something like 10 repos down to 3... Work in progress.
| tpm wrote:
| > burden of changing multiple parts together
|
| Then you are adapting your project to the properties of
| the code repository. I don't see that as a benefit.
| wrs wrote:
| I worked at a Fortune 1 company that used one repo _per
| release_ for a certain major software component.
| biorach wrote:
| was that as insane as it sounds?
| seadan83 wrote:
| Did that work out well at all? Any silver lining? My
| first thought is: "branches" & "tags" - wow... Would
| branches/tags have just been easier to work with?
|
| Were they working with multiple services in a multi-repo?
| Seems like a cross-product explosion of repos. Did that
| configuration inhibit releases, or was the process
| cumbersome but just smooth because it was so rote?
| wrs wrote:
| It was a venerable on-prem application done in classic
| three-tier architecture (VB.NET client, app server, and
| database). It was deployed on a regular basis to
| thousands of locations (one deploy per location) and was
| critical to a business with 11-digit revenue.
|
| So yeah, cumbersome, but established, and huge downside
| risk to messing with the status quo. It was basically Git
| applied on top of an existing "copy the source" release
| process.
| psoundy wrote:
| Have you heard of OpenShift 4? Self-hosted Kubernetes by
| Red Hat. Every little piece of the control plane is its
| own 'operator' (basically a microservice) and every
| operator is developed in its own repo.
|
| A github search for 'operator' in the openshift org has
| 178 results:
|
| https://github.com/orgs/openshift/repositories?language=&
| q=o...
|
| Not all are repos hosting one or more microservices, but
| most appear to be. Best of luck ensuring consistency and
| quality across so many repos.
| adra wrote:
| It's just as easy? When you have a monorepo with 5
| million lines of code, you're only going to focus on the
| part of the code you care about and forget the rest. Same
| with 50 repos of 100,000 loc.
|
| Enforcing standards means actually having org level
| mandates around acceptable development standards, and
| it's enforced using tools. Those tools should be just as
| easily run on one monorepo as on 50+ distributed
| repositories, nay?
| psoundy wrote:
| Even in the best case of what you are describing, how are
| these tools configured and their configuration maintained
| except via PRs to the repos in question? For every such
| change, N PRs having to be proposed, reviewed and merged.
| And all this without considering the common need (in a
| healthy project at least) to make cross-cutting changes
| with similar friction around landing a change across
| repos.
|
| If you wanted to, sure, applying enough time and money
| could make it work. I like to think that those resources
| might be better spent, though.
| stackskipton wrote:
| >So now any change to one service required deploying that
| service and pulling down the OpenAPI spec to regenerate
| client bindings. It was so clumsy and easy to make simple
| mistakes.
|
| Why? Is your framework heavily tied to client bindings?
| APIs I consume occasionally get new fields added to them for
| data I don't need. My code just ignores them. We also have a
| policy that you cannot add a new mandatory field to an API
| without a version bump. So maybe the REST API would have a new
| field, but I didn't send it and it happily didn't care.
| jayd16 wrote:
| If prod went down because of a missing check in, there are
| other problems.
| notwhereyouare wrote:
| Did I say prod went down? I just said it failed in prod. It
| was a logging change and only half the logging went out. To
| me, that's a failure.
| taeric wrote:
| My only counter argument here, is when those 4 things deploy
| independently. Sometimes, people will get tricked into thinking
| a code change is atomic because it is in one commit, when it
| will lead to a mixed fleet because of deployment realities. In
| that world, having them separate is easier to work with, as you
| may have to revert one of the deployments separately from the
| others.
| derefr wrote:
| That's just an argument for not doing "implicit GitOps",
| treating the tip of your monorepo's main branch as the
| source-of-truth on the correct deployment state of your
| entire system. ("Implicit GitOps" sorta-kinda works when you
| have a 1:1 correspondence between repos and deployable
| components -- though not always! -- but it isn't tenable for
| a monorepo.)
|
| What instead, then? Explicit GitOps. Explicit, reified
| release specifications (think k8s resource manifests, or
| Erlang .relup files), one per separately-deploy-cadenced
| component. If you have a monorepo, then these live _also_ as
| a dir in the monorepo. CD happens only when these files
| change.
|
| With this approach, a single PR _can_ atomically merge code
| _and_ update one or more release specifications (triggering
| CD for those components), _if and when_ that is a sensible
| thing to do. But there can also be separate PRs for updating
| the code vs. "integrating and deploying changes" to a
| component, if-and-when _that_ is sensible.
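|
| A tiny sketch of what such a reified release spec could look
| like (the file layout and fields here are hypothetical, not
| from this comment):
|
|     # releases/backend.yaml -- one spec per separately-deployed component
|     component: backend
|     image: registry.example.com/backend
|     version: "2024.11.06-1"   # bumping this line is what triggers CD
|
|     # .github/workflows/cd-backend.yaml -- CD fires only when the spec changes
|     on:
|       push:
|         branches: [main]
|         paths: [releases/backend.yaml]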
| scubbo wrote:
| ...I can't believe I'd never thought about the fact that a
| "Deployment Repo" can, in fact, just be a directory within
| the Code Repo. Interesting thought - thanks!
| taeric wrote:
| I mean... sure? Yes, if you add extra structure on top of
| your code that is there to model the deployments, then you
| get a bit closer to modeling your deployments. Isn't that
| the exact argument for why you might want multiple
| repositories, as well?
| ramchip wrote:
| [delayed]
| lmz wrote:
| Isn't a mixed fleet always the case once you have more than
| one server and do rolling updates?
| taeric wrote:
| Yes. And if you structure your code to explicitly do this,
| it is a lot easier to reason about.
| audunw wrote:
| There's nothing preventing you from having a single pull
| request for merging branches over multiple repos. There's
| nothing preventing you from having a parent repo with a lock
| file that gives you a single linear set of commits tracking the
| state of multiple repos.
|
| That is, if you're not tied to using just Github of course.
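|
| A hypothetical lock file in such a parent repo could be as
| simple as this (the format is invented for illustration):
|
|     # versions.lock.yaml in a hypothetical parent repo
|     repos:
|       frontend:
|         url: git@example.com:org/frontend.git
|         commit: 3f9c2ab
|       backend:
|         url: git@example.com:org/backend.git
|         commit: b71d004
|     # each commit to this file is one entry in the linear history
|     # tracking the combined state of all repos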
|
| Big monorepos and multiple repo solutions require some tooling
| to deal with scaling issues.
|
| What surprises me is the attitude that monorepos are the right
| solution to these challenges. For some projects it makes sense
| yes, but it's clear to me that we should have a solution that
| allows repositories to be composed/combined in elegant ways.
| Multi-repository pull requests should be a first class feature
| of any serious source code management system. If you start two
| projects separately and then later find out you need to combine
| their history and work with them as if they were one
| repository, you shouldn't be forced to restructure the
| repositories.
| pelletier wrote:
| > Multi-repository pull requests should be a first class
| feature of any serious source code management system.
|
| Do you have examples of source code management systems that
| provide this feature, and do you have experience with them?
| The repo-centric approach of GitHub often feels limiting.
| jvolkman wrote:
| Apparently Gerrit supports this with topics:
| https://gerrit-
| review.googlesource.com/Documentation/cross-r...
| CharlieDigital wrote:
| > Multi-repository pull requests should be a first class
| feature of any serious source code management system.
|
| But it's currently not?
|
| > If you start two projects separately and then later find out
| you need to combine their history and work with them as if they
| were one repository, you shouldn't be forced to restructure the
| repositories.
|
| It's called a directory copy. Cut + paste. I'd add a tag with
| a comment pointing to the old repo (if needed). But probably
| after a few weeks, no one is going to look at the old repo.
| dmazzoni wrote:
| > It's called a directory copy. Cut + paste. I'd add a tag
| with a comment pointing to the old repo (if needed). But
| probably after a few weeks, no one is going to look at the
| old repo.
|
| Not in my experience. I use "git blame" all the time, and
| routinely read through commits from many years ago in order
| to understand why a particular method works the way it
| does.
|
| Luckily, there are many tools for merging git repos into
| each other while preserving history. It's not as simple as
| copy and paste, but it's worth the extra effort.
| danudey wrote:
| I prefer microservices/microrepos _conceptually_, but we had
| the same experience as your quoted text - making changes to
| four repos, and backporting those changes to the previous two
| release branches, means twelve separate PRs to make a change.
|
| Having a centralized configuration library (a shared Makefile
| that we can pull down into our repo and include into the local
| Makefile) helps, until you have to make a backwards-
| incompatible change to that Makefile and then post PRs to every
| branch of every repo that uses that Makefile.
|
| Now we have almost the entirety of our projects back into one
| repository and everything is simpler; one PR per release
| branch, three PRs (typically) for any change that needs
| backporting. Vastly simpler process and much less room for
| error.
| wongarsu wrote:
| It's not _as much_ of a pain if your tooling supports git repos
| as dependencies. For example, a typical multi-repo PR for us
| with Rust is:
|
| 1) PR against library
| 2) PR against application that points the dependency to PR 1's
| branch, makes changes
| 3) PR review
| 4) PR 1 is approved and merged
| 5) PR 2 is changed to point to the new master branch commit
| 6) PR 2 is approved and merged
|
| Same idea if you use some kind of versioning and release
| system. It's still a bit of a pain with all the PRs and
| coordination involved, but at every step every branch is
| consistent and buildable, you just check it out and hit build.
|
| This is obviously more difficult if you have a more loosely
| coupled architecture like microservices. But that's self-
| inflicted pain
| ericyd wrote:
| I felt the same, the author seemed to downplay the success
| while every effect listed in the article felt like a huge
| improvement.
| eikenberry wrote:
| I thought one of the whole points behind separate (non-
| mono)repos was to help enforce loose coupling and if you came
| to a point where a single feature change required PRs on 4
| separate repos then that was an indicator that your project
| needed refactoring as it was becoming too tightly coupled. The
| example in the article could have been interpreted to mean that
| they should refactor the functionality for interacting with the
| ML model into its own repo so it could encapsulate this aspect
| of the project. Instead they doubled down on the tighter
| coupling by putting them in a monorepo (which itself encourages
| tighter coupling).
| Attummm wrote:
| The issue you faced stemmed from the previous best practice of
| "everything in its own repository." This approach caused major
| issues. Such as versioning challenges and data model
| inconsistencies you mentioned. The situations it could lead to
| are comedy sketches, but it's a real pain especially when
| you're part of a team struggling with these problems. And it's
| almost impossible to convince a team to change direction once
| they've committed to it.
|
| Now, though, it seems the pendulum has swung in the opposite
| direction, from "everything in its own repo" to "everything in
| one repo." This, too, will create its own set of problems,
| which also can be comedic, but frustrating to experience. For
| instance, what happens when someone accidentally pushes a
| certificate or API key and you need to force-push an update
| upstream? Coordinating that with 50 developers spread across 8
| projects, all in a single repo.
|
| Instead, we could avoid the problems we currently face and
| start out with a balanced approach. Start with one repository,
| or split frontend and backend if needed. For data pipelines
| that share models with the API, keep them in the same
| repository, creating a single source of truth for the data
| model. This method has often led to other developers telling me
| about the supposed benefits of "everything in its own repo."
| Just as I pushed back then, I feel the need to push back now
| against the monorepo trend.
|
| The same can be said for monoliths and microservices, where the
| middle ground is often overlooked in discussions about best
| practices.
|
| They all reminded me of the concept of "no silver bullet"[0].
| Any decision will face its own unique challenges. But a
| silver-bullet solution can create artificial challenges that
| are wasteful, painful, and most of all unnecessary.
|
| [0]https://en.m.wikipedia.org/wiki/No_Silver_Bullet
| memsom wrote:
| monorepos are appropriate for a single project with many sub
| parts but one or two artifacts on any given release build. But
| they fall apart when you have multiple products in the monorepo,
| each with different release schedules.
|
| As soon as you add a second separate product that uses a
| different subset of any code in the repo, you should consider
| breaking up the monorepo. If the code is "a bunch of libraries"
| and "one or more end user products" it becomes even more
| imperative to consider breaking down stuff.
|
| Having worked on monorepos where there are 30+ artifacts,
| multiple ongoing projects that each pull the monorepo into
| different incompatible versions, and all of which have their own
| lifetime and their own release cycle - monorepo is the antithesis
| of a good idea.
| munksbeer wrote:
| No offense but I think you're doing monorepos wrong. We have
| more than 100 applications living in our monorepo. They share
| common core code, some common signals, common utility libs, and
| all of them share the same build.
|
| We release everything weekly, and some things much more
| frequently.
|
| If your testing is good enough, I don't see what the issue is?
| bluGill wrote:
| > If your testing is good enough, I don't see what the issue
| is?
|
| Your testing isn't good enough. I don't know who you are,
| what you are working on, or how much testing you do, but I
| will state with confidence it isn't good enough.
|
| It might be acceptable for your current needs, but you will
| have bugs that escape testing - often intentional as you
| can't stop forever to fix all known bugs. In turn that means
| if anything changes in your current needs you will run into
| issues.
|
| > We release everything weekly, and some things much more
| frequently.
|
| This is a negative to users. When you think you will release
| again soon, so who cares about bugs, it means your users see
| more bugs. Sure it is nice that you don't have to break open
| years-old code anymore, but if the new stuff doesn't have
| anything the user wants, is this really a good thing?
| memsom wrote:
| No offence, but you might be a little confused by how complex
| your actual delivery is. That sounds simple. That sounds like
| it has a clear roadmap. When you don't, and you have very
| agile development that pivots quickly and demands a lot of
| change concurrently for releases that have very different
| goals, it is not possible to make all your ducks sit in a
| row. Monorepos suck in that situation. The dependency graph
| is so complex it will make your head hurt. And all the
| streams need to converge into the main dev branch at some
| point, which causes huge bottlenecks.
| tomtheelder wrote:
| The dependency graph is no different for a monorepo vs a
| polyrepo. It's just a question of how those dependencies
| get resolved.
| vander_elst wrote:
| Working on a monorepo where we have hundreds (possibly
| thousands) of projects each with a different version and
| release schedule. It actually works quite well, the
| dependencies are always in a good state, it's easy to see the
| ramifications of a change and to reuse common components.
| memsom wrote:
| Good for you. For us, because we have multiple projects going
| on, pulling the code in different ways, code that runs on
| embedded, code that runs in the cloud, desktop apps (real
| ones written in C++ and .Net, not glorified web apps), code
| that is customer facing, code used by third parties for
| integrating our products, no - it just doesn't work. The
| embedded shares a core with other levels, and we support
| multiple embedded platforms (bare metal) and OS (Windows,
| Linux, Android, iOS) and also have stuff that runs in
| Amazon/Azure cloud platform. You might be fine, but when you
| hit critical mass and you have very complicated commercial
| concerns, it doesn't work well.
| tomtheelder wrote:
| I mean it works for Google. Not saying that's a reason to
| go monorepo, but it at least suggests that it can work for
| a very large org with very diverse software.
|
| I really don't see why anything you describe would be an
| issue at all for a monorepo.
| h1fra wrote:
| I think the big issue around monorepo is when a company puts
| completely different projects together inside a single repo.
|
| In this article almost everything makes sense to me (because
| that's what I have been doing most of my career) but they put
| their OTP app inside which suddenly makes no sense. And you can
| see the problem in the CI: they have dedicated files just for
| this app and probably very little common code with the rest.
|
| IMO you should have one monorepo per project (api, frontend,
| backend, mobile, etc. as long as it's the same project) and if
| needed a dedicated repo for a shared library.
| fragmede wrote:
| > you should have one monorepo per project (api, frontend,
| backend, mobile, etc. as long as it's the same project)
|
| _that's not a monorepo!_
|
| Unless the singular "project" is the stuff our company ships,
| the problem you have is an impedance mismatch between the
| projects, which is the problem that an _actual_ monorepo
| solves. For SWEs on individual projects who will never have the
| problem of having to ship a commit on all the repos at the
| "same" time, yeah, that seems fine, and for them it is. The
| problem comes as a distributed systems engineer where, for
| whatever reason, many or all the repos need to be shipped at
| the ~same time. Or worse - A needs to ship before B, which
| needs to ship before C, but that needs to ship before A, and
| you have to unwind that before actually being able to ship the
| change.
| hk1337 wrote:
| > that's not a monorepo!
|
| Sure it is! It's just not the ideal use case for a monorepo
| which is why people say they don't like monorepos.
| vander_elst wrote:
| "one monorepo per project (api, frontend, backend, mobile,
| etc. as long as it's the same project) and if needed a
| dedicated repo for a shared library."
|
| They are literally saying that multiple repos should be
| used, also for sharing the code, this is not monorepo,
| these are different repos.
| gregmac wrote:
| To me, monorepo vs multi-repo is not about the code organization,
| but about the deployment strategy. My rule is that there should
| be a 1:1 relation between a repository and a release/deployment.
|
| If you do one big monolithic deploy, one big monorepo is ideal.
| (Also, to be clear, this is separate from microservice vs
| monolithic app: your monolithic deploy can be made up of as many
| different applications/services/lambdas/databases as makes
| sense). You don't have to worry about cross-compatibility between
| parts of your code, because there's never a state where you can
| deploy something incompatible, because it all deploys at once. A
| single PR makes all the changes in one shot.
|
| The other rule I have is that if you want to have individual
| repos with individual deployments, they must be both forward- and
| backwards-compatible for long enough that you never need to do a
| coordinated deploy (deploying two at once, where everything is
| broken in between). If you have to do coordinated deploys, you
| really have a monolith that's just masquerading as something more
| sophisticated, and you've given up the biggest benefits of _both_
| models (simplicity of mono, independence of multi).
|
| Consider what happens with a monorepo with parts of it being
| deployed individually. You can't checkout any specific commit and
| mirror what's in production. You could make multiple copies of
| the repo, checkout a different commit on each one, then try to
| keep in mind which part of which commit is where -- but this is
| utterly confusing. If you have 5 deployments, you now have 4
| copies of any given line of code on your system that are
| potentially wrong. It becomes very hard to not accidentally break
| compatibility.
|
| TL;DR: Figure out your deployment strategy, then make your
| repository structure mirror that.
| aswerty wrote:
| This mirrors my own experience in the SaaS world. Anytime
| things move towards multiple artifacts/pipelines in one repo;
| trying to understand what change existed where and when seems
| to always become very difficult.
|
| Of course the multirepo approach means you do this dance a lot
| more:
|
| - Create a change with backwards compatibility and tombstones
| (e.g. logs for when backward compatibility is used)
| - Update upstream systems to the new change
| - Remove backwards compatibility and pray you don't have a
| low-frequency upstream service interaction you didn't know
| about
|
| While the dance can be a pain - it does follow a more iterative
| approach with reduced blast radiuses (albeit many more of
| them). But, all in all, an acceptable tradeoff.
|
| Maybe if I had more familiarity in mature tooling around
| monorepos I might be more interested in them. But alas not a
| bridge I have crossed, or am pushed to do so just at the
| moment.
| CharlieDigital wrote:
| It doesn't have to be that way.
|
| You can have a mono-repo and deploy different parts of the repo
| as different services.
|
| You can have a mono-repo with a React SPA and a backend service
| in Go. If you fix some UI bug with a button in the React SPA,
| why would you also deploy the backend?
| oneplane wrote:
| You wouldn't, but making a repo collection into a mono-repo
| means your mono-deploy needs to be split into a multi-maybe-
| deploy.
|
| As always, complexity merely moves around when squeezed, and
| making commits/PRs easier means something else, somewhere
| else gets less easy.
|
| It is something that can be made better of course, having
| your CI and CD be a bit smarter and more modular means you
| can now do selective builds based on what was actually
| changed, and selective releases based on what you actually
| want to release (not merely what was in the repo at a commit,
| or whatever was built).
|
| But all of that needs to be constructed too, just merging
| some repos into one doesn't do that.
| CharlieDigital wrote:
| This is not very complex at all.
|
| I linked an example below. Most CI/CD, like GitHub
| Actions[0], can easily be configured to trigger on changes
| for files in a specific path.
|
| As a very basic starting point, you only need to set up
| simple rules to detect which monorepo roots changed.
|
| [0] https://docs.github.com/en/actions/writing-
| workflows/workflo...
| bryanlarsen wrote:
| If you don't deploy in tandem, you need to test forwards &
| backwards compatibility. That's tough with either a monorepo
| or separate repos, but arguably it'd be simpler with separate
| repos.
| CharlieDigital wrote:
| It doesn't have to be that complicated.
|
| All you need to know is "does changing this code affect
| that code".
|
| In the example I've given -- a React SPA and Go backend --
| let's assume that there's a gRPC binding originating from
| the backend. How do we know that we also need to deploy the
| SPA? Updating the schema would cause generation of a new
| client + model in the SPA. Now you know that you need to
| deploy both and this can be done simply by detecting roots
| for modified files.
|
| You can scale this. If that gRPC change affected some other
| web extension project, apply the same basic principle:
| detect that a file changed under this root -> trigger the
| workflow that rebuilds, tests, and deploys from this root.
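|
| A rough illustration of that root-detection idea with two
| workflow triggers (the paths are invented for the example):
|
|     # deploy-backend.yaml: redeploy the Go service when its root changes
|     on:
|       push:
|         branches: [main]
|         paths: [backend/**, proto/**]
|
|     # deploy-spa.yaml: the SPA depends on the generated gRPC client,
|     # so a schema change under proto/ triggers this workflow too
|     on:
|       push:
|         branches: [main]
|         paths: [spa/**, proto/**]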
| Falimonda wrote:
| This is spot on. A monorepo can still include a granular and
| standardized CI configuration across code paths. Nothing
| about monorepo forces you to perform a singular deployment.
|
| The gains provided by moving from polyrepo to monorepo are
| immense.
|
| Developer access control is the only thing I can think to
| justify polyrepo.
|
| I'm curious if and how others who see the advantages of
| monorepo have justified polyrepo in spite of that.
| syndicatedjelly wrote:
| Some thoughts:
|
| 1) Comparing a photo storage app to the Linux kernel doesn't make
| much sense. Just because a much bigger project in an entirely
| different (and more complex) domain uses monorepos, doesn't mean
| you should too.
|
| 2) What the hell is a monorepo? I feel dumb for asking the
| question, and I feel like I missed the boat on understanding it,
| because no one defines it anymore. Yet I feel like every mention
| of monorepo is highly dependent on the context the word is used
| in. Does it just mean a single version-controlled repository of
| code?
|
| 3) Can these issues with sync'ing repos be solved with better use
| of `git submodule`? It seems to be designed exactly for this
| purpose. The author says "submodules are irritating" a couple
| times, but doesn't explain what exactly is wrong with them. They
| seem like a great solution to me, but I also only recently
| started using them in a side project
| datadrivenangel wrote:
| Monorepo is just a single repo. Yup.
|
| Git submodules have some places where you can surprisingly lose
| branches/stashed changes.
| syndicatedjelly wrote:
| One of my repos has a dependency on another repo (that I also
| own). I initialized it as a git submodule (e.g. my_org/repo1
| has a submodule of my_org/repo2).
|
| > Git submodules have some places where you can surprisingly
| lose branches/stashed changes.
|
| This concerns me, as git generally behaves as a leak-proof
| abstraction in my experience. Can you elaborate or share
| where I can learn more about this issue?
| klooney wrote:
| > Does it just mean a single version-controlled repository of
| code?
|
| Yeah - the idea is that all of your projects share a common
| repo. This has advantages and drawbacks. Google is most famous
| for this approach, although I think they technically have three
| now- one for Google, one for Android, and one for Chrome.
|
| > They seem like a great solution to me
|
| They don't work in a team context because they're extra steps
| that people don't do, basically. And for some reason a lot of
| people find them confusing.
| nonameiguess wrote:
| https://github.com/google/ contains 2700+ repositories. I
| don't know necessarily how many of these are read-only clones
| from an internal monorepo versus how many are separate
| projects that have actually been open-sourced, but the latter
| is more than zero.
| mgaunard wrote:
| Doing modular right is harder than doing monolithic right.
|
| But if you do it right, the advantage you get is that you get to
| pick which versions of your dependencies you use; while quite
| often you just want to use the latest, being able to pin is also
| very useful.
| lukewink wrote:
| You can still publish packages and pull them down as (pinned)
| dependencies all within a monorepo.
| mgaunard wrote:
| that's a terrible and arguably broken-by-design workflow
| which entirely defeats the point of the monorepo, which is to
| have a unified build of everything together, rather than
| building things piecemeal in ways that could be incompatible.
|
| For C++ in particular, you need to express your dependencies
| in terms of source versions, and ensure all of the build
| artifacts you link together were built against the same
| source version of every transitive dependency and with the
| same flags. Failure to do that results in undefined
| behaviour, and indeed I have seen large organizations with
| unreliable builds as a matter of routine because of that.
|
| The best way to achieve that is to just build the whole thing
| from source, with a content-addressable-store shared with the
| whole organization to transparently avoid building redundant
| things. Whether your source is in a single repo or spread
| over several doesn't matter so long as your tooling manages
| that for you and knows where to get things, but ultimately
| the right way to do modular is simply to synthesize the
| equivalent monorepo and build that. Sometimes there is the
| requirement that specific sources should have restricted
| access, which is often a reason why people avoid building
| from source, but that's easy to work around by building on
| remote agents.
|
| Now for some reason there is no good open-source build system
| for C++, while Rust mostly got it right on the first try.
| Maybe it's because there are some C++ users still attached to
| the notion of manually managing ABI.
| stackskipton wrote:
| As a DevOps/SRE type person that occasionally gets stuck with
| builds, monorepos work well if the company will invest in the
| build process. However, many companies don't do well in this
| area, and the monorepo blast radius becomes much bigger, so
| individual repos it is. Also, depending on the language,
| building a private repo to keep all common libraries in is easy
| enough.
| stillbourne wrote:
| I like to use the monorepo tools without the monorepo repo. If
| that makes any god damn sense. I use NX at my job and the
| monorepo was getting out of hand, 6 hour pipeline builds, 2 hours
| testing, etc. So I broke the repo into smaller pieces. This
| wouldn't have been possible if I wasn't already using the
| monorepo tools universally through the project but it ended up
| working well.
| KaiserPro wrote:
| Monorepos have their advantages, as pointed out, one place to
| review, one place to merge.
|
| But it can also breed instability, as you can upgrade other
| people's stuff without them being aware.
|
| There are ways around this, which involve having a local module
| store, and building with named versions. Very similar to a bunch
| of disparate repos, but without getting lost in github (github's
| discoverability was always far inferior to gitlab)
|
| However it has its drawbacks, namely that people can hold out
| on older versions than you want to support.
| dkarl wrote:
| > But it can also breed instability, as you can upgrade other
| people's stuff without them being aware
|
| This is why Google embraced the principle that if somebody
| breaks your code without breaking your tests, it's your fault
| for not writing better tests. (This is sometimes known as the
| Beyonce rule: if you liked it, you should have put a test on
| it.)
|
| You need the ability to upgrade dependencies in a hands-off way
| even if you don't have a monorepo, though, because you need to
| be able to apply security updates without scheduling dev work
| every time. You shouldn't need a careful informed eye to tell
| if upgrades broke your code. You should be able to trust your
| tests.
| msoad wrote:
| I love monorepos but I'm not sure if Git is the right tool beyond
| certain scale. Where I work, doing a simple `git status` takes
| seconds due to the size of the repo. There have been various
| attempts to solve Git performance, but so far there is nothing
| close to what I experienced at Google.
|
| The Git team should really invest in tooling for very large
| repos. Our repo is around 10M files and 100M lines of code, and
| no amount of hacks on top of Git (cache, sparse checkout, etc.)
| really solves the core problem.
|
| Meta and Google have really solved this problem internally but
| there is no real open source solution that works for everyone out
| there.
| dijit wrote:
| I'm secretly hoping that google releases piper (and Mondrian);
| the gaming industry would go wild.
|
| Perforce is pretty brutal, and the code review tools are awful
| - but its still the undisputed king of mixed text and binary
| assets in a huge monorepo.
| paxys wrote:
| All the pitfalls of a monorepo can disappear with some good
| tooling and regular maintenance, so much so that devs may not
| even realize that they are using one. The actual meat of the
| discussion is - should you deploy the entire monorepo as one unit
| or as multiple (micro)services?
| bobim wrote:
| Started to use a monorepo + worktrees to keep related but
| separated developments all together with different checkouts.
| Anybody else on the same path?
| __MatrixMan__ wrote:
| Every monorepo I've ever met (n=3) has some kind of radioactive
| DMZ that everybody is afraid to touch because it's not clear who
| owns it but it is clear from its quality that you don't want to
| be the last person who touched it because then maybe somebody
| will think that you own it. It's usually called "core" or
| somesuch.
|
| Separate repos for each team means that when two teams own
| components that need to interact, they have to expose a "public"
| interface to the other team--which is the kind of disciplined
| engineering work that we should be striving for. The monorepo-
| alternative is that you solve it in the DMZ where it feels less
| like engineering and more like some kind of multiparty political
| endeavor where PR reviewers of dubious stakeholder status are
| using the exercise to further agendas which are unrelated to the
| feature except that it somehow proves them right about whatever
| architectural point is recently contentious.
|
| Plus, it's always harder to remove something from the DMZ than to
| add it, so it's always growing and there's this sort of
| gravitational attractor which, eventually starts warping time
| such that PR's take longer to merge the closer they are to it.
|
| Better to just do the "hard" work of maintaining versioned
| interfaces with documented compatibility defined by tests. You
| can always decide to collapse your codebase into a black hole
| later--but if you start on that path you may never escape.
___________________________________________________________________
(page generated 2024-11-06 23:00 UTC)