[HN Gopher] Monorepo - Our Experience
       ___________________________________________________________________
        
       Monorepo - Our Experience
        
       Author : vishnumohandas
       Score  : 102 points
       Date   : 2024-11-06 13:37 UTC (9 hours ago)
        
 (HTM) web link (ente.io)
 (TXT) w3m dump (ente.io)
        
       | siva7 wrote:
       | Ok, but the more interesting part - how did you solve the CI/CD
       | part and how does it compare to a multirepo?
        
         | CharlieDigital wrote:
         | Most CI/CD platforms will allow specification of targeted
         | triggers.
         | 
          | For example, in GitHub[0]:
          | 
          |     name: ".NET - PR Unit Test"
          |     on:
          |       ## Only execute these unit tests when a file in this directory changes.
          |       pull_request:
          |         branches: [main]
          |         paths: [src/services/publishing/**.cs, src/tests/unit/**.cs]
         | 
         | So we set up different workflows that kick off based on the
         | sets of files that change.
         | 
         | [0] https://docs.github.com/en/actions/writing-
         | workflows/workflo...
        
           | victorNicollet wrote:
           | I'm not familiar with GitHub Actions, but we reverted our
           | migration to Bitbucket Pipelines because of a nasty side-
           | effect of conditional execution: if a commit triggers test
           | suite T1 but not T2, and T1 is successful, Bitbucket displays
           | that commit with a green "everything is fine" check mark,
           | regardless of the status of T2 on any ancestors of that
           | commit.
           | 
           | That is, the green check mark means "the changes in this
           | commit did not break anything that was not already broken",
           | as opposed to the more useful "the repository, as of this
           | commit, passes all tests".
        
             | ants_everywhere wrote:
              | Isn't that generally what you want? The check mark tells
              | you the commit didn't break anything. If something was
              | already broken, it should have either blocked the commit
              | that broke it, or there's a flake somewhere that you can
              | only locate by periodically running tests independent of
              | any PR activity.
        
             | daelon wrote:
             | Is it a side effect if it's also the primary effect?
        
             | plorkyeran wrote:
             | I would find it extremely confusing and unhelpful if tests
             | in the parent commit which weren't rerun for a PR because
             | nothing relevant was touched marked the PR as red. Why
             | would you even want that? That's not something which is
             | relevant to evaluating the PR and would make you get in the
             | habit of ignoring failures.
             | 
             | If you split something into multiple repositories then
             | surely you wouldn't mark PRs on one of them as red just
             | because tests are failing in a different one?
        
               | victorNicollet wrote:
               | I suppose our development process is a bit unusual.
               | 
               | The meaning we give to "the commit is green" is not "this
               | PR can be merged" but "this can be deployed to
               | production", and it is used for the purpose of selecting
               | a release candidate several times a week. It is a
               | statement about the entire state of the project as of
               | that commit, rather than just the changes introduced in
               | that commit.
               | 
               | I can understand the frustration of creating a PR from a
               | red commit on the main branch, and having that PR be red
               | as well as a result. I can't say this has happened very
               | often, though: red commits on the main branch are very
               | rare, and new branches tend to be started right after a
               | deployment, so it's overwhelmingly likely that the PR
               | will be rooted at a green commit. When it does happen,
               | the time it takes to push a fix (or a revert) to the main
               | branch is usually much shorter than the time for a review
               | of the PR, which means it is possible to rebase the PR on
               | top of a green commit as part of the normal PR acceptance
               | timeline.
        
               | plorkyeran wrote:
               | Going off the PR status to determine if the end result is
               | deployable is not reliable. A non-FF merge can have both
               | the base commit and the PR be green but the merged result
               | fail. You need to run your full test suite on the merged
               | result at some point before deployment; either via a
               | commit queue or post-merge testing.
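                | 
                | For instance, post-merge testing could just be a workflow
                | that runs the full suite on main after every merge (a
                | GitHub-Actions-flavoured sketch; the make target is a
                | placeholder, not a real project's command):
                | 
                |     on:
                |       push:
                |         branches: [main]   # runs against the merged result, not the PR head
                |     jobs:
                |       full-suite:
                |         runs-on: ubuntu-latest
                |         steps:
                |           - uses: actions/checkout@v4
                |           - run: make test-all   # placeholder for the full test suite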
        
               | victorNicollet wrote:
                | I agree! We use the commit status instead of the PR
               | status. A non-FF merge commit, being a commit, would have
               | its own status separate from the status of its parents.
        
           | hk1337 wrote:
           | Even AWS CodeBuild (or CodePipeline) allows you to do this
           | now. It didn't before but it's a fairly recent update.
        
             | CharlieDigital wrote:
             | As a prior user of AWS Code*, I can appreciate that you
             | qualified that with "Even" LMAO
        
         | devjab wrote:
         | I don't think CI/CD should really be a big worry as far as
          | mono-repositories go, as you can set up different pipelines and
         | different flows with different configurations. Something you're
         | probably already doing if you have multiple repos.
         | 
         | In my experience the article is right when it tells you there
         | isn't that big of a difference. We have all sorts of
         | repositories, some of which are basically mono-repositories for
         | their business domain. We tend to separate where it "makes
         | sense" which for us means that it's when what we put into
         | repositories is completely separate from everything else. We
         | used to have a lot of micro-repositories and it wasn't that
         | different to be honest. We grouped more of them together to
         | make it easier for us to be DORA compliant in terms of the
         | bureaucracy it adds to your documentation burden. Technically I
         | hardly notice.
        
           | JamesSwift wrote:
           | In my limited-but-not-nothing experience working with mono vs
           | multi repo of the same projects, CI/CD definitely was one of
            | the harder pieces to solve. It's highly dependent on your
            | frameworks and CI provider just how straightforward it is
            | going to be, and most of them are "not very straightforward".
           | 
           | The basic way most work is to run full CI on every change.
           | This quickly becomes a huge speedbump to deployment velocity
           | until a solution for "only run what is affected" is found.
        
             | bluGill wrote:
             | The problem with "only run what is affected" is it is
             | really easy to have something that is affected but doesn't
             | seem like it should be (that is whatever tools you have to
             | detect is it affected say it isn't). So if you have such a
             | system you must have regular rebuild everything jobs as
             | well to verify you didn't break something unexpected.
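              | 
              | A scheduled "build the world" job is one way to do that (a
              | sketch in GitHub Actions terms; the cron and the make target
              | are assumptions, not anyone's real setup):
              | 
              |     on:
              |       schedule:
              |         - cron: "0 3 * * *"   # nightly, regardless of what changed
              |     jobs:
              |       build-everything:
              |         runs-on: ubuntu-latest
              |         steps:
              |           - uses: actions/checkout@v4
              |           - run: make build test   # assumed full build + test target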
             | 
             | I'm not against only run what is affected, it is a good
             | answer. It just has failings that you need to be aware of.
        
               | JamesSwift wrote:
                | Yeah, that's a good point. Especially for an overly-dynamic
                | runtime like ruby/rails, there's just not usually a clean
               | way to cordon off sections of code. On the other hand,
               | using nx in an angular project was pretty amazing.
        
               | bluGill wrote:
               | Even in something like C++ you often have configuration,
               | startup scripts (I'm in embedded, maybe this isn't a
                | thing elsewhere), database schemas, and other such things
               | that the code depends on but it isn't obvious to the
               | build system that the dependency exists.
        
             | devjab wrote:
             | Which CI/CD pipelines have you had issues with? Because
             | that isn't my experience at all. With both GitHub (also
             | Azure DevOps) and gitlab you can separate your pipelines
              | with configurations like .gitlab-ci.yml. I guess it can be
              | non-trivial to set up proper parallelisation when you have a
              | lot of build stages if this isn't something you're familiar
              | with. With a lot of other, more self-hosted tools like
              | Gradle, RushJS and many others, you can set up
              | configurations which do X if Y and make sure only to run
              | what is necessary.
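              | 
              | A minimal sketch of that kind of separation in a
              | .gitlab-ci.yml (the job names, scripts and paths here are
              | hypothetical):
              | 
              |     backend-tests:
              |       stage: test
              |       script:
              |         - go test ./...
              |       rules:
              |         - changes:           # only run when backend files change
              |             - backend/**/*
              | 
              |     frontend-tests:
              |       stage: test
              |       script:
              |         - npm ci && npm test
              |       rules:
              |         - changes:           # only run when frontend files change
              |             - frontend/**/*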
             | 
             | I don't want to be rude, but a lot of these tools have
             | rather accessible documentation on how to get up and
             | running as well as extensive documentation for more complex
             | challenges available in their official docs. Which is
              | probably the _only_ place you'll find good ways of working
             | with it because a lot of the search engine and LLM
             | "solutions" will range from horrible to outdated.
             | 
             | It can be both slower and faster than micro-repositories in
             | my experience, however, you're right that it can indeed be
             | a Cthulhu level speed bump if you do it wrong.
        
               | JamesSwift wrote:
                | I implied but didn't explicitly mention that I'm talking
                | from the context of moving _from_ existing polyrepo _to_
                | monorepo. The tooling is out there to walk a more happy-
                | path experience if you jump in on day 1 (or early in the
                | product lifecycle). But it's much harder to migrate to it
                | and not have to redo a bunch of CI-related tooling.
        
         | victorNicollet wrote:
          | Wouldn't CI be easier with a monorepo? Testing integration
         | across multiple repositories (triggered by changes in any of
         | them) seems more complex than just adding another test suite to
         | a single repo.
        
           | bluGill wrote:
           | Pros and cons. Both can be used successfully, but there are
           | different problems to each. If you have a large project you
            | will have tooling teams to deal with the problems of your
           | solution.
        
       | xyzzy_plugh wrote:
       | Without indicating my personal feelings on monorepo vs polyrepo,
       | or expressing any thoughts about the experience shared here, I
       | would like to point out that open-source projects have different
       | and sometimes conflicting needs compared to proprietary closed-
       | source projects. The best solution for one is sometimes the
       | extreme opposite for the other.
       | 
       | In particular many build pipelines involving private sources or
        | artifacts become drastically more complicated than those of their
        | publicly available counterparts.
        
         | bunderbunder wrote:
         | I've also seen this with branching strategies. IMO the best
         | branching strategy for open source projects is generally the
         | worst one for commercial projects, and vice versa.
        
       | magicalhippo wrote:
       | We're transitioning from a SVN monorepo to Git. We've considered
       | doing a kind of best-of-both-worlds approach.
       | 
       | Some core stuff into separate libraries, consumed as nuget
       | packages by other projects. Those libraries and other standalone
       | projects in separate repos.
       | 
       | Then a "monorepo" for our main product, where individual projects
       | for integrations etc will reference non-nuget libraries directly.
       | 
       | That is, tightly coupled code goes into the monorepo, the rest in
       | separate repos.
       | 
       | Haven't taken the plunge just yet tho, so not sure how well it'll
       | actually work out.
        
         | dezgeg wrote:
         | In my experience this turns to nightmare when (not if, when)
         | there is need to make changes to the libraries and app at the
         | same time. Especially with libraries it's often necessary to
         | create a client for an API at the same time to really know that
         | the interface is any good.
        
           | magicalhippo wrote:
           | The idea is that the libraries we put in nuget are really
           | non-project-specific. We'll use nuget to manage library
           | versions rather than git submodules, so hopefully they can
           | live fine in a separate repo.
           | 
           | So updating them at the same time shouldn't be a huge deal,
           | we just make the change in the library, publish the nuget
           | package, and then bump the version number in the downstream
           | projects that need the change.
           | 
           | Ideally changes to these libraries should be relatively
           | limited.
           | 
           | For things that are intertwined, like an API client alongside
           | the API provider and more project-specific libraries, we'll
           | keep those together in the same repo.
           | 
           | If this is what you're thinking of, I'd be interested in
           | hearing more about your negative experiences with such a
           | setup.
        
       | CharlieDigital wrote:
       | > Moving to a monorepo didn't change much, and what minor changes
       | it made have been positive.
       | 
        | I'm not sure that this statement in the summary jibes with this
        | statement from the next section:
        | 
        | > In the previous, separate repository world, this would've been
        | four separate pull requests in four separate repositories, and
        | with comments linking them together for posterity.
        | 
        | > Now, it is a single one. Easy to review, easy to merge, easy to
        | revert.
       | 
       | IMO, this is a huge quality of life improvement and prevents a
       | lot of mistakes from not having the right revision synced down
        | across different repos. This alone is a HUGE improvement: a dev
        | no longer accidentally ends up with one repo on this branch while
        | forgetting to pull the other repo to the same branch, then hits
        | weird issues due to that basic hassle.
       | 
       | When I've encountered this, we've had to use _another repo_ to
       | keep scripts that managed this. But this was also sometimes
        | problematic because each developer's setup had to be identical
       | on their local file system (for the script to work) or we had to
       | each create a config file pointing to where each repo lived.
       | 
       | This also impacts tracking down bugs and regression analysis;
       | this is much easier to manage in a mono-repo setup because you
       | can get everything at the same revision instead of managing
       | synchronization of multiple repos to figure out where something
       | broke.
        
         | notwhereyouare wrote:
          | Ironically, I was gonna come and comment on that same second
          | block of text.
         | 
          | We went from monorepo to multi-repo at work and it's been a
          | huge setback and disappointment for the devs because it's
          | what our contractors recommended.
         | 
            | I've asked for a code deploy and everything and it's failed in
            | prod due to a missing check-in.
        
           | CharlieDigital wrote:
           | > ...because it's what our contractors recommended
           | 
           | It's sad when this happens instead of taking input from the
           | team on how to actually improve productivity/quality.
           | 
           | A startup I joined started with a multi-repo because the
           | senior team came from a FAANG where this was common practice
           | to have multiple services and a repo for each service.
           | 
           | Problem was that it was a startup with one team of 6 devs and
           | each of the pieces was connected by REST APIs. So now any
           | change to one service required deploying that service and
           | pulling down the OpenAPI spec to regenerate client bindings.
           | It was so clumsy and easy to make simple mistakes.
           | 
            | I refactored the whole thing in one weekend into a monorepo,
           | collapsed the handful of services into one service, and we
           | never looked back.
           | 
           | That refactoring and a later paper out of Google actually
           | inspired me to write this article as a practical guide to
            | building a _"modular monolith"_:
           | https://chrlschn.dev/blog/2024/01/a-practical-guide-to-
           | modul...
        
             | eddd-ddde wrote:
             | At least google and meta are heavy into monorepos, I'm
             | really curious what company is using a _repo per service_.
             | That's insane.
        
               | dewey wrote:
                | It's almost never a good idea to get inspired by what
                | Google / Meta / Huge Company is doing, as most of the time
                | you don't have their problems and they have custom tooling
                | and teams making everything work at that scale.
        
               | CharlieDigital wrote:
               | In this case, I'd say it's the opposite: monorepo as an
               | approach works amazingly well for small teams all the
               | ways up to huge orgs (with the right tooling to support
               | it).
               | 
               | The difference is that past a certain level of
               | complexity, the org will most certainly need specialized
               | tooling to support massive codebases to make CI/CD
               | (build, test, deploy, etc.) times sane.
               | 
               | On the other hand, multi-repos may work for massive orgs,
                | but are always going to add friction for small orgs.
        
               | dewey wrote:
               | In this case I wasn't even referring to mono repo or not,
               | but more about the idea of taking inspiration from very
               | large companies for your own not-large-company problems.
        
               | influx wrote:
               | I've used one of the Meta monorepos (yeah there's not
               | just one!) and it's super painful at that scale.
        
               | aleksiy123 wrote:
                | I feel like this has been repeated so much now that
                | people's takeaway is that you shouldn't adopt anything
                | from large companies as a small company by default. And
                | that's simply not true.
                | 
                | The point here is to understand what problems are being
                | solved, understand whether they are similar to yours, and
                | make a decision based on whether the tradeoffs are a good
                | fit for you.
                | 
                | Not necessarily disagreeing with you, but I just feel the
                | pendulum on this statement has swung too far to the other
                | side now.
        
               | pc86 wrote:
               | It can make sense when you have a huge team of devs and
               | different teams responsible for everything where you may
               | be on multiple teams, and nobody is exactly responsible
               | for all the same set of services you are. Depending on
               | the security/access provisioning culture of the org,
               | "taking half a day to manually grant access to the repos
               | so-and-so needs access to" may actually be an easier sell
               | than "give everyone access to all our code."
               | 
               | If you just have 20-30 devs and everyone is pretty silo'd
               | (e.g. frontend or backend, data or API, etc) having 75
               | repos for your stuff is just silly.
        
               | bobnamob wrote:
               | Amazon uses "repo per service" and it is semi insane, but
               | Brazil (the big ol' internal build system) and Coral (the
               | internal service framework) make it "workable".
               | 
               | As someone who worked in the dev tooling org, getting
               | teams to keep their deps up to date was a nightmare.
        
               | bluGill wrote:
               | Monorepo and multi repo both have their own need for
               | teams to work on dev tooling when the project gets large.
        
               | jgtrosh wrote:
               | My team implemented (and reimplemented!) a project using
               | one repo per module. I think the main benefit was
               | ensuring enough separation of concern due to the burden
               | of changing multiple parts together. I managed to reduce
               | something like 10 repos down to 3... Work in progress.
        
               | tpm wrote:
               | > burden of changing multiple parts together
               | 
                | Then you are adapting your project to the properties of
                | the code repository. I don't see that as a benefit.
        
               | wrs wrote:
               | I worked at a Fortune 1 company that used one repo _per
               | release_ for a certain major software component.
        
               | biorach wrote:
               | was that as insane as it sounds?
        
               | seadan83 wrote:
               | Did that work out well at all? Any silver lining? My
               | first thought is: "branches" & "tags" - wow... Would
               | branches/tags have just been easier to work with?
               | 
               | Were they working with multiple services in a multi-repo?
               | Seems like a cross-product explosion of repos. Did that
               | configuration inhibit releases, or was the process
               | cumbersome but just smooth because it was so rote?
        
               | wrs wrote:
               | It was a venerable on-prem application done in classic
               | three-tier architecture (VB.NET client, app server, and
               | database). It was deployed on a regular basis to
               | thousands of locations (one deploy per location) and was
               | critical to a business with 11-digit revenue.
               | 
               | So yeah, cumbersome, but established, and huge downside
               | risk to messing with the status quo. It was basically Git
               | applied on top of an existing "copy the source" release
               | process.
        
               | psoundy wrote:
               | Have you heard of OpenShift 4? Self-hosted Kubernetes by
               | Red Hat. Every little piece of the control plane is its
               | own 'operator' (basically a microservice) and every
               | operator is developed in its own repo.
               | 
               | A github search for 'operator' in the openshift org has
               | 178 results:
               | 
               | https://github.com/orgs/openshift/repositories?language=&
               | q=o...
               | 
               | Not all are repos hosting one or more microservices, but
               | most appear to be. Best of luck ensuring consistency and
               | quality across so many repos.
        
               | adra wrote:
               | It's just as easy? When you have a monorepo with 5
               | million lines of code, you're only going to focus on the
               | part of the code you care about and forget the rest. Same
               | with 50 repos of 100,000 loc.
               | 
               | Enforcing standards means actually having org level
               | mandates around acceptable development standards, and
                | it's enforced using tools. Those tools should be just as
                | easily run on one monorepo as on 50+ distributed
                | repositories, nay?
        
               | psoundy wrote:
               | Even in the best case of what you are describing, how are
               | these tools configured and their configuration maintained
               | except via PRs to the repos in question? For every such
                | change, N PRs have to be proposed, reviewed and merged.
               | And all this without considering the common need (in a
               | healthy project at least) to make cross-cutting changes
               | with similar friction around landing a change across
               | repos.
               | 
               | If you wanted to, sure, applying enough time and money
               | could make it work. I like to think that those resources
               | might be better spent, though.
        
             | stackskipton wrote:
             | >So now any change to one service required deploying that
             | service and pulling down the OpenAPI spec to regenerate
             | client bindings. It was so clumsy and easy to make simple
             | mistakes.
             | 
             | Why? Is your framework heavily tied to client bindings?
              | APIs I consume occasionally get new fields added to them for
              | data I don't need. My code just ignores them. We also have a
              | policy that you cannot add a new mandatory field to an API
              | without a version bump. So maybe the REST API would have a
              | new field, but I didn't send it and it happily didn't care.
        
           | jayd16 wrote:
              | If prod went down because of a missing check-in, there are
           | other problems.
        
             | notwhereyouare wrote:
              | Did I say prod went down? I just said it failed in prod. It
              | was a logging change and only half the logging went out. To
              | me, that's a failure.
        
         | taeric wrote:
          | My only counterargument here is when those 4 things deploy
         | independently. Sometimes, people will get tricked into thinking
         | a code change is atomic because it is in one commit, when it
         | will lead to a mixed fleet because of deployment realities. In
         | that world, having them separate is easier to work with, as you
         | may have to revert one of the deployments separately from the
         | others.
        
           | derefr wrote:
           | That's just an argument for not doing "implicit GitOps",
           | treating the tip of your monorepo's main branch as the
           | source-of-truth on the correct deployment state of your
           | entire system. ("Implicit GitOps" sorta-kinda works when you
           | have a 1:1 correspondence between repos and deployable
           | components -- though not always! -- but it isn't tenable for
           | a monorepo.)
           | 
           | What instead, then? Explicit GitOps. Explicit, reified
           | release specifications (think k8s resource manifests, or
           | Erlang .relup files), one per separately-deploy-cadenced
           | component. If you have a monorepo, then these live _also_ as
           | a dir in the monorepo. CD happens only when these files
           | change.
           | 
           | With this approach, a single PR _can_ atomically merge code
           | _and_ update one or more release specifications (triggering
           | CD for those components), _if and when_ that is a sensible
           | thing to do. But there can also be separate PRs for updating
           | the code vs.  "integrating and deploying changes" to a
           | component, if-and-when _that_ is sensible.
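            | 
            | As a rough illustration (the file layout and field names are
            | hypothetical, not any specific tool's format), a reified
            | release spec can be as small as:
            | 
            |     # releases/example-service.yaml
            |     component: example-service
            |     image: registry.example.com/example-service
            |     version: "2024.11.05-3"   # bumping this line is what triggers CD
            | 
            | with CD watching only that path, e.g. a workflow that fires on
            | changes under releases/.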
        
             | scubbo wrote:
             | ...I can't believe I'd never thought about the fact that a
             | "Deployment Repo" can, in fact, just be a directory within
             | the Code Repo. Interesting thought - thanks!
        
             | taeric wrote:
             | I mean... sure? Yes, if you add extra structure on top of
             | your code that is there to model the deployments, then you
             | get a bit closer to modeling your deployments. Isn't that
             | the exact argument for why you might want multiple
             | repositories, as well?
        
           | lmz wrote:
           | Isn't a mixed fleet always the case once you have more than
           | one server and do rolling updates?
        
             | taeric wrote:
             | Yes. And if you structure your code to explicitly do this,
             | it is a lot easier to reason about.
        
         | audunw wrote:
          | There's nothing preventing you from having a single pull
          | request for merging branches over multiple repos. There's
         | nothing preventing you from having a parent repo with a lock
         | file that gives you a single linear set of commits tracking the
         | state of multiple repos.
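          | 
          | Such a lock file could be as simple as (a hypothetical format,
          | just to illustrate the idea; URLs and commits are placeholders):
          | 
          |     repos:
          |       frontend:
          |         url: https://example.com/org/frontend.git
          |         commit: 4f9c2ab10d3e
          |       backend:
          |         url: https://example.com/org/backend.git
          |         commit: 91d03be77aa1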
         | 
         | That is, if you're not tied to using just Github of course.
         | 
         | Big monorepos and multiple repo solutions require some tooling
         | to deal with scaling issues.
         | 
         | What surprises me is the attitude that monorepos are the right
         | solution to these challenges. For some projects it makes sense
         | yes, but it's clear to me that we should have a solution that
         | allows repositories to be composed/combined in elegant ways.
         | Multi-repository pull requests should be a first class feature
         | of any serious source code management system. If you start two
         | projects separately and then later find out you need to combine
         | their history and work with them as if they were one
         | repository, you shouldn't be forced to restructure the
         | repositories.
        
           | pelletier wrote:
           | > Multi-repository pull requests should be a first class
           | feature of any serious source code management system.
           | 
            | Do you have examples of source code management systems that
            | provide this feature, and do you have experience with them?
            | The repo-centric approach of GitHub often feels limiting.
        
             | jvolkman wrote:
             | Apparently Gerrit supports this with topics:
             | https://gerrit-
             | review.googlesource.com/Documentation/cross-r...
        
           | CharlieDigital wrote:
           | > Multi-repository pull requests should be a first class
           | feature of any serious source code management system.
           | 
            | But it's currently not?
            | 
            | > If you start two projects separately and then later find
            | out you need to combine their history and work with them as
            | if they were one repository, you shouldn't be forced to
            | restructure the repositories.
           | 
           | It's called a directory copy. Cut + paste. I'd add a tag with
           | a comment pointing to the old repo (if needed). But probably
           | after a few weeks, no one is going to look at the old repo.
        
             | dmazzoni wrote:
             | > It's called a directory copy. Cut + paste. I'd add a tag
             | with a comment pointing to the old repo (if needed). But
             | probably after a few weeks, no one is going to look at the
             | old repo.
             | 
             | Not in my experience. I use "git blame" all the time, and
             | routinely read through commits from many years ago in order
             | to understand why a particular method works the way it
             | does.
             | 
             | Luckily, there are many tools for merging git repos into
             | each other while preserving history. It's not as simple as
              | copy and paste, but it's worth the extra effort.
        
         | danudey wrote:
         | I prefer microservices/microrepos _conceptually_, but we had
         | the same experience as your quoted text - making changes to
         | four repos, and backporting those changes to the previous two
         | release branches, means twelve separate PRs to make a change.
         | 
         | Having a centralized configuration library (a shared Makefile
         | that we can pull down into our repo and include into the local
         | Makefile) helps, until you have to make a backwards-
         | incompatible change to that Makefile and then post PRs to every
         | branch of every repo that uses that Makefile.
         | 
         | Now we have almost the entirety of our projects back into one
         | repository and everything is simpler; one PR per release
         | branch, three PRs (typically) for any change that needs
         | backporting. Vastly simpler process and much less room for
         | error.
        
         | wongarsu wrote:
          | It's not _as much_ of a pain if your tooling supports git repos
          | as dependencies. For example, a typical multi-repo PR for us
          | with rust is: 1) PR against library, 2) PR against application
          | that points the dependency to PR 1's branch and makes changes,
          | 3) PR review, 4) PR 1 is approved and merged, 5) PR 2 is
          | changed to point to the new master branch commit, 6) PR 2 is
          | approved and merged.
         | 
         | Same idea if you use some kind of versioning and release
         | system. It's still a bit of a pain with all the PRs and
         | coordination involved, but at every step every branch is
         | consistent and buildable, you just check it out and hit build.
         | 
         | This is obviously more difficult if you have a more loosely
         | coupled architecture like microservices. But that's self-
          | inflicted pain.
        
         | ericyd wrote:
          | I felt the same; the author seemed to downplay the success
          | while every effect listed in the article felt like a huge
          | improvement.
        
         | eikenberry wrote:
         | I thought one of the whole points behind separate (non-
         | mono)repos was to help enforce loose coupling and if you came
         | to a point where a single feature change required PRs on 4
         | separate repos then that was an indicator that your project
         | needed refactoring as it was becoming to tightly coupled. The
         | example in the article could have been interpreted to mean that
         | they should refactor the functionality for interacting with the
         | ML model into it's own repo so it could encapsulate this aspect
         | of the project. Instead they doubled down on the tighter
         | coupling by putting them in a monorepo (which itself encourages
         | tighter coupling).
        
         | Attummm wrote:
         | The issue you faced stemmed from the previous best practice of
         | "everything in its own repository." This approach caused major
          | issues, such as the versioning challenges and data model
          | inconsistencies you mentioned. The situations it could lead to
          | would make for comedy sketches, but it's a real pain,
          | especially when you're part of a team struggling with these
          | problems. And it's almost impossible to convince a team to
          | change direction once they've committed to it.
         | 
         | Now, though, it seems the pendulum has swung in the opposite
         | direction, from "everything in its own repo" to "everything in
         | one repo." This, too, will create its own set of problems,
         | which also can be comedic, but frustrating to experience. For
          | instance, what happens when someone accidentally pushes a
          | certificate or API key and you need to force-push a rewritten
          | history upstream? Coordinating that with 50 developers spread
          | across 8 projects, all in a single repo, is no fun.
         | 
          | Instead of trading one set of problems for another, we could
          | start out with a balanced approach. Start with one repository,
         | or split frontend and backend if needed. For data pipelines
         | that share models with the API, keep them in the same
         | repository, creating a single source of truth for the data
         | model. This method has often led to other developers telling me
         | about the supposed benefits of "everything in its own repo."
         | Just as I pushed back then, I feel the need to push back now
         | against the monorepo trend.
         | 
         | The same can be said for monoliths and microservices, where the
         | middle ground is often overlooked in discussions about best
         | practices.
         | 
          | This all reminds me of the concept of "no silver bullet"[0].
          | Any decision will face its own unique challenges, but a silver
          | bullet solution can create artificial challenges that are
          | wasteful, painful, and most of all unnecessary.
         | 
         | [0]https://en.m.wikipedia.org/wiki/No_Silver_Bullet
        
       | memsom wrote:
        | Monorepos are appropriate for a single project with many
        | sub-parts but one or two artifacts on any given release build. But
       | they fall apart when you have multiple products in the monorepo,
       | each with different release schedules.
       | 
       | As soon as you add a second separate product that uses a
       | different subset of any code in the repo, you should consider
       | breaking up the monorepo. If the code is "a bunch of libraries"
       | and "one or more end user products" it becomes even more
        | imperative to consider breaking down stuff.
       | 
       | Having worked on monorepos where there are 30+ artifacts,
        | multiple ongoing projects that each pull the monorepo into
       | different incompatible versions, and all of which have their own
       | lifetime and their own release cycle - monorepo is the antithesis
       | of a good idea.
        
         | munksbeer wrote:
         | No offense but I think you're doing monorepos wrong. We have
         | more than 100 applications living in our monorepo. They share
         | common core code, some common signals, common utility libs, and
         | all of them share the same build.
         | 
         | We release everything weekly, and some things much more
         | frequently.
         | 
         | If your testing is good enough, I don't see what the issue is?
        
           | bluGill wrote:
           | > If your testing is good enough, I don't see what the issue
           | is?
           | 
           | Your testing isn't good enough. I don't know who you are,
           | what you are working on, or how much testing you do, but I
           | will state with confidence it isn't good enough.
           | 
           | It might be acceptable for your current needs, but you will
           | have bugs that escape testing - often intentional as you
           | can't stop forever to fix all known bugs. In turn that means
           | if anything changes in your current needs you will run into
           | issues.
           | 
           | > We release everything weekly, and some things much more
           | frequently.
           | 
            | This is a negative to users. When you think you will release
            | again soon anyway, so who cares about bugs, it means your
            | users see more bugs. Sure, it is nice that you don't have to
            | break open years-old code anymore, but if the new stuff
            | doesn't have anything the user wants, is this really a good
            | thing?
        
           | memsom wrote:
           | No offence, but you might be a little confused by how complex
           | your actual delivery is. That sounds simple. That sounds like
           | it has a clear roadmap. When you don't, and you have very
           | agile development that pivots quickly and demands a lot of
           | change concurrently for releases that have very different
            | goals, it is not possible to get all your ducks in a row.
            | Monorepos suck in that situation. The dependency graph
           | is so complex it will make your head hurt. And all the
           | streams need to converge in to the main dev branch at some
           | point, which causes huge bottlenecks.
        
             | tomtheelder wrote:
             | The dependency graph is no different for a monorepo vs a
             | polyrepo. It's just a question of how those dependencies
             | get resolved.
        
         | vander_elst wrote:
         | Working on a monorepo where we have hundreds (possibly
         | thousands) of projects each with a different version and
         | release schedule. It actually works quite well, the
         | dependencies are always in a good state, it's easy to see the
         | ramifications of a change and to reuse common components.
        
           | memsom wrote:
           | Good for you. For us, because we have multiple projects going
           | on, pulling the code in different ways, code that runs on
           | embedded, code that runs in the cloud, desktop apps (real
           | ones written in C++ and .Net, not glorified web apps), code
           | that is customer facing, code used by third parties for
           | integrating our products, no - it just doesn't work. The
           | embedded shares a core with other levels, and we support
           | multiple embedded platforms (bare metal) and OS (Windows,
           | Linux, Android, iOS) and also have stuff that runs in
           | Amazon/Azure cloud platform. You might be fine, but when you
           | hit critical mass and you have very complicated commercial
           | concerns, it doesn't work well.
        
             | tomtheelder wrote:
             | I mean it works for Google. Not saying that's a reason to
             | go monorepo, but it at least suggests that it can work for
             | a very large org with very diverse software.
             | 
             | I really don't see why anything you describe would be an
             | issue at all for a monorepo.
        
       | h1fra wrote:
       | I think the big issue around monorepo is when a company puts
       | completely different projects together inside a single repo.
       | 
       | In this article almost everything makes sense to me (because
       | that's what I have been doing most of my career) but they put
        | their OTP app inside, which suddenly makes no sense. And you can
        | see the problem in the CI: they have dedicated files just for this
        | app and probably very little common code with the rest.
       | 
       | IMO you should have one monorepo per project (api, frontend,
       | backend, mobile, etc. as long as it's the same project) and if
       | needed a dedicated repo for a shared library.
        
         | fragmede wrote:
         | > you should have one monorepo per project (api, frontend,
         | backend, mobile, etc. as long as it's the same project)
         | 
          | _That's not a monorepo!_
          | 
          | Unless the singular "project" is "stuff our company ships", the
          | problem you have is an impedance mismatch between the projects,
          | which is the problem that an _actual_ monorepo solves. For SWEs
          | on individual projects who will never have the problem of
          | having to ship a commit on all the repos at the "same" time,
          | yeah, that seems fine, and for them it is. The problem comes as
          | a distributed systems engineer where, for whatever reason, many
          | or all the repos need to be shipped at the ~same time. Or worse:
          | A needs to ship before B, which needs to ship before C, but that
          | needs to ship before A, and you have to unwind that before
          | actually being able to ship the change.
        
           | hk1337 wrote:
           | > that's not a monorepo!
           | 
           | Sure it is! It's just not the ideal use case for a monorepo
           | which is why people say they don't like monorepos.
        
             | vander_elst wrote:
             | "one monorepo per project (api, frontend, backend, mobile,
             | etc. as long as it's the same project) and if needed a
             | dedicated repo for a shared library."
             | 
              | They are literally saying that multiple repos should be
              | used, also for sharing the code. This is not a monorepo;
              | these are different repos.
        
       | gregmac wrote:
       | To me, monorepo vs multi-repo is not about the code organization,
       | but about the deployment strategy. My rule is that there should
       | be a 1:1 relation between a repository and a release/deployment.
       | 
       | If you do one big monolithic deploy, one big monorepo is ideal.
       | (Also, to be clear, this is separate from microservice vs
       | monolithic app: your monolithic deploy can be made up of as many
       | different applications/services/lambdas/databases as makes
       | sense). You don't have to worry about cross-compatibility between
       | parts of your code, because there's never a state where you can
       | deploy something incompatible, because it all deploys at once. A
       | single PR makes all the changes in one shot.
       | 
       | The other rule I have is that if you want to have individual
       | repos with individual deployments, they must be both forward- and
       | backwards-compatible for long enough that you never need to do a
       | coordinated deploy (deploying two at once, where everything is
       | broken in between). If you have to do coordinated deploys, you
       | really have a monolith that's just masquerading as something more
       | sophisticated, and you've given up the biggest benefits of _both_
       | models (simplicity of mono, independence of multi).
       | 
       | Consider what happens with a monorepo with parts of it being
       | deployed individually. You can't checkout any specific commit and
       | mirror what's in production. You could make multiple copies of
       | the repo, checkout a different commit on each one, then try to
       | keep in mind which part of which commit is where -- but this is
       | utterly confusing. If you have 5 deployments, you now have 4
       | copies of any given line of code on your system that are
       | potentially wrong. It becomes very hard to not accidentally break
       | compatibility.
       | 
       | TL;DR: Figure out your deployment strategy, then make your
       | repository structure mirror that.
        
         | aswerty wrote:
         | This mirrors my own experience in the SaaS world. Anytime
          | things move towards multiple artifacts/pipelines in one repo,
          | trying to understand what change existed where and when seems
          | to always become very difficult.
         | 
          | Of course the multirepo approach means you do this dance a lot
          | more:
          | 
          | - Create a change with backwards compatibility and tombstones
          | (e.g. logs for when backward compatibility is used)
          | - Update upstream systems to the new change
          | - Remove backwards compatibility and pray you don't have a low
          | frequency upstream service interaction you didn't know about
         | 
         | While the dance can be a pain - it does follow a more iterative
         | approach with reduced blast radiuses (albeit many more of
         | them). But, all in all, an acceptable tradeoff.
         | 
          | Maybe if I had more familiarity with mature tooling around
         | monorepos I might be more interested in them. But alas not a
         | bridge I have crossed, or am pushed to do so just at the
         | moment.
        
         | CharlieDigital wrote:
         | It doesn't have to be that way.
         | 
         | You can have a mono-repo and deploy different parts of the repo
         | as different services.
         | 
         | You can have a mono-repo with a React SPA and a backend service
         | in Go. If you fix some UI bug with a button in the React SPA,
         | why would you also deploy the backend?
        
           | oneplane wrote:
           | You wouldn't, but making a repo collection into a mono-repo
           | means your mono-deploy needs to be split into a multi-maybe-
           | deploy.
           | 
           | As always, complexity merely moves around when squeezed, and
           | making commits/PRs easier means something else, somewhere
           | else gets less easy.
           | 
           | It is something that can be made better of course, having
           | your CI and CD be a bit smarter and more modular means you
           | can now do selective builds based on what was actually
           | changed, and selective releases based on what you actually
           | want to release (not merely what was in the repo at a commit,
           | or whatever was built).
           | 
           | But all of that needs to be constructed too, just merging
           | some repos into one doesn't do that.
        
             | CharlieDigital wrote:
             | This is not very complex at all.
             | 
             | I linked an example below. Most CI/CD, like GitHub
             | Actions[0], can easily be configured to trigger on changes
             | for files in a specific path.
             | 
             | As a very basic starting point, you only need to set up
             | simple rules to detect which monorepo roots changed.
             | 
             | [0] https://docs.github.com/en/actions/writing-
             | workflows/workflo...
        
           | bryanlarsen wrote:
           | If you don't deploy in tandem, you need to test forwards &
           | backwards compatibility. That's tough with either a monorepo
            | or separate repos, but arguably it'd be simpler with separate
            | repos.
        
             | CharlieDigital wrote:
             | It doesn't have to be that complicated.
             | 
             | All you need to know is "does changing this code affect
             | that code".
             | 
             | In the example I've given -- a React SPA and Go backend --
             | let's assume that there's a gRPC binding originating from
             | the backend. How do we know that we also need to deploy the
             | SPA? Updating the schema would cause generation of a new
             | client + model in the SPA. Now you know that you need to
             | deploy both and this can be done simply by detecting roots
             | for modified files.
             | 
             | You can scale this. If that gRPC change affected some other
             | web extension project, apply the same basic principle:
             | detect that a file changed under this root -> trigger the
             | workflow that rebuilds, tests, and deploys from this root.
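              | 
              | As a sketch (the directory names here are assumptions), that
              | can be two path-filtered workflows, one per root:
              | 
              |     # deploy-backend.yaml
              |     on:
              |       push:
              |         branches: [main]
              |         paths: [backend/**, proto/**]   # backend code or gRPC schema changed
              | 
              |     # deploy-spa.yaml
              |     on:
              |       push:
              |         branches: [main]
              |         paths: [spa/**, proto/**]       # SPA code or regenerated client changed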
        
           | Falimonda wrote:
           | This is spot on. A monorepo can still include a granular and
           | standardized CI configuration across code paths. Nothing
           | about monorepo forces you to perform a singular deployment.
           | 
           | The gains provided by moving from polyrepo to monorepo are
           | immense.
           | 
           | Developer access control is the only thing I can think to
           | justify polyrepo.
           | 
           | I'm curious if and how others who see the advantages of
           | monorepo have justified polyrepo in spite of that.
        
       | syndicatedjelly wrote:
       | Some thoughts:
       | 
       | 1) Comparing a photo storage app to the Linux kernel doesn't make
       | much sense. Just because a much bigger project in an entirely
       | different (and more complex) domain uses monorepos, doesn't mean
       | you should too.
       | 
       | 2) What the hell is a monorepo? I feel dumb for asking the
       | question, and I feel like I missed the boat on understanding it,
       | because no one defines it anymore. Yet I feel like every mention
       | of monorepo is highly dependent on the context the word is used
       | in. Does it just mean a single version-controlled repository of
       | code?
       | 
       | 3) Can these issues with sync'ing repos be solved with better use
       | of `git submodule`? It seems to be designed exactly for this
       | purpose. The author says "submodules are irritating" a couple
       | times, but doesn't explain what exactly is wrong with them. They
       | seem like a great solution to me, but I also only recently
        | started using them in a side project.
        
         | datadrivenangel wrote:
         | Monorepo is just a single repo. Yup.
         | 
         | Git submodules have some places where you can surprisingly lose
         | branches/stashed changes.
        
           | syndicatedjelly wrote:
            | One of my repos has a dependency on another repo (that I also
            | own). I initialized it as a git submodule (e.g. my_org/repo1
            | has a submodule of my_org/repo2).
            | 
            | > Git submodules have some places where you can surprisingly
            | lose branches/stashed changes.
           | 
           | This concerns me, as git generally behaves as a leak-proof
           | abstraction in my experience. Can you elaborate or share
           | where I can learn more about this issue?
        
         | klooney wrote:
         | > Does it just mean a single version-controlled repository of
         | code?
         | 
          | Yeah, the idea is that all of your projects share a common
         | repo. This has advantages and drawbacks. Google is most famous
         | for this approach, although I think they technically have three
         | now- one for Google, one for Android, and one for Chrome.
         | 
         | > They seem like a great solution to me
         | 
         | They don't work in a team context because they're extra steps
          | that people don't do, basically. And for some reason a lot of
          | people find them confusing.
        
           | nonameiguess wrote:
           | https://github.com/google/ contains 2700+ repositories. I
           | don't know necessarily how many of these are read-only clones
           | from an internal monorepo versus how many are separate
           | projects that have actually been open-sourced, but the latter
           | is more than zero.
        
       | mgaunard wrote:
       | Doing modular right is harder than doing monolithic right.
       | 
       | But if you do it right, the advantage you get is that you get to
       | pick which versions of your dependencies you use; while quite
       | often you just want to use the latest, being able to pin is also
       | very useful.
        
         | lukewink wrote:
         | You can still publish packages and pull them down as (pinned)
         | dependencies all within a monorepo.
        
           | mgaunard wrote:
           | that's a terrible and arguably broken-by-design workflow
           | which entirely defeats the point of the monorepo, which is to
           | have a unified build of everything together, rather than
           | building things piecemeal in ways that could be incompatible.
           | 
           | For C++ in particular, you need to express your dependencies
           | in terms of source versions, and ensure all of the build
           | artifacts you link together were built against the same
           | source version of every transitive dependency and with the
           | same flags. Failure to do that results in undefined
           | behaviour, and indeed I have seen large organizations with
           | unreliable builds as a matter of routine because of that.
           | 
           | The best way to achieve that is to just build the whole thing
           | from source, with a content-addressable-store shared with the
           | whole organization to transparently avoid building redundant
           | things. Whether your source is in a single repo or spread
           | over several doesn't matter so long as your tooling manages
           | that for you and knows where to get things, but ultimately
           | the right way to do modular is simply to synthesize the
           | equivalent monorepo and build that. Sometimes there is the
           | requirement that specific sources should have restricted
           | access, which is often a reason why people avoid building
           | from source, but that's easy to work around by building on
           | remote agents.
           | 
           | Now for some reason there is no good open-source build system
           | for C++, while Rust mostly got it right on the first try.
           | Maybe it's because there are some C++ users still attached to
           | the notion of manually managing ABI.
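           |
           | One concrete shape of that workflow is sketched below. It
           | assumes a Bazel-style build with a shared remote cache and
           | remote execution; Bazel and the endpoints are assumptions,
           | not something named above:
           |
           |     # build everything from source, letting a shared
           |     # content-addressable cache skip anything already built
           |     # elsewhere in the org with identical inputs
           |     bazel build //... \
           |       --remote_cache=grpcs://cache.example.com
           |
           |     # access-restricted sources can be built on remote
           |     # agents instead of being checked out locally
           |     bazel build //restricted/... \
           |       --remote_executor=grpcs://rbe.example.com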
        
       | stackskipton wrote:
       | As a DevOps/SRE type person who occasionally gets stuck with
       | builds: monorepos work well if the company will invest in the
       | build process. However, many companies don't do well in this
       | area, and the monorepo blast radius becomes much bigger, so
       | individual repos it is. Also, depending on the language, setting
       | up a private package repository to keep all common libraries in
       | is easy enough.
        
       | stillbourne wrote:
       | I like to use the monorepo tools without the monorepo repo. If
       | that makes any god damn sense. I use NX at my job and the
       | monorepo was getting out of hand, 6 hour pipeline builds, 2 hours
       | testing, etc. So I broke the repo into smaller pieces. This
       | wouldn't have been possible if I hadn't already been using the
       | monorepo tools universally throughout the project, but it ended
       | up working well.
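       |
       | In practice that hinges on NX's "affected" commands, which only
       | rebuild and retest projects reachable from what changed (a
       | sketch; targets and the base branch vary by workspace):
       |
       |     # compare against main and only run what is impacted
       |     nx affected --target=build --base=origin/main
       |     nx affected --target=test --base=origin/main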
        
       | KaiserPro wrote:
       | Monorepos have their advantages, as pointed out: one place to
       | review, one place to merge.
       | 
       | But it can also breed instability, as you can upgrade other
       | people's stuff without them being aware.
       | 
       | There are ways around this, which involve having a local module
       | store, and building with named versions. Very similar to a bunch
       | of disparate repos, but without getting lost in GitHub (GitHub's
       | discoverability was always far inferior to GitLab's).
       | 
       | However, it has its drawbacks, namely that people can hold out on
       | older versions than you want to support.
        
         | dkarl wrote:
         | > But it can also breed instability, as you can upgrade other
         | people's stuff without them being aware
         | 
         | This is why Google embraced the principle that if somebody
         | breaks your code without breaking your tests, it's your fault
         | for not writing better tests. (This is sometimes known as the
         | Beyonce rule: if you liked it, you should have put a test on
         | it.)
         | 
         | You need the ability to upgrade dependencies in a hands-off way
         | even if you don't have a monorepo, though, because you need to
         | be able to apply security updates without scheduling dev work
         | every time. You shouldn't need a careful informed eye to tell
         | if upgrades broke your code. You should be able to trust your
         | tests.
        
       | msoad wrote:
       | I love monorepos but I'm not sure if Git is the right tool
       | beyond a certain scale. Where I work, doing a simple `git
       | status` takes seconds due to the size of the repo. There have
       | been various attempts to solve Git performance, but so far
       | nothing has come close to what I experienced at Google.
       | 
       | The Git team should really invest in tooling for very large
       | repos. Our repo is around 10M files and 100M lines of code, and
       | no amount of hacks on top of Git (caching, sparse checkout, etc.)
       | really solves the core problem.
       | 
       | Meta and Google have really solved this problem internally but
       | there is no real open source solution that works for everyone out
       | there.
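       |
       | For reference, the mitigations being dismissed here look
       | roughly like this (a sketch; flag availability depends on the
       | Git version, and the directory names are illustrative):
       |
       |     # only materialize the directories you actually work on
       |     git sparse-checkout init --cone
       |     git sparse-checkout set services/foo libs/common
       |
       |     # speed up status/checkout on huge working trees
       |     git config core.fsmonitor true
       |     git config core.untrackedcache true
       |
       |     # keep commit-graph, prefetch, etc. fresh in the background
       |     git maintenance start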
        
         | dijit wrote:
         | I'm secretly hoping that google releases piper (and Mondrian);
         | the gaming industry would go wild.
         | 
         | Perforce is pretty brutal, and the code review tools are awful
         | - but it's still the undisputed king of mixed text and binary
         | assets in a huge monorepo.
        
       | paxys wrote:
       | All the pitfalls of a monorepo can disappear with some good
       | tooling and regular maintenance, so much so that devs may not
       | even realize that they are using one. The actual meat of the
       | discussion is - should you deploy the entire monorepo as one unit
       | or as multiple (micro)services?
        
       | bobim wrote:
       | Started to use a monorepo + worktrees to keep related but
       | separate developments together, each with its own checkout.
       | Anybody else on the same path?
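       |
       | A minimal sketch of that layout (branch and path names are
       | illustrative):
       |
       |     # one clone, several independent checkouts sharing the same
       |     # object store
       |     git worktree add ../monorepo-feature-a feature/a
       |     git worktree add ../monorepo-hotfix hotfix/urgent
       |     git worktree list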
        
       | __MatrixMan__ wrote:
       | Every monorepo I've ever met (n=3) has some kind of radioactive
       | DMZ that everybody is afraid to touch because it's not clear who
       | owns it but it is clear from its quality that you don't want to
       | be the last person who touched it because then maybe somebody
       | will think that you own it. It's usually called "core" or
       | somesuch.
       | 
       | Separate repos for each team means that when two teams own
       | components that need to interact, they have to expose a "public"
       | interface to the other team--which is the kind of disciplined
       | engineering work that we should be striving for. The monorepo-
       | alternative is that you solve it in the DMZ where it feels less
       | like engineering and more like some kind of multiparty political
       | endeavor where PR reviewers of dubious stakeholder status are
       | using the exercise to further agendas which are unrelated to the
       | feature except that it somehow proves them right about whatever
       | architectural point is recently contentious.
       | 
       | Plus, it's always harder to remove something from the DMZ than to
       | add it, so it's always growing and there's this sort of
       | gravitational attractor which eventually starts warping time
       | such that PRs take longer to merge the closer they are to it.
       | 
       | Better to just do the "hard" work of maintaining versioned
       | interfaces with documented compatibility defined by tests. You
       | can always decide to collapse your codebase into a black hole
       | later--but if you start on that path you may never escape.
        
       ___________________________________________________________________
       (page generated 2024-11-06 23:00 UTC)