[HN Gopher] Stripe's Monorepo Developer Environment
___________________________________________________________________
Author : edran
Score : 339 points
Date : 2024-08-15 18:22 UTC (4 days ago)
Link : blog.nelhage.com
| domenkozar wrote:
| We've been building https://devenv.sh for that reason. I expect
| more companies to go back to local development once they see
| that local DX has improved.
| evnix wrote:
| How is this better than, or different from, tools like dev that
| use Docker?
| drakerossman wrote:
| It (obviously) leverages Nix, which in turn means the
| environment is declarative and fully reproducible (not
| "reproducible" as in Docker). Now, you can use just Nix's
| devShells, but with devenv you get a middle ground between
| the bare Nix package manager and a full-fledged NixOS module
| system. Basically, write one line of code and you've got
| Postgres; write another and you've got a full linter setup for
| whatever language you're using, etc.
| tmerse wrote:
| Can I also get the security/isolation benefits that a duly
| configured Docker/Podman can provide (the container can only
| act on mounted volumes, non-root user, other seccomp
| settings)?
|
| I feel better doing my "npm install"s in such an
| environment (of course it's still not a VM - but that's
| another topic).
|
| When I read about nix, reproducibility is a goal, but
| security/isolation is a non-goal.
| ParetoOptimal wrote:
| You can generate fully reproducible OCI/Docker containers
| with devenv, so yes, I think so.
|
| https://devenv.sh/containers/
| pxc wrote:
| > When I read about nix, reproducibility is a goal, but
| [...] isolation is a non-goal.
|
| Generally, yes.
|
| But you can use or put together something like this to
| run Nix inside a devcontainer instead of locally:
| https://github.com/xtruder/nix-devcontainer
|
| So you can use them in conjunction (or alternation, if
| for some projects you're okay running without a
| container) without having to specify your development
| environments twice.
|
| > I feel better doing my "npm install"s in such an
| environment (of course it's still not a VM - but that's
| another topic).
|
| There are basically two kinds of integration you can do
| for a project with Nix, which I'll call deep and shallow.
| In shallow integration, you just have Nix provide the
| toolchain and then you build the project (manually, with
| a script, with a Makefile, whatever). This is pretty
| common and pretty easy, and gives you no protection from
| malicious NPM build scripts.
|
| For deep integration, you can actually have Nix build
| your whole project. This has some downsides, like that it
| can't really handle incremental builds. It also imposes
| restrictions, like no network access by anything but Nix
| at build time, all packages are built by special build
| users with no homedirs and no perms to access anything,
| etc. When you do that kind of build/install, you do get
| some protection from crypto miners lurking in the NPM
| registry or PyPI or whatever.
| stavros wrote:
| Nix is the right tool for this, and developing a tool to make
| Nix's UX easier is a great idea. Thanks for this!
| dirtbag__dad wrote:
| What about dev containers?
| stavros wrote:
| You mean Docker? They tend to rot much more than I'd like,
| mostly because you forget to pin something at some point.
| With Nix, you can't forget.
| janjongboom wrote:
| FYI, I've helped set up StableBuild
| (https://www.stablebuild.com) to help pin stuff in Docker
| that's normally virtually impossible to pin (e.g. OS
| package repos, Docker base images, random files from the
| internet, etc.)
| 1oooqooq wrote:
| Did the word rot change meaning recently?
|
| Pinning is what causes rot, not what solves it.
| otabdeveloper4 wrote:
| Good luck with your Docker containers in three years.
| (You're gonna need it.)
| 0x457 wrote:
| Different kind of rot. With Nix and flakes, I can come
| back to a project 5 years later and, as long as external
| dependencies (i.e. package sources) are still available, it
| will bring me right back to that environment as if it
| were yesterday.
|
| If you have a Dockerfile from 5 years ago...well good
| luck building it today.
| ParetoOptimal wrote:
| You can create them with devenv, and these are actually
| reproducible:
|
| https://devenv.sh/containers/
|
| https://devenv.sh/integrations/codespaces-devcontainer/
| 1oooqooq wrote:
| I couldn't find any description of the actual container
| contents in those examples.
| 0x457 wrote:
| IIRC, it uses what is defined for the shell environment. Only,
| instead of activating it on your machine, it produces an OCI
| image with that environment.
|
| I have NixOS definitions that I can use to make an SD card
| image, take over a running Linux system via ssh, deploy to
| NixOS via ssh, or deploy to a local system - all from one
| definition.
| pxc wrote:
| Containers are a great deployment target, but they're not
| really a great development environment for a few reasons
| (e.g., they're Linux-specific, so they require extra
| virtualization on non-Linux operating systems, the kind of
| isolation they provide is more of a hindrance than a help
| when it comes to working on your local filesystem, and for
| them to be useful you have to set up infrastructure to push
| and pull your private containers to and from).
|
| Nix is a better fit for this, and when you're using Nix you
| can also have Nix generated containers for deployment. I
| think you can also use a container with Nix in it to provide
| the devcontainers interface to devs who don't have Nix
| installed locally, and have it in turn use Nix against your
| project's flake to set up its environment.
| eadmund wrote:
| > Nix is the right tool for this
|
| Or Guix, which has the advantage of a more pleasant language.
| earthling8118 wrote:
| The language isn't the problem with nix.
| 0x457 wrote:
| It's not "the problem", but it's a problem. It's better
| than the alternatives, but its hacky nature shows.
| ParetoOptimal wrote:
| Well, if you believe:
|
| - discoverability is a problem in Nix
|
| - Guix encourages or "shepherds" more discoverable
| functions, modules, and abstractions
|
| Then the language could be a problem.
| nitsky wrote:
| What is?
| chpatrick wrote:
| YMMV, I really don't like Lisp braces personally.
| otabdeveloper4 wrote:
| That's, like, just your opinion, man.
|
| Scheme and/or Lisp is literally the worst language choice
| for this problem domain.
| 0x457 wrote:
| I wouldn't say it's the worst. I don't like Lisp and co,
| but I think it's alright for this. I don't like Guix for
| a very different reason.
| ParetoOptimal wrote:
| Why is Lisp the worst for this problem domain?
|
| I will admit I find Guix to be much more verbose than
| Nix.
| pxc wrote:
| At my workplace, Guix's lack of macOS support takes away
| some of the benefit of using something like Nix or Guix as
| opposed to HVM solutions like Docker Desktop or Vagrant. I
| imagine this situation is unfortunately common.
|
| For teams where GNU/Linux is the primary development OS,
| Guix seems like a great choice.
| pxc wrote:
| My small team uses devenv for all our development environments
| and we really like it. Local DX is really important to me and
| to our team, which is a big part of why we've chosen Nix and
| devenv.
|
| As we've started to use it more extensively, we've also found
| that we want to add some enhancements, work out some bugs, and
| experiment with our own customizations out-of-tree, etc. I'm
| happy to report here on HN that devenv is well-documented and
| easy to extend for Nix users who have some experience with Nix
| module systems, and that Domen is really responsive to PRs. :)
| reillys wrote:
| I chatted with Nelson when I was designing Brisk
| (https://github.com/brisktest/brisk), and his insight informed
| its development.
|
| Among other things, Brisk allows you to run tests for your local
| code changes in the cloud (basically the pay mini test piece but
| for any test runner).
|
| We also have a sync step much like the one described here and
| allow users to run one-off commands (linters, tsc, etc.).
| IshKebab wrote:
| Can't you achieve all that just by using a build system with
| reliable remote builds & caching, e.g. Bazel, Buck, Please, etc.?
|
| That also avoids hacky sync scripts.
| reillys wrote:
| No you can't.
|
| They don't work from your local development env and also work
| in your CI env.
|
| Brisk was mostly designed to run your complete test suite on
| every code save (i.e. local save), but it also works great from
| your CI.
|
| We can run entire test suites in seconds, which is performance
| you don't get with the systems you named (which are
| generally for building/compiling).
| reillys wrote:
| To be clear, the sync step is used for test suite execution,
| not only for running one-off commands - it's just something we
| can also easily do because we have a hot environment in the
| cloud.
| joshuamorton wrote:
| > They don't work from your local development env and also
| work in your CI env.
|
| This is one of the biggest selling points of bazel-like
| build systems. Like, to the extent that, for some changes,
| Bazel can say "even though you changed this source file, I
| can be 100% certain that that change didn't affect any
| tests and so I will not run them"
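The "skip unaffected tests" behavior follows from having an explicit dependency graph. Bazel itself gets there through its action cache (a test whose inputs are unchanged is reported as cached), but the graph view makes the reasoning easy to see. Below is a minimal Python sketch of that view, not Bazel's implementation; the target names, the `_test` naming convention, and the `affected_tests` helper are all hypothetical. It inverts the graph and walks outward from the changed targets, keeping only test targets that are reachable.

```python
from collections import deque

# Hypothetical build graph: target -> targets it depends on directly.
DEPS: dict[str, list[str]] = {
    "//lib/money:money": [],
    "//lib/http:http": [],
    "//api/charges:charges": ["//lib/money:money", "//lib/http:http"],
    "//api/charges:charges_test": ["//api/charges:charges"],
    "//lib/http:http_test": ["//lib/http:http"],
    "//docs:site_test": [],
}


def reverse_deps(deps: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each target to the set of targets that depend on it directly."""
    rdeps: dict[str, set[str]] = {t: set() for t in deps}
    for target, direct in deps.items():
        for dep in direct:
            rdeps.setdefault(dep, set()).add(target)
    return rdeps


def affected_tests(changed: set[str], deps: dict[str, list[str]]) -> set[str]:
    """Return test targets that transitively depend on any changed target."""
    rdeps = reverse_deps(deps)
    seen = set(changed)
    queue = deque(changed)
    while queue:  # breadth-first walk over reverse dependencies
        current = queue.popleft()
        for dependent in rdeps.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    # Assumed convention for this sketch: test targets end in "_test".
    return {t for t in seen if t.endswith("_test")}
```

For example, `sorted(affected_tests({"//lib/http:http"}, DEPS))` gives `['//api/charges:charges_test', '//lib/http:http_test']`, while a change to `//lib/money:money` selects only the charges test and never touches `//docs:site_test`.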
| Maxious wrote:
| I'd suggest you revise your competitor analysis. Bazel
| definitely has a test command that, with remote execution
| and caching, absolutely allows you to run entire test suites
| in seconds* both locally and in CI, e.g.
| https://blog.aspect.build/typescript-with-rbe
| reillys wrote:
| This blog post says 2 and a half minutes not seconds.
|
| I know Bazel is a build system which distributes builds
| among remote machines.
|
| In fact, using any programming language you can achieve these
| goals - you just need to program it.
|
| So yes, you could probably do all the things with all the
| things, but Bazel does not solve this problem out of the
| box.
|
| I wonder why Stripe didn't "just use Bazel".
| IshKebab wrote:
| > This blog post says 2 and a half minutes not seconds.
|
| It's meaningless to say "we can run tests in seconds".
| You can't run _my_ tests in seconds because they're
| single-threaded and take 10 minutes. The important thing
| is the speedup, and they got a pretty good speedup.
| Arguably the no-op build/test time is important too, but it
| doesn't look like they measured that.
|
| > Bazel does not solve this problem out of the box.
|
| Yes it does.
|
| > I wonder why Stripe didn't "just use Bazel".
|
| In my experience it's because setting up Bazel is a) more
| work than setting up some ad-hoc build system (Make or
| CMake or whatever) and b) difficult to switch to
| retroactively. So it only gets used where you have
| people who are experienced enough to know that you _will_
| wish you had started with it, and can convince the
| inexperienced people that it's worth the effort.
|
| Usually you get too many inexperienced people saying
| "it's too difficult; we'll be fine with Make".
| zrail wrote:
| Bazel's first release was in 2015, when Stripe was
| already 5 years old and the progenitor of this tooling
| was already running with several dozen users.
| jvolkman wrote:
| Stripe does use Bazel.
|
| https://stripe.com/blog/fast-secure-builds-choose-two
| nkohari wrote:
| Stripe does use Bazel. It just didn't exist yet when Stripe
| built some of its own internal systems, but it's
| gradually replacing ~everything from a build standpoint.
|
| The one thing to know about Bazel is that it's both
| incredibly impressive, and also one of the least
| ergonomic pieces of software ever created. It's very
| clearly an internal project which was cleaned up and open
| sourced without any attempt to make it more usable
| outside of Google.
|
| Bazel's kind of like Kubernetes in a way -- you don't
| actually get enough benefits to adopt it until you're at
| a certain point in the company lifecycle, and to get to
| that point you usually have to build other systems first.
| Then you have to gradually replace those systems with
| Bazel.
| IshKebab wrote:
| > They don't work from your local development env and also
| work in your CI env.
|
| Err yes they do? Unless you mean something really specific
| that I'm not getting?
| riffraff wrote:
| > Brisk allows you to run tests for your local code changes in
| the cloud
|
| how does this work for interactive debugging?
|
| I was going to ask the same about the system in TFA but I might
| as well ask you :)
| KolmogorovComp wrote:
| > In addition, Stripe's monorepo was (to our knowledge) the
| largest Ruby codebase in existence
|
| Bigger than Shopify's?
| Macha wrote:
| So from a gut feeling that sounds right, finance is a pretty
| complicated domain with a lot of per vendor interactions, and
| Shopify outsources their payment stuff to Stripe.
|
| Also on a headcount level, Google tells me Shopify has 3,500
| employees to Stripe's 9,500. Obviously neither company is
| composed entirely of engineers, so this is a ballpark
| estimate.
|
| GitHub feels like the real case where there might be a larger
| codebase. It's in the middle for employees (6,500), but it's
| existed longer than Stripe (though not as much longer as my gut
| feeling told me, interestingly)
| spacemonkey92 wrote:
| I also wonder how they handle merge requests in a monorepo,
| especially when it comes to the code review process.
| azthecx wrote:
| Typically you have owner files or similar in the subprojects
| that are read by automation tooling and humans alike.
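A minimal sketch of how such tooling might resolve required reviewers for a change, assuming a simplified owners format modeled loosely on GitHub's CODEOWNERS. The rules, team names, and the `owners_for`/`required_reviewers` helpers are hypothetical; real CODEOWNERS files use gitignore-style patterns, whereas this sketch uses plain path prefixes. As in CODEOWNERS, the last matching rule wins.

```python
# Hypothetical owners rules: (path prefix, owning teams). Later rules win,
# so more specific paths are listed after more general ones.
RULES: list[tuple[str, list[str]]] = [
    ("", ["@org/monorepo-admins"]),  # empty prefix matches every path
    ("lib/", ["@org/platform"]),
    ("api/charges/", ["@org/payments"]),
    ("api/charges/fraud/", ["@org/risk"]),
]


def owners_for(path: str, rules: list[tuple[str, list[str]]]) -> list[str]:
    """Return the owners of the last rule whose prefix matches `path`."""
    matched: list[str] = []
    for prefix, owners in rules:
        if path.startswith(prefix):
            matched = owners
    return matched


def required_reviewers(changed_files: list[str],
                       rules: list[tuple[str, list[str]]]) -> set[str]:
    """Union of the owners of every file touched by a change."""
    reviewers: set[str] = set()
    for path in changed_files:
        reviewers.update(owners_for(path, rules))
    return reviewers
```

A change touching `api/charges/refund.rb` and `lib/util.rb` would need approval from both `@org/payments` and `@org/platform`, while a top-level `README.md` falls through to the catch-all admins rule.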
| popinman322 wrote:
| It's possible to get stuck in merge hell where all your
| reviewers ok the PR but someone merged a conflict 2 seconds
| ago, or you've got a reviewer in Singapore while you're in SF
| and conflicts appeared overnight.
|
| In general it was pretty rare, in my experience. The code
| bases were pretty well modularized.
| shepwalker wrote:
| Hi! I work at Stripe on this. What're you curious about?
| froydnj wrote:
| The most recent publicly available numbers (that I know of,
| maybe there's a talk available somewhere that's more recent)
| are from
| https://stripe.com/blog/sorbet-stripes-type-checker-for-ruby
|
| > currently amounting to over 15 million lines of code spread
| across 150,000 files
|
| The monorepo has only gotten bigger over the last two years
| (source: I work at Stripe).
| froydnj wrote:
| I should also note that number is Ruby files only.
| rvz wrote:
| This isn't recommended practice really and there is nothing about
| this which justifies having to maintain huge code bases in a
| single folder or multiple folders in one larger one.
|
| Won't be surprised to see that many would probably need a safari
| map or README documentation in every single folder to navigate a
| repository as large as Stripe's.
|
| Sounds like an emergence of a new bad practice if you are having
| to praise how large your code base is.
| pavlov wrote:
| Meta also has a massive monorepo accessed primarily through
| cloud devservers.
|
| When several of the world's most successful software companies
| use this approach, it's hard to argue that it's inherently bad.
| Of course it's sensible to discuss what lessons apply to
| smaller companies who don't have the luxury of dedicated
| tooling teams supporting the monorepo and dev environment.
| n_ary wrote:
| Just because some successful companies use some approach
| doesn't make it the best practice. I have seen the
| nuisance of a monorepo firsthand: it took almost 15 minutes to
| correctly switch branches on Intel machines (and spiked the
| CPU considerably by causing Windows Defender to panic). It has
| the decent benefit of easy code sharing, but build and test are
| soul sucking experiences, and if someone decides to run some
| updated formatter and linter rule accidentally, the whole MR
| becomes a nightmare to correctly review (I once had one with
| 2k+ changes and had to ask them to roll back and then only
| commit what they actually wanted to change).
| aidos wrote:
| Why would you feel obliged to accept a MR in which someone
| has accidentally changed large amounts of code?
| tail_exchange wrote:
| > took almost 15 minutes to correctly switch branches on
| > Intel machines
|
| This can probably be fixed with trivial tuning. Just
| configuring Git to fetch only your branches would speed up
| the branch switching significantly.
|
| > build and test are soul sucking experiences
|
| Why? It doesn't have to be. If you are going to build the
| entire monorepo, then yes, but this should only happen when
| you are running CI, and even then you can break down the
| builds into smaller components.
|
| > the whole MR becomes a nightmare to correctly review
|
| Not if you set up code ownership properly. You also need to
| think about what happens in case of emergencies, so having a
| selected list of "super users" and users with permissions
| to bypass reviews is important.
|
| It sounds like this company wanted a monorepo, but nobody
| invested any money or time to actually think about
| developer productivity. When this happens, yes, of course
| it won't be good, because no project succeeds like this.
| The nice thing about a monorepo is that instead of 1,000
| repos with tooling all over the place and no specialist to
| take care of them, you can have one repo with really good
| tooling and a team dedicated to just keep it running
| smoothly. But if nobody is actually taking care of the
| monorepo, it will rot just like any other codebase.
| riwsky wrote:
| "Someone autoformatted the whole thing under new settings
| at the same time as introducing a new feature" is hardly a
| monorepo problem. That could be a pain in the ass to review
| even in a single file. But the flip-side, of someone
| cleanly wanting to do a mass autoformat or autorefactor,
| is much easier in a monorepo than in split repos.
| kccqzy wrote:
| Nothing you describe is inherent to monorepos. Git is slow,
| yes, but go use hg. Build and test are slow? That's a CI
| problem: you didn't allocate enough resources to the build
| system. Someone ran a formatter accidentally? That's that
| someone's mistake.
| mootoday wrote:
| Meta also uses React and we know what mess that introduced to
| the world...
| ABS wrote:
| very much recommended practice by many with, of course, caveats
| and situations where perfect is the enemy of good, etc, etc
|
| e.g. https://trunkbaseddevelopment.com/monorepos/
| lijok wrote:
| > Won't be surprised to see that many would probably need a
| safari map or README documentation in every single folder to
| navigate a repository as large as Stripe's.
|
| No different to having thousands of smaller repos instead.
|
| I personally dislike monorepos, for very niche, in-the-weeds
| operational reasons (as an infra person), but their ergonomics
| for DX cannot be understated.
| __jonas wrote:
| The 'ergonomics for DX' benefit is that you can share code
| across projects without having to go down the path of
| creating a package / library pushed to some internal registry
| and pulled by each project right?
|
| Or are there any other aspects to the monorepo architecture
| that make it beneficial for large companies like that?
|
| Just curious, I've never worked in such an environment
| myself.
| dezgeg wrote:
| In addition to what you mentioned, the ability to
| atomically commit to a library and all of its consumers,
| and, for a change to a library, to run the tests of all of
| its consumers as well.
| bastawhiz wrote:
| Every host running a particular commit is running the code
| you think it is. No submodules or internal packages. If you
| updated the Button component in the design system, when
| your commit is deployed, every service that gets deployed
| has the new button now.
| triceratops wrote:
| Dependency versioning is much smoother.
|
| Example: Service A requires version 1.1 of libFoo and
| libFoo 1.1 requires version 0.1 of libBar. But Service A
| also directly uses libBar version 0.2. Now you have a
| conflict.
|
| If libFoo and libBar are internal code stored in a monorepo
| they're automatically version-compatible because there is
| only one version of both.
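The conflict in this example can be detected mechanically by collecting every version requirement per package while walking the dependency tree. A minimal Python sketch, assuming exact-version pins only (real resolvers also handle version ranges); the package names mirror the hypothetical ones in the comment, and `find_conflicts` is an illustrative helper, not any package manager's API. The second manifest models the monorepo case, where every internal library exists at exactly one version (labeled `HEAD` here).

```python
from collections import defaultdict

# Hypothetical multi-repo world: (package, version) -> {dependency: exact version}.
MANIFESTS: dict[tuple[str, str], dict[str, str]] = {
    ("serviceA", "1.0"): {"libFoo": "1.1", "libBar": "0.2"},
    ("libFoo", "1.1"): {"libBar": "0.1"},
    ("libBar", "0.1"): {},
    ("libBar", "0.2"): {},
}

# Monorepo world: there is only one version of everything.
MONOREPO: dict[tuple[str, str], dict[str, str]] = {
    ("serviceA", "HEAD"): {"libFoo": "HEAD", "libBar": "HEAD"},
    ("libFoo", "HEAD"): {"libBar": "HEAD"},
    ("libBar", "HEAD"): {},
}


def find_conflicts(root: tuple[str, str],
                   manifests: dict[tuple[str, str], dict[str, str]]
                   ) -> dict[str, set[str]]:
    """Walk the dependency tree from `root` and report packages that are
    required at more than one exact version."""
    required: dict[str, set[str]] = defaultdict(set)
    stack = [root]
    visited: set[tuple[str, str]] = set()
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        for dep, version in manifests.get(node, {}).items():
            required[dep].add(version)
            stack.append((dep, version))
    return {pkg: vs for pkg, vs in required.items() if len(vs) > 1}
```

In the multi-repo manifests, `find_conflicts(("serviceA", "1.0"), MANIFESTS)` reports `libBar` at both `0.1` and `0.2`; in the monorepo manifests the same walk finds nothing to report, which is the point being made above.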
| oftenwrong wrote:
| To put it in the most general terms: It provides the same
| value that using a VCS has for a project, but applied to
| the entire company.
|
| In a standalone project, would you accept a change that is
| incompatible with other code in the project? For example,
| would you allow a colleague to change a function in a way
| that breaks the call sites? No, you probably would not.
|
| The attitude within monorepo shops is that this level of
| rigour should be applied to the entire company. Nobody
| should be able to make a change anywhere if it would break
| anything elsewhere, or they should only be permitted to do
| so with intention. There are caveats to this, but that is
| the general idea.
| aylmao wrote:
| I'd say there are 4 main advantages, summarizing what other
| comments are saying but also from my own experience:
|
| - atomic PRs. All changes for a migration/feature living in
| one spot makes development much easier, especially when
| dealing with api changes and migrations
|
| - single history. This is useful when debugging. A commit
| can more easily encapsulate the state of "the whole system"
| as opposed to a single part of it. This makes reverting, if
| necessary, easier
|
| - environment consistency. updating the linting tool,
| formatting tool, UI library, etc is never a priority, so
| there's always drift, where an old repo gets stuck with old
| tools, dependencies and an old environment
|
| - not shipping your org chart is easier when everyone can
| see and work on the whole codebase, as easily as
| possible.
| chrisweekly wrote:
| understated -> overstated
| papruapap wrote:
| IMO monorepos are great, but the tooling is not there,
| especially open-source tooling. Most companies using
| monorepos have their own tailored tools for it.
| bastawhiz wrote:
| > Won't be surprised to see that many would probably need a
| safari map or README documentation in every single folder
|
| Is...documentation a bad thing?
| Aeolun wrote:
| They decided to keep the code on the local machine, but the
| language server on the remote one. That seems like a recipe for
| inconsistency. You only get relevant results from your language
| server once your code has synced.
| Hackbraten wrote:
| The article mentions that the LSP itself already has baked-in
| support to enable editors to send chunks of unsaved edits to
| the language server (LS) as they happen.
|
| What Stripe's configuration introduced is that they used a
| remote LS instead of the default local LS. Regardless, VS Code
| already defers LSP communication until it feels idle, and
| developers are used to that. So I wouldn't expect a remote LS
| to significantly impact the level of inconsistency that
| developers already accept when using a local LS.
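For reference, the mechanism in question is the LSP `textDocument/didChange` notification, sent as a JSON-RPC message framed with a `Content-Length` header. The framing is the same whether the language server runs locally or on a remote machine; only the transport changes. A minimal Python sketch that builds one such message for an incremental edit (the file URI, version number, and edit are made up; whether an editor sends incremental or full-document changes depends on the sync kind the server advertises):

```python
import json


def did_change_message(uri: str, version: int, start: tuple[int, int],
                       end: tuple[int, int], text: str) -> bytes:
    """Build a framed LSP textDocument/didChange notification.

    `start` and `end` are zero-based (line, character) pairs describing
    the range that `text` replaces, i.e. an incremental change.
    """
    body = {
        "jsonrpc": "2.0",
        # Notifications carry no "id": the editor expects no response.
        "method": "textDocument/didChange",
        "params": {
            "textDocument": {"uri": uri, "version": version},
            "contentChanges": [{
                "range": {
                    "start": {"line": start[0], "character": start[1]},
                    "end": {"line": end[0], "character": end[1]},
                },
                "text": text,
            }],
        },
    }
    payload = json.dumps(body).encode("utf-8")
    # Content-Length counts bytes of the encoded body, not characters.
    header = f"Content-Length: {len(payload)}\r\n\r\n".encode("ascii")
    return header + payload
```

Each unsaved keystroke can become one of these small messages, which is why a remote server can track edits that have not yet been written to disk.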
| bastawhiz wrote:
| I was at Stripe until 2022 and inconsistency with the language
| server was never an issue
| aidos wrote:
| Due to the work that this team put in though, right?
|
| The choice to run dev environment far away from the files
| puts you in the position of needing to engineer your way past
| the inconsistency.
| bastawhiz wrote:
| Yes, almost certainly.
|
| On the other hand, there was so much code that running
| everything on your own laptop was essentially out of the
| question. Doing a git pull after a long vacation locked up
| your dev box for a hot minute while it checked all the
| types--doing the same thing on your MacBook would be
| painful at best.
| paxys wrote:
| The code syncs on every keystroke. Consistency isn't an issue
| unless you are having connection issues. And if you are then
| pretty much all development is broken anyways.
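One way to keep a remote copy this close to the editor without flooding the network is to coalesce bursts of file-change events before pushing them. A minimal Python sketch of that debouncing step; the `Debouncer` class and the 0.2-second quiet period are hypothetical choices for illustration, not a description of Stripe's actual sync tool. Timestamps are passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Debouncer:
    """Coalesce file-change events into batches.

    A batch is released only after `quiet` seconds pass with no new
    events, so a burst of rapid saves becomes a single sync.
    """
    quiet: float = 0.2
    _pending: set[str] = field(default_factory=set, init=False)
    _last_event: Optional[float] = field(default=None, init=False)

    def record(self, path: str, now: float) -> None:
        """Note that `path` changed at time `now` (seconds)."""
        self._pending.add(path)
        self._last_event = now

    def poll(self, now: float) -> list[str]:
        """Return the sorted batch to sync if things have been quiet, else []."""
        if self._last_event is None or now - self._last_event < self.quiet:
            return []
        batch = sorted(self._pending)
        self._pending.clear()
        self._last_event = None
        return batch
```

Repeated edits to the same file within the quiet window collapse into one entry, and the sync loop only has to call `poll` periodically with the current time.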
| srvaroa wrote:
| "This scale - the scale of devprod, and in turn the scale of the
| overall organization, such that it could afford 10 FTEs on
| tooling - was a major factor in our choices"
|
| Is basically the summary for most mono/multi repo discussions,
| and a bunch of other related ones.
| mhh__ wrote:
| Not sure.
|
| I think a lot of this type of thing comes about because with
| a monorepo you can actually see the problems to solve, whereas
| you can easily end up with the same N engineers firefighting
| the same problems K times across all your polyrepos.
| bluGill wrote:
| You have different problems with both. Some problems are
| hidden in one, but there is no one best answer. (unless your
| project is small/trivial - which is what a lot of them are)
| klodolph wrote:
| Multirepo also comes with cost overhead. I think people talk
| about it somewhat less. I've worked at multirepo and monorepo
| places, both, before. My current company has a multirepo setup
| and it sure seems like it comes with plenty of tooling to fetch
| dependencies. That tooling has to be supported by FTEs.
| hibikir wrote:
| Internally, they definitely do. I worked in Stripe's monorepo
| many years ago, and I am working at a larger company with
| massive amounts of repos. The difference in pain has little
| to do with mono v multi, but with the capabilities of your
| tooling team.
|
| If there's anything I'd say to low-level execs, the kind that
| end up with a few hundred developers under them, it's that
| mis-sizing the tooling team, in one way or the other, comes
| with total productivity penalties that will appear invisible,
| but will make everything expensive. Understanding how much of
| a developer's day is toil is very important, but few really
| try to figure that out.
| aylmao wrote:
| +1. I'd go as far as to say that multi-repo probably needs as
| much, if not more effort to properly keep functioning, but
| all that effort is better "hidden" so people assume monorepos
| are more work.
|
| With a monorepo, it's common to have a team focused on
| tooling and maintaining the monorepo. The structure of the
| codebase lends itself to that.
|
| With a multirepo codebase, it's usually up to different teams
| to do the work associated with "multirepo issues"--
| orchestrate releases, handle dependencies, dev environment
| setup, etc. So all that effort just kinda gets "tucked away"
| as overhead that each team assumes, and isn't quite as
| visible
| bluGill wrote:
| It doesn't matter if you have a monorepo or multi-repo, you
| will need engineers on tooling to make it work if your project
| is large. There are pros and cons to both multi-repo and mono-
| repo with no one right answer (despite what some will tell
| you). They are different pros and cons, but which is best
| depends on your particular context.
| srvaroa wrote:
| Yeah that was my point. In the end both approaches can be
| fine (depends on your context). The real difference is that
| whatever choice you take, it will need the right investment
| in tooling and support.
| bool3max wrote:
| Off-topic but the font on this blog is stunning - after some
| digging it seems to be "Vollkorn".
| delhanty wrote:
| >Some caveats: It's been nearly five years, and I have no doubt
| that I have misremembered some of the specific details, even
| though I'm confident in the overall picture. I'm also certain
| that Stripe has continued evolving and I make no claim this
| document represents the developer experience at Stripe as of
| today.
|
| Are there any more recently ex-Stripe folks here willing and able
| to comment on how Stripe's developer environment might have
| evolved since the OP left in 2019?
| artyom wrote:
| Not ex-Stripe but in "close relationship" with them since its
| inception and there's a clear mark in my calendar circa end of
| 2018 when their decisions and output started to become...
| weird, or ill-designed.
|
| I don't think it has to do with the dev environment itself, but
| I'd blame it for allowing teams to deliver "too fast" without
| thinking twice. Combine that with new blood in management and
| that's an accident waiting to happen *
|
| They're the best in business still, but far from the well-
| designed easy-to-use API-first developer-friendly initial
| offering.
|
| * Pure speculation based on very evident patterns
| rattray wrote:
| Ex-Stripe ('17-'20) here. Agree.
|
| Though I am under the impression that things have gotten more
| sensible internally over the last year or so.
|
| Note also that the devprod team has largely been shielded
| from the craziness, and may still be making good decisions
| (but I don't know what they are in this realm personally).
| nkohari wrote:
| I spent 4.5 years at Stripe, and left in March.
|
| The biggest difference not mentioned in the article is that
| code is no longer kept on developer machines. The sync process
| described in the article was well-designed, but also was a
| fairly constant source of headaches. (For example, sometimes
| the file watcher would miss an update and the code on your
| remote machine would be broken in strange ways, and you'd have
| to recognize that it was a sync issue instead of an actual
| problem with your code.) As a result, the old devbox system was
| superseded by "remote devboxes", which also host the code.
| Engineers use VSCode remote development via SSH. It works
| shockingly well for a codebase the size of Stripe's.
|
| There are actually several different monorepos at Stripe, which
| is a constant source of frustration. There have been lots of
| efforts to try to unify the codebase into a single git repo,
| but it was difficult for a lot of reasons, not the least of
| which was the "main" monorepo was already testing the limits of
| the solution used for git hosting.
|
| Overall, maintaining good developer productivity is an
| _extremely_ challenging problem. This is especially true for a
| company like Stripe, which is both too large to operate as a
| "small" company and too small to operate as a "big" company.
| Even with a well-funded team of lots of super talented people
| putting forth their best efforts, it's tough to keep all of the
| wheels fully greased.
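A sync failure like the one described (the watcher misses an update) can be caught by comparing content digests on both sides instead of trusting change events. A minimal Python sketch, assuming both machines can produce a manifest mapping each relative path to its SHA-256 digest; the `build_manifest` and `diff_manifests` helpers are hypothetical and not Stripe's tooling.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's POSIX-style path (relative to root) to its SHA-256."""
    manifest: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def diff_manifests(local: dict[str, str],
                   remote: dict[str, str]) -> dict[str, list[str]]:
    """Report files missing remotely, extra remotely, or with differing content."""
    return {
        "missing_on_remote": sorted(local.keys() - remote.keys()),
        "extra_on_remote": sorted(remote.keys() - local.keys()),
        "content_differs": sorted(
            p for p in local.keys() & remote.keys() if local[p] != remote[p]
        ),
    }
```

Running such a check before tests or on demand turns "strange breakage" into an explicit list of out-of-sync files, at the cost of hashing the tree on both ends.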
| jcmfernandes wrote:
| Thanks for this. Can you share the experience of those who
| don't use VS Code?
| tail_exchange wrote:
| IntelliJ is also supported. If you want to use something
| else, like VIM, then you need to ssh into the remote devbox
| machine. They have support for custom dotfiles, so you can
| set up your cool VIM environment for all your remote
| devboxes.
|
| If you don't want remote devboxes, the regular devboxes
| still work. You just need to deal with the additional pain
| of syncing the files.
| cynicalpeace wrote:
| Glad to see that they moved to code living with the execution
| environment. The code living separate from the execution
| environment seemed like too much overhead and complexity for
| not enough benefit.
|
| Especially given that VSCode, or Cursor ;), works so well via ssh.
|
| To the engineers who don't want to use those IDEs, it might
| suck temporarily, but that's it.
| chaosphere2112 wrote:
| I was only there in 2022, but at that point there were in fact
| three or more monorepos (forked roughly by toolchain: Go
| and Scala in one, primarily Ruby in the one detailed here, and
| one for the client Stripe API libs that was JS-only). There
| may have been more.
| bhuga wrote:
| Some important differences from 2019:
|
| * Code is off of laptops and lives entirely on the dev server
| in many (but not all) cases. This has opened up a lot of use
| cases where devs can have multiple branches in flight at once.
|
| * Big investments into bazel.
|
| * Heavier investment into editor experiences. We find most
| developers are not as idiosyncratic in their editor choices as
| is commonly believed, and most want a pre-configured setup
| where jump-to-def and such all "just work".
| cynicalpeace wrote:
| I'm glad to see that first bullet point. The code living
| separate from the execution environment seemed like too much
| overhead and complexity for not enough benefit.
| eikenberry wrote:
| That last point has long been a red flag when interviewing. A
| developer who doesn't care about their tooling also tends to
| not care about the quality of their work.
| ParetoOptimal wrote:
| It's also a red flag for me when a company mandates an IDE.
| mvdtnz wrote:
| I'd rather work with developers who are flexible and
| open-minded about the conditions they can work in than those
| who get notoriously pissy if things aren't set up exactly the
| way they like it. Especially when that way is ridiculously
| elaborate and non-standard.
| pjmlp wrote:
| Yet another replay of timesharing development experiences. I
| guess we need a couple more generations to count how many
| times a pendulum swings back and forth during a developer's
| lifetime.
| jdtig wrote:
| Does Stripe use RoR?
|
| The author mentions the codebase was Ruby, but I didn't see if
| they talked about Rails.
| bastawhiz wrote:
| It is Ruby but not Rails
| jdtig wrote:
| Thanks. I wonder what the experience is like working on a
| very large codebase with or without a framework. E.g. Stripe
| vs Shopify.
|
| Or if the framework is barely noticeable at that scale and
| doesn't really matter anymore. That's the impression I get
| for Instagram (which was built with Django).
| esprehn wrote:
| At that scale there's certainly a framework and many in
| house libraries with opinions and patterns. It's just not
| rails.
| bastawhiz wrote:
| They had their own ORM, and a web framework built on
| Sinatra. It wasn't as though you needed to reach far for a
| tool if you needed one
| jcmfernandes wrote:
| Do they use zeitwerk?
| froydnj wrote:
| We do, yes.
| anonzzzies wrote:
| We use similar practices in our 3.5 person team; we work via
| code-server and Aider with our own tooling on VPSs and this gets
| synced to execution VPSs which run dev versions, a lot of sentry
| logging and tests (mostly playwright these days). There is also a
| vps which does builds all day and logs to Sentry too. We can
| almost instantly get on our own test versions and see what we
| did, and, over the space of some seconds to minutes we see test
| and build data coming in. It has worked incredibly well for many
| years already. Onboarding people is easy and no one ever has 'it
| doesn't build on my system' as that's not something we do (you
| can of course, all scripts are there but why waste the time?).
|
| I grew up with mainframes, minis and unix batch and/or multiuser
| machines; for me this is the best way for business applications.
| I didn't particularly like the move to local all that much.
| aidos wrote:
| Maybe a silly question, but why all this engineering effort when
| you could host the dev environment locally?
|
| By running a Linux VM on your local machine you get a consistent
| environment that you can ssh to, remove the latency issues, and
| remove all the complexity of syncing that they've created.
|
| That's a setup that's worked well for me for 15 years but maybe
| I'm missing some other benefit?
| yeswecatan wrote:
| I came to ask the same thing. We use docker-compose to describe
| all our services which works fine.
| JasonSage wrote:
| This does not scale to a large number of services with a
| certain amount of RAM/processing per service.
| aidos wrote:
| You could still run the proxy they have that lazy boots
| services - that's a nice optimisation.
|
| I don't think that many places are in a position where the
| machines would struggle. They didn't mention that in the
| article as a concern - just that they struggled to keep
| environments consistent (brew install implies some are
| running on osx etc).
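| A minimal sketch of the lazy-boot idea in Python, assuming a
| hypothetical `start` callback that launches a service and
| returns the port it listens on. The names are illustrative and
| this is not Stripe's actual proxy:

```python
from typing import Callable, Dict, List


class LazyRouter:
    """Resolve a service name to a port, booting the service on first use.

    `start` is a hypothetical callback standing in for whatever actually
    launches a service (container, systemd unit, subprocess) and returns
    the port it listens on.
    """

    def __init__(self, start: Callable[[str], int]) -> None:
        self._start = start
        self._ports: Dict[str, int] = {}  # service name -> port of running instance

    def resolve(self, service: str) -> int:
        # Boot the service only the first time a request needs it.
        if service not in self._ports:
            self._ports[service] = self._start(service)
        return self._ports[service]

    def running(self) -> List[str]:
        # Names of services that have actually been booted so far.
        return sorted(self._ports)
```

| Only services that something actually asks for get booted,
| which is what keeps the number of running processes down.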
| sulam wrote:
| I think it's safe to assume that for something with the
| scale and complexity of Stripe, it would be a tall order
| to run all the necessary services on your laptop, even
| stubs of them. They may not even do that on the dev
| boxes, I'd be a little surprised if they didn't actually
| use prod services in some cases, or a canary at any rate,
| to avoid the hassles of having to maintain on-call for
| what is essentially a test environment.
| aidos wrote:
| I don't know that that's safe to assume. Maybe it is an issue
| but it was not one of the issues they talk about in the
| article and not one of the design goals of the system.
| They have the proxy / lazy start system exactly so they
| can limit the services running. That suggests to me that
| they don't end up needing them all the time to get things
| done.
| recroad wrote:
| If you have 100 services in your org, you don't have to have
| all 100 running at the same time on your local dev machine. I
| only run the 5 I need for the feature I'm working on.
| Daishiman wrote:
| I've been on this path and as soon as you work on a
| couple of concurrent branches you end up having 20
| containers in your machine and setting these up to run
| successfully ends up being its own special PITA.
| layer8 wrote:
| What exactly are the problems created by having a larger
| number of containers? Since you're mentioning branches,
| these presumably don't all have to run concurrently, i.e.,
| you're not talking about resource limitations.
| Daishiman wrote:
| Large features can require changing protocols or altering
| schemas in multiple services. Different workflows can
| require different services, etc. Keep track of different
| service versions across a couple of branches (not unusual IMO)
| and it just becomes messy.
| layer8 wrote:
| What does this have to do with running locally vs. on a
| dev server? You have to properly manage versions in any
| case.
| adamdecaf wrote:
| We have 100 Go services (with redpanda) and a few
| databases in docker-compose on dev laptops. It works well,
| and we buy the biggest memory MacBooks available.
|
| https://moov.io/blog/education/moovs-approach-to-setup-
| and-t...
| JasonSage wrote:
| Your success with this strategy correlates more strongly
| with 'Go' than '100 services' so it's more anecdotal than
| generally-acceptable that you can run 100 services
| locally without issues. Of course you can.
|
| Buying the biggest MacBook available as a baseline
| criterion for being able to run a stack locally with
| Docker Compose does not exactly inspire confidence.
|
| At my last company we switched our dev environment from
| Docker Compose to Nix on those same MacBooks and CPU
| usage went from 300% to <10% overnight.
| ikety wrote:
| Have any details on how you've implemented Nix? For my
| personal projects I use nix without docker and the
| results are great. However I was always fearful that nix
| alone wouldn't quite scale as well as nix + docker for
| complicated environments.
|
| I've used the FROM SCRATCH strat with nix:
|
| https://mitchellh.com/writing/nix-with-dockerfiles
|
| Is that how you implemented it?
| adamdecaf wrote:
| Buying the biggest Macs also lets developers run an
| Electron app or three (Slack, IDE, Spotify, browser, etc.)
| while running the docker-compose stack.
| JasonSage wrote:
| You're right. My coworkers remarked that they could run
| Slack and do screensharing while running the apps locally
| when we removed docker-compose.
| stealthybox wrote:
| That's a huge win -- has your team written or spoken about
| this anywhere?
| JasonSage wrote:
| No but I'd be happy to (I maintained the docker-compose
| stack, our CLI, and did the transition to Nix).
| adamdecaf wrote:
| I'd like to learn more about switching compose to nix. We
| will hit a wall with compose at some point.
| stealthybox wrote:
| I'm curious about the # of svc's / stack / company / team
| size -- if you have your own blog -- would love to read
| it when you publish
|
| could be a cool lightning talk (or part of something
| longer)
|
| maybe it's a good piece for https://nixinthewild.com/ ?
|
| I'm @capileigh on twitter and hachyderm.io if you wanna
| reach out separately -- here is good tho too
| n0us wrote:
| You're limited by the resources available to you on your local
| laptop and when you close that laptop the dev environment stops
| running. Remote dev environments are more costly and
| complicated to maintain but they can be shared, can scale
| vertically (or horizontally) on demand, can persist when you
| exit them, and managing access to various internal services
| from dev environments can in some cases be simpler.
|
| It also centralizes dev environment management to the platform
| team that owns them and provides them as a service which cuts
| down on support tickets related to broken dev environments.
| There are certainly some trade offs though and for most
| companies a local VM or docker compose file will be a better
| choice.
| giido wrote:
| There also tend to be security advantages for mitigating and
| managing dev risks. Typically hosts will have security tooling
| installed (AV, EDR, etc.) that may not be installed on local
| VMs; hosts are ephemeral, so they are quickly created and
| destroyed; there are network restrictions; etc.
| underdeserver wrote:
| Most local laptops are much stronger than is needed to run
| the entire stack of your average startup with no resource
| issues.
|
| And the dev environment stops running when you close the
| laptop, but you also don't need it since you're not
| developing.
|
| Not saying it can work for absolutely all cases but it's
| definitely good enough for a lot of cases.
| anthonypasq wrote:
| ... this is an article about Stripe, not your average
| startup
| underdeserver wrote:
| And yet the discussion can go beyond that.
| crabbone wrote:
| Not even once did I want to share my dev. environment, nor
| did anyone want to share mine. We are talking about 25-odd
| years of being a developer.
|
| Never in my life did I want to scale my dev. environment
| vertically or horizontally or in any other direction. Unless
| you work on a calculator, I don't know why you would need
| that.
|
| I have no problems with my environment stopping when I close
| my laptop. Why is this a problem for anyone?
|
| For the overwhelming majority of programming projects out there,
| they fit on a programmer's laptop just fine. The rare
| exceptions are the projects which require very specialized
| equipment not available to the developers. In any case, a
| simulator would usually be the preferable way of dealing with
| this, and the actual equipment would only be accessed for
| testing, not for development. Definitely not as a routine
| development process.
|
| Never in my life did I want development process to be
| centralized. All developers have different habits, tastes and
| preferences. Last thing I want is to have centralized
| management of all environments which would create unwanted
| uniformity. I've been only once in a company that tried to
| institute a centrally-managed development environment in the
| way you describe, and I just couldn't cope with it. I quit
| after a few months of misery. The most upsetting aspect about
| these efforts is stupidity. These efforts solve no problems,
| but add a lot of pain that is felt continuously, all the time
| you have to do anything work-related.
| otabdeveloper4 wrote:
| > For the overwhelming majority of programming projects
| out there, they fit on a programmer's laptop just fine.
|
| What? No. You live in a very sheltered world, my friend.
| marcosdumay wrote:
| I get a serious feeling that interpreted languages,
| monorepos, environment orchestration, snapshot ecosystem
| aggregators, and per-function execution environments are all
| pushing software development into the wrong direction.
|
| Those things are not bad by themselves. But people tend to
| do bad things with them, and those bad things spread
| remarkably well, disrupting every place they infect.
| bhuga wrote:
| I work on this at Stripe. There's a lot of reasons:
|
| * Local dev has laptop-based state that is hard to keep in sync
| for everyone. Broken laptops are _really hard_ to debug as
| opposed to cloud servers I can deploy dev management software
| to. I can say with certainty what the oldest version of software
| in my cloud is; the laptops skew across literally years of
| versions of dev tools despite a talented corpeng team managing
| them.
|
| * Our cloud servers have a lot more horsepower than a laptop,
| which is important if a dev's current task involves multiple
| services.
|
| * With a server, I can get detailed telemetry out of how devs
| work and what they actually wait on, which helps me understand
| what to work on next; I have to have pretty invasive spyware on
| laptops to do the same.
|
| * Servers in our QA environment can interact with QA services
| in a way that is hard for a laptop to do. Some of these are
| "real services", others are incredibly important to dev itself,
| such as bazel caches.
|
| There's other things; this is an abbreviated list.
|
| If a linux VM works for you, keep working! But we have not been
| able to scale a thousands-of-devs experience on laptops.
| aidos wrote:
| I want to double check we're talking about the same thing
| here. I'm referring to running everything inside a single VM
| that you would have total access to. It could have telemetry,
| you'd know versions etc. I wonder if there's some confusion
| around what I'm suggesting given your points above.
|
| I'm sure there are a bunch of things that make it the right
| choice for Stripe. Obviously if you just have too many things
| to run at a time and a dev laptop can't handle it then it's a
| dealbreaker. What's the size of the cloud instances you have
| to run on?
| drited wrote:
| I see in another comment thread you mentioned downloading
| the VM iso, presumably from a central source. Your comment
| in this thread didn't mention that so perhaps this answer
| (incorrectly) assumes the VM you are talking about was
| locally maintained/created?
| bhuga wrote:
| > I'm referring to running everything inside a single VM
| that you would have total access to. It could have
| telemetry, you'd know versions etc. I wonder if there's
| some confusion around what I'm suggesting given your points
| above.
|
| I don't think there's confusion. I only have total access
| when the VM is provisioned, but I need to update the dev
| machine constantly.
|
| Part of what makes a VM work well is that you can make
| changes and they're sticky. Folks will edit stuff in /etc,
| add dotfiles, add little cron jobs, build weird little SSH
| tunnels, whatever. You say "I can know versions", but with
| a VM, I can't! Devs will update stuff locally.
|
| As the person who "deploys" the VM, I'm left in a weird
| spot after you've made those changes. If I want to update
| everyone's VM, I blow away your changes (and potentially
| even the branches you're working on!). I can't update
| anything on it without destroying it.
|
| In contrast, the dev servers update constantly. There's a
| dozen moving parts on them and most of them deploy several
| times a day without downtime. There's a maximum host
| lifetime and well-documented hooks for how to customize a
| server when it's created, so it's clear how devs need to
| work with them for their customizations and what the
| expectations are.
|
| I guess it's possible you could have a policy about when the
| dev VM is reset and get developers used to it? But I think
| that would be taking away a lot of the good parts of a VM
| when looking at the tradeoffs.
|
| > What's the size of the cloud instances you have to run
| on?
|
| We have a range of options devs can choose, but I don't
| think any of them are smaller than a high-end laptop.
| aidos wrote:
| So the devs don't have the ability to ssh to your cloud
| instances and change config? Other than the size issue,
| I'm still not seeing the difference. Take your point on
| it needing to start before you have control, but other
| than that a VM on a dev machine is functionally the same
| as one in a cloud environment.
|
| In terms of needing to reset, it's just a matter of git
| branch, push, reset, merge. In your world that sync
| complexity happens all the time, in mine just on reset.
|
| Just to be clear, I think it's interesting to have a
| healthy discussion about this to see where the tradeoffs
| are. Feels like the sort of thing where people try to
| emulate you and buy themselves a bunch of complexity
| where other options are reasonable.
|
| I have no doubt Stripe does what makes sense for Stripe.
| I'd also wager that on balance it's not the best option
| for most other teams.
|
| PS thanks for chiming in. I appreciate the extra insights
| and context.
| bhuga wrote:
| > So the devs don't have the ability to ssh to your cloud
| instances and change config?
|
| They do, but I can see those changes if I'm helping
| debug, and more importantly, we can set up the most
| important parts of the dev processes as services that we
| can update. We can't ssh into a VM on your laptop to do
| that.
|
| For example, if you start a service on a stripe machine,
| you're sending an RPC to a dev-runner program that
| allocates as many ports as are necessary, updates a local
| envoy to make it routable, sets up a systemd unit to keep
| it running, and so forth. If I need to update that
| component, I just deploy it like anything else. If
| someone reconfigures their host to the point that the dev
| runner breaks, it fails a healthcheck and that's obvious to me
| in a support role.
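| A rough Python sketch of the dev-runner flow described above:
| allocate ports from a pool, register a name-to-port route (a
| plain dict standing in for the envoy config), and render a
| systemd unit with `Restart=always`. `DevRunner`, the port pool,
| and the `.localhost` route names are all hypothetical; the
| actual Stripe component is not shown here:

```python
from dataclasses import dataclass, field
from typing import Dict, List


def render_unit(name: str, command: str, ports: List[int]) -> str:
    """Render a minimal systemd unit that restarts the service if it dies."""
    port_list = " ".join(str(p) for p in ports)
    return "\n".join([
        "[Unit]",
        f"Description=dev service {name}",
        "",
        "[Service]",
        f"ExecStart={command}",
        f'Environment="PORTS={port_list}"',  # hand the allocated ports to the process
        "Restart=always",
        "",
        "[Install]",
        "WantedBy=default.target",
    ])


@dataclass
class DevRunner:
    """Hypothetical dev-runner: allocate ports, register routes, emit a unit."""

    port_pool: range = range(20000, 20100)
    allocations: Dict[str, List[int]] = field(default_factory=dict)
    routes: Dict[str, int] = field(default_factory=dict)  # stand-in for proxy config

    def _take_ports(self, n: int) -> List[int]:
        used = {p for ports in self.allocations.values() for p in ports}
        free = [p for p in self.port_pool if p not in used]
        if len(free) < n:
            raise RuntimeError("dev port pool exhausted")
        return free[:n]

    def start(self, name: str, command: str, num_ports: int = 1) -> str:
        if name in self.allocations:
            raise ValueError(f"{name} is already running")
        ports = self._take_ports(num_ports)
        self.allocations[name] = ports
        # Make the service routable by name through the local proxy.
        self.routes[f"{name}.localhost"] = ports[0]
        return render_unit(name, command, ports)

    def stop(self, name: str) -> None:
        # Release the ports and drop the route so they can be reused.
        self.allocations.pop(name)
        self.routes.pop(f"{name}.localhost")

    def healthy(self) -> bool:
        # Every route must point at a port that is actually allocated.
        allocated = {p for ports in self.allocations.values() for p in ports}
        return all(port in allocated for port in self.routes.values())
```

| `healthy()` mirrors the healthcheck idea: if a route points at
| a port that was never allocated, the check fails and the
| breakage is visible to whoever is supporting the host.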
|
| > Just to be clear, I think it's interesting to have a
| healthy discussion about this to see where the tradeoffs
| are. Feels like the sort of thing where people try to
| emulate you and buy themselves a bunch of complexity
| where other options are reasonable.
|
| 100% Agree! I think we've got something pretty cool, but
| this stuff is coming from a well-resourced team; keeping
| the infra for it all running is larger than many
| startups. There's tradeoffs involved: cost, user support,
| flexibility on the dev side (i.e. it's harder to add
| something to our servers than to test out a new kind of
| database on your local VM) come immediately to mind, but
| there are others.
|
| There are startups doing lighter-weight, legacy-free
| versions of what we're doing that are worth exploring for
| organizations of any size. But remote dev isn't the right
| call for every company!
| aidos wrote:
| Ah! So that's a spot where we're talking past each other.
|
| I'd anticipate you would be equally as able to ssh to VMs
| on dev laptops. That's definitely a prerequisite for
| making this work in the same way as you're currently
| doing.
|
| The only difference between what you do and what I'm
| suggesting is the _location_ of the VM. That itself
| creates some tradeoffs but I would expect absolutely
| everything inside the machine to be the same.
| bhuga wrote:
| > I'd anticipate you would be equally as able to ssh to
| VMs on dev laptops. That's definitely a prerequisite for
| making this work in the same way as you're currently
| doing.
|
| Our laptops don't receive connections, but even if they
| could, folks go on leave and turn them off for 9 months
| at a time, or they don't get updated for whatever reason,
| or other nutty stuff.
|
| It's surprisingly common with a few thousand of them out
| there that laptop management code that removes old
| versions of a tool is itself removed after months, but
| laptops still pop up with the old version as folks turn
| them back on after a very long time, and the old tool
| lingers. The services the tools interact with have long
| since stopped working with the old version, and the
| laptop behaves in unpredictable ways.
|
| This doesn't just apply to hypothetical VMs, but various
| CLI tools that we deploy to laptops, and we still have
| trouble there. The VMs are just one example, but a
| guiding principle for us has been that the less that's on the
| laptop, the more control we have, and thus the better we
| can support users with issues.
| hibikir wrote:
| To provide historical context, 10 years ago there was a local
| dev infrastructure, but it was already so creaky as to be
| unreliable. Just getting the ruby dependencies updated was a
| problem. The local dev was also already cheating: All the
| asynchronous work that was triggered via RabbitMQ/Kafka was
| getting hacked together, because trying to run everything
| that Infra/Queues did locally would have been very wasteful.
| So magic occurred in the calls to the message queue that
| instead triggered the crucial ruby code that would be hit in
| the end.
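| The "magic" in the queue calls can be sketched as a publish API
| with an inline mode, assuming a hypothetical `EventBus` class
| rather than any real RabbitMQ/Kafka client: in local dev the
| consumers run synchronously in-process instead of going through
| a broker.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Handler = Callable[[Dict[str, Any]], None]


class EventBus:
    """Hypothetical publish API with two modes.

    In queue mode, messages are buffered in `outbox` for a real broker
    client to send. In inline mode (local dev), the registered consumers
    run synchronously in the same process, so no broker has to be running.
    """

    def __init__(self, inline: bool) -> None:
        self.inline = inline
        self.handlers: Dict[str, List[Handler]] = defaultdict(list)
        self.outbox: List[Tuple[str, Dict[str, Any]]] = []

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.handlers[topic].append(handler)

    def publish(self, topic: str, message: Dict[str, Any]) -> None:
        if self.inline:
            # Skip the broker entirely and call the consumer code directly.
            for handler in self.handlers[topic]:
                handler(message)
        else:
            self.outbox.append((topic, message))
```

| The shortcut is cheap, but it hides retries, ordering, and
| asynchronous timing, which is exactly the kind of divergence
| from production that made the local setup feel like cheating.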
|
| So if this was a problem back then, when the company had less
| than 1000 employees, I can't even imagine how hard it would
| be to get local dev working now.
| underdeserver wrote:
| The way these problems are stated might make it seem like
| they're unsolvable without a lot of effort. I just want to
| point out that I've worked at places that do use a local,
| supported environment, and it works well.
|
| Not saying it's the wrong choice for you, but it's a choice,
| not a natural conclusion.
| crabbone wrote:
| Working in a configuration where your development environment
| isn't on your computer is always a huge downgrade. Work with
| VM? -- sooner or later you'll have problems with forwarding
| your keyboard input to the VM. Work with containers? -- no good
| way to save state, no good way to guarantee all containers are
| in sync etc. God forbid any sort of Web browser-based solution.
| The number of times I accidentally closed the tab or did
| something else unintentionally because of key mapping that's
| impossible to modify...
|
| However, in some situations you must endure the pain of doing
| this. For example, regulatory reasons. Some organizations will
| not allow you to access their data anywhere but on some cloud
| VM they give you very botched and very limited control over.
| While, technically, these are usually easy to side-step, you
| are legally _required_ to not move the data outside of the
| boundaries defined for you by the IT. And so you are stuck in
| this miserable situation, trying to engineer some semblance of
| a decent utility set in a hostile environment.
|
| Another example is when the infrastructure of your project is
| too vast to be meaningfully reduced to your laptop, and a lot
| of your work is exploratory in nature. I.e. instead of typical
| write-compile-upload-test you are mostly modifying stuff on the
| system you are working on to see how it responds. This is kind
| of how my day-to-day goes: someone reported they fail to
| install or use one of the utilities we provide in a particular
| AWS region with some specific network settings etc. They'd give
| me a tunnel to the affected cluster, and I'd have some hours to
| spend there investigating the problem and looking for possible
| immediate and long-term solutions. So, you are essentially
| working in a tech-support role, but you also have to write
| code, debug it, sometimes compile it etc.
| aidos wrote:
| Sounds like you're talking about something else (more like
| the Citrix / virtual desktop type model - I don't know the
| name).
|
| The idea here is that you use a VM (cloud or local) to run
| your compute. Most people can run it in the background
| without explicitly connecting to it.
| simonw wrote:
| In my opinion the single most important feature of any
| development environment is a reliable "reset" button.
|
| The amount of time companies lose to broken development
| environments is incredible. A developer can easily lose half a
| day (or more) of productive time.
|
| With cloud environments it's much easier to offer a "just give
| me a brand new environment that works" button somewhere. That's
| incredibly valuable.
| aidos wrote:
| For sure, _but_, a VM has that feature too. They have to run
| _some_ services directly on the laptop to handle the code
| syncing. So if you accept a certain amount of "need to do
| some dev machine setup" as a cost, installing Parallels and
| running a script to download an iso is a pretty small surface
| area that allows for a full reset.
|
| I don't doubt that Stripe have a setup that works well for
| them, but I also bet they could have gone down a
| different path that also worked well _and_ I suspect that
| other path (local VMs) is a better fit for most other smaller
| teams.
| sam_perez wrote:
| To be fair, it seems like the cloud development environment
| choice was driven by the scale of Stripe's organization.
| dheera wrote:
| > By running a Linux VM
|
| Or just run Linux on your local machine as the OS. I don't get
| the obsession with Macs as dev workstations for companies whose
| products run on Linux.
| philwelch wrote:
| Especially when they don't even deploy to ARM servers.
| uncanneyvalley wrote:
| The year of Linux on the laptop has yet to arrive for most of
| us. Windows and MacOS both offer better battery life, if for
| no other reason (and there are usually other reasons, like
| suspend/wake issues, graphics driver woes, etc.)
| trevor-e wrote:
| From what I remember (left Stripe in late 2022) much of
| Stripe's codebase was/is a Ruby tangled "big ball of mud"
| monorepo due to lack of proper modules. Basically a lot of the
| core modules all imported code from each other with little
| layering so you couldn't deploy a lean service without pulling
| in almost all of the monorepo code. And due to the way imports
| worked it would load a ton of this code at runtime. This meant
| that even a simple service would have extremely high memory
| usage and be unsuitable for a local dev environment where you
| have N of these bloated services running at the same time.
| There was a big refactoring effort to get "strict modules" in
| place to cut down on this bloat which had some promising
| results. I'm not an expert in this area but I believe this was
| the gist of it.
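| To illustrate why weak layering bloats every service, here is a
| small Python sketch over a hypothetical import graph: a BFS
| computes everything an entrypoint transitively loads, and a
| layer check flags lower-level modules that import higher-level
| ones. Module names and layer numbers are made up, and this is
| not Stripe's strict-modules tooling:

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical import graph: module -> modules it imports.
IMPORTS: Dict[str, List[str]] = {
    "svc/webhooks": ["lib/http", "lib/models"],
    "lib/models": ["lib/db", "app/billing"],  # a low layer reaching upward
    "app/billing": ["lib/models", "app/reports", "lib/db"],
    "app/reports": ["lib/models", "lib/db"],
    "lib/http": [],
    "lib/db": [],
}
# Hypothetical layers: lower numbers must not import higher numbers.
LAYER: Dict[str, int] = {
    "lib/db": 0, "lib/http": 0, "lib/models": 1,
    "app/billing": 2, "app/reports": 2, "svc/webhooks": 3,
}


def load_closure(imports: Dict[str, Iterable[str]], entrypoint: str) -> Set[str]:
    """Return every module reachable from `entrypoint` via imports (BFS)."""
    seen = {entrypoint}
    queue = deque([entrypoint])
    while queue:
        module = queue.popleft()
        for dep in imports.get(module, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def layering_violations(imports: Dict[str, Iterable[str]],
                        layer: Dict[str, int]) -> List[Tuple[str, str]]:
    """List (importer, imported) pairs where a lower layer imports a higher one."""
    return sorted(
        (src, dst)
        for src, deps in imports.items()
        for dst in deps
        if layer[src] < layer[dst]
    )
```

| In the toy graph, the single upward edge from `lib/models` to
| `app/billing` makes `svc/webhooks` load every module; removing
| it shrinks the closure from six modules to four.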
| mleo wrote:
| I use Syncthing to manage the synchronization of files between
| my local laptop and a remote development server. The code base
| is upwards of 20 years old and depends on Windows at runtime. I
| can run unit tests locally on a very fast MacBook Pro or run
| them much more slowly on a Windows VM. With Syncthing I can easily
| edit files locally or remotely and they are available locally for
| source control.
|
| The worst problem is refining the ignore settings so that only
| code is synced, which prevents conflicts on derived files, while
| making sure no rule accidentally matches code file names.
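| One way to catch ignore rules that accidentally overlap code
| files is to run the patterns against the list of tracked source
| files (for example, the output of `git ls-files`). A minimal
| Python sketch with simplified glob semantics, which are an
| assumption and not Syncthing's exact `.stignore` rules:

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List


def matches(path: str, pattern: str) -> bool:
    """Simplified ignore matching (an assumption, not Syncthing's exact rules).

    A leading "/" anchors the pattern at the sync root; otherwise the pattern
    may match any component-aligned part of the path, so "bin" also hits
    "src/bin/tool.go".
    """
    parts = path.split("/")
    if pattern.startswith("/"):
        anchored = pattern[1:]
        return any(fnmatchcase("/".join(parts[:i]), anchored)
                   for i in range(1, len(parts) + 1))
    return any(fnmatchcase("/".join(parts[i:j]), pattern)
               for i in range(len(parts))
               for j in range(i + 1, len(parts) + 1))


def ignored_source_files(tracked: Iterable[str],
                         patterns: List[str]) -> Dict[str, List[str]]:
    """Map each tracked source file to the ignore patterns that would hide it."""
    report: Dict[str, List[str]] = {}
    for path in tracked:
        hits = [p for p in patterns if matches(path, p)]
        if hits:
            report[path] = hits
    return report
```

| Any file that shows up in the report is code that would
| silently stop syncing, so the offending rule needs to be
| anchored or narrowed.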
| nxicvyvy wrote:
| Try unison, it's built for this use case.
|
| https://www.cis.upenn.edu/~bcpierce/unison/
| shepherdjerred wrote:
| I like Unison, though I found Mutagen a bit better.
|
| https://mutagen.io/
| mootoday wrote:
| I've worked with remote dev environments for many years,
| including some time with one of the providers of such a service.
|
| It became clear to me that cloud-only is not the way to go, but
| instead a local-first, cloud-optional approach.
|
| https://mootoday.com/blog/dev-environments-in-the-cloud-are-...
| numbsafari wrote:
| This is my biggest complaint with GitHub CodeSpaces.
|
| I should be able to launch a local VM using the GitHub Desktop
| App just as easily as I can an Azure-hosted instance.
| ParetoOptimal wrote:
| But then how will they lock you into paying them monthly?
| numbsafari wrote:
| They're just forcing me to stop paying them altogether.
| truetraveller wrote:
| "I've described a lot of fairly-involved custom tooling; we
| needed enough engineers to build and maintain it, and enough
| "customer" engineers for that investment to pay off."
|
| This is so important when deciding to re-invent the wheel. I've
| gotten bitten by this many times.
| p-o wrote:
| It's always so enlightening to have articles like this one shed
| light on how companies at scale operate. It goes without saying
| that many of the problems Stripe faced with their monorepo aren't
| applicable to smaller businesses, but there are still bits and
| pieces that are applicable to many of us.
|
| I've been working on an ephemeral/preview environment operator
| for Kubernetes (https://github.com/pier-oliviert/sequencer), and
| I agree with a lot of what OP said.
|
| I think dev boxes are really the way to go, especially with all
| the components that make up an application nowadays. But the
| latency/synchronization issue is a hard topic, and it's full of
| tradeoffs.
|
| A developer's laptop always ends up being a bespoke environment
| (yes, Nix/Docker can help with that), and so, there's always a
| confidence boost when you get your changes up on a standalone
| environment. It gives you the proof that "hey things are working
| like I expected them to".
| draw_down wrote:
| Right, dev boxes do not need to do double duty as a personal
| computer plus development target, which allows them to more
| closely resemble the machine your code will actually run on.
| They also can be replaced easily, which can be helpful if you
| ever suspect something is wrong with the box itself - if the
| new one acts the same way, it wasn't the dev box.
|
| I don't recall latency being a big problem in practice. In an
| organization like this, it's best to keep branches up to date
| with respect to master anyway, so the diffs from switching
| between branches should be small. There was a lot of work done
| to make all this quite performant and nice to use. The slowest
| part was always CI.
| tmpz22 wrote:
| I feel like we're not getting the right lessons from this. It
| feels like we're focusing on HOW we can do something versus
| pausing for a brief moment to consider if we SHOULD in the
| first place.
|
| To me the root issue is the complexity of production
| environments has expanded to the point of impacting
| complexity in developer environments just to deploy or test -
| this is in conjunction with expanding complexity of developer
| environments just to develop - e.g. webpack.
|
| For very large well resourced organizations like Stripe that
| actually operate at scale that complexity may very well be
| unavoidable. But most organizations are not Stripe. They
| should consider decreasing complexity instead of investing in
| complex tooling to wrangle it.
|
| I'd go as far as to suggest both monorepos and dev-boxes are
| complex toolchains that many organizations should consider
| avoiding.
| epinephrinios wrote:
| Absolutely, I've worked at tech behemoths and smaller
| companies. The dev experience was significantly better when
| all development was local. I even worked on initiatives to
| move development _away_ from the cloud, and although other
| devs were skeptical, they ended up loving it.
| Eridrus wrote:
| I think we don't have good solutions for scaling down prod.
|
| Our relatively simple prod architecture has 5 containers &
| a hosted database (so 6 containers when run locally), and
| any less would impact our product goals.
|
| I still find running prod locally valuable, and it's the most
| common way anyone does development here, but containers are
| fairly heavyweight when you want to run everything on one
| machine. It's also impossible if you have parts that need
| special accelerators to get good latency, etc.
|
| If you're willing to build everything from scratch, you can
| have a framework that seamlessly lets you build conceptual
| services and then separate the physical deployment
| concerns, like Google has and sometimes even uses. But for
| the rest of us where we're cobbling together a bunch of
| different technologies, that's a luxury we can't really
| afford.
| jrochkind1 wrote:
| > I'd go as far as to suggest both monorepos and dev-boxes
| are complex toolchains that many organizations should
| consider avoiding.
|
| I'm not sure "monorepo" means the same thing to you as it
| does to me? To me, it just means "keep all the code in one
| repo, instead of trying to split things up into different
| repos."
|
| To me, it's the thing that _is_ the simple solution, it
| just means "a repo" -- the reason it gets a name is
| because it's _unusual_ for large orgs with enormous
| codebases to have everything in one repo, it's unusual for
| them to do the simple thing that works fine for a small org
| with a normal codebase.
|
| What is it you're suggesting a simple organization should
| do instead of a "monorepo"?
| ahtihn wrote:
| How is a mono repo the simple solution compared to one
| repo per independently releasable component?
|
| All the tooling is much easier to use when each
| application has its own repo.
| jrochkind1 wrote:
| The argument would be that for a simple organization,
| dividing things into independently releasable components
| is less simple than just having one app. I think that's
| what most simple organizations do, no? Why do you need
| the complexity of independently releasable components for
| your simple organization? Now you have to track
| compatibility between things and ensure which version of
| which independently releasable thing works with which
| version of which other independently releasable thing.
| Isn't that added complexity? Why not just have one
| application? Isn't that simpler? You don't need to worry
| about incompatibilities between your separately releasable
| things: every commit that passes CI on your single repo
| means all the parts are compatible (sans untested bugs).
|
| Usually it stops being "simpler" at a level of
| organizational complexity or code size where it becomes a
| mess. The "monorepo" is the attempt to do what everyone
| was just doing anyway for simple orgs with simple
| codebases, but keep doing it at huge sizes.
| zten wrote:
| If you're living in the same dysfunctional world I am,
| then maybe your organization split things into repos that
| are separately releasable, but are conceptually so
| strongly coupled that you now need to create changes on 3
| repos to make a change.
| tmpz22 wrote:
| > To me, it just means "keep all the code in one repo,
| instead of trying to split things up into different
| repos."
|
| To me, and perhaps more from a DevOps-like perspective,
| mono repo means "one repo, many _diverse_ deployment
| environments and artifacts, often across multiple
| programming languages".
|
| I'm advocating against the Google/Stripe situation of a
| singular massive repo with complex build tools to make it
| function, like Bazel. I think sometimes _small_
| organizations get lured by ego and bad cost/benefit
| analysis into implementing such an architecture, and it
| can tank entire product orgs in my experience (obviously
| not for Stripe, Google, etc.).
| hamandcheese wrote:
| My main gripe with the dev box approach is that a cloud
| instance with compute resources similar to a developer's
| MacBook is hella expensive. Even ignoring compute, a 1TB EBS
| volume with performance equivalent to a MacBook will probably
| cost more than the MacBook every month.
| axus wrote:
| The article didn't actually say what "Stripe's cloud
| environment" was, besides "outside of the production
| environment". I assumed the company had their own hardware
| but your assumption is more probable.
| MetaWhirledPeas wrote:
| Wouldn't this be a reasonable alternative? Asking because I
| don't have experience with this.
|
| 1. New shared builds update container images for applications
| that comprise the environment
|
| 2. Rather than a "devbox", devs use something like Docker
| Compose to utilize the images locally. Presumably this would
| be configured identically to the proposed devbox, except with
| something like a volume pointing to local code.
|
| I'm interested in learning more about this. It seems like a
| way to get things done locally without involving too many
| cloud services. Is this how most people do it?
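The two steps above can be sketched as a generated Compose config: every service uses the image CI published, and only the services a developer is actively editing get their local checkout bind-mounted over the code in the image. A minimal sketch; the registry `registry.example.com`, the `main` tag, the service names, and the assumption that each image runs its code from `/app` are all hypothetical. JSON is, with rare edge cases, valid YAML, so the output should be usable as a file passed to `docker compose -f`.

```python
import json
from typing import Dict, List


def compose_config(services: List[str], registry: str, tag: str,
                   local_src: Dict[str, str]) -> dict:
    """Build a Compose-style config: shared CI-built images, local code bind-mounted."""
    config: dict = {"services": {}}
    for name in services:
        svc: dict = {"image": f"{registry}/{name}:{tag}"}
        if name in local_src:
            # Mount the developer's checkout over the code baked into the image,
            # so edits show up without a rebuild (assumes the image runs from /app).
            svc["volumes"] = [f"{local_src[name]}:/app"]
        config["services"][name] = svc
    # Backing services come straight from public images.
    config["services"]["postgres"] = {"image": "postgres:16",
                                      "environment": {"POSTGRES_PASSWORD": "dev"}}
    return config


if __name__ == "__main__":
    cfg = compose_config(["api", "worker"], "registry.example.com", "main",
                         local_src={"api": "./api"})
    print(json.dumps(cfg, indent=2))
```

The design choice here is that the config is derived, not hand-edited: the same function produces the "devbox" layout and the laptop layout, and only `local_src` changes per developer.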
| DiggyJohnson wrote:
| I manage a dev environment for a small, inexperienced (but
| eager) team and I have a similar setup. I'll do a write up
| at some point if I have time. It can work, and does for me,
| but there are some funny consequences: the setup can end up
| mediating the relationship between a developer's computer and
| his code, which is a terrible place to be.
| secondcoming wrote:
| What's the easiest way of sharing things like protobuf
| definitions across multiple separate repos and making sure things
| are always in sync?
| MrDarcy wrote:
| buf.build
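buf.build is the tool suggested here. As a simpler and different technique (not how buf works), repos that vendor copies of shared `.proto` files can fail CI whenever their copy drifts from the source of truth, by comparing content hashes. A minimal sketch; the directory layout and file names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def proto_digests(root: Path) -> Dict[str, str]:
    """Map each .proto file (path relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*.proto"))
    }


def drifted(source_root: Path, vendored_root: Path) -> Dict[str, str]:
    """Files that are missing, extra, or changed in the vendored copy."""
    src, vend = proto_digests(source_root), proto_digests(vendored_root)
    report: Dict[str, str] = {}
    for name in src.keys() | vend.keys():
        if name not in vend:
            report[name] = "missing"
        elif name not in src:
            report[name] = "extra"
        elif src[name] != vend[name]:
            report[name] = "changed"
    return report
```

This only detects that copies differ; it says nothing about whether old and new definitions are wire-compatible, which is the kind of question buf's breaking-change detection is aimed at.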
| crabbone wrote:
| NB. What the article describes isn't a developer environment in
| the cloud. It's _testing_ in the cloud. The editor in their model
| lives on the programmers' laptops, the editing happens there as
| well and so on. The code is deployed to cloud infrastructure for
| testing.
| physicsguy wrote:
| I think for smaller companies, you can get a long way towards a
| lot of this with judicious use of docker-compose, and convenience
| scripts in a Makefile. As long as you don't do anything stupid
| like try and spin up 100 services when you're a team of 8, most
| laptops these days are sufficiently capable of handling a
| database, Redis, your codebase, and something like LocalStack.
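A common convenience script in this kind of setup is a readiness check, so that a `make up` target (hypothetical) can run `docker compose up -d` and then block until Postgres, Redis, and LocalStack actually accept connections. A minimal sketch; the ports in the usage note are the usual defaults (5432, 6379, and LocalStack's 4566 edge port) and may differ in your compose file.

```python
import socket
import time
from typing import Dict, List


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.25) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def not_ready(services: Dict[str, int], host: str = "127.0.0.1",
              timeout: float = 30.0) -> List[str]:
    """Names of services whose ports never opened within the timeout."""
    return [name for name, port in services.items()
            if not wait_for_port(host, port, timeout)]
```

For example, `not_ready({"postgres": 5432, "redis": 6379, "localstack": 4566})` returns an empty list once everything is up. An open port only means the process is listening; a stricter check would run a real query against each service.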
| PedroBatista wrote:
| I would say you can even go a looong way without any Docker at
| all.
|
| And for the large majority of companies/projects, if your
| project is so complex and resource-heavy that it doesn't
| fit on a modern laptop, the problem is not the laptop; it's
| the whole project and the culture and cargo cult around
| "modern" software development.
| vlovich123 wrote:
| Containers/VMs are a nice way to isolate away any machine
| configuration discrepancies. Conversely, they do encourage
| the use of non-hermetic and non-deterministic build systems,
| which come with other issues too (e.g. speed differences
| surfacing race conditions in the build).
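A cheap guard against the non-determinism described above is to build twice into fresh directories and compare hashes of the outputs; an embedded timestamp or a race that reorders output is enough to make them differ. A minimal sketch; `build` is a hypothetical stand-in for your real build step. It catches non-determinism only, not non-hermeticity: a build that silently reads something from the host still passes if it reads the same thing both times.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def tree_digest(root: Path) -> str:
    """SHA-256 over every file's relative path and contents, in sorted order."""
    h = hashlib.sha256()
    for p in sorted(root.rglob("*")):
        if p.is_file():
            for chunk in (p.relative_to(root).as_posix().encode(), p.read_bytes()):
                # Length-prefix each field so path/content boundaries are unambiguous.
                h.update(len(chunk).to_bytes(8, "big"))
                h.update(chunk)
    return h.hexdigest()


def is_reproducible(build: Callable[[Path], None], runs: int = 2) -> bool:
    """Run `build` into fresh directories and check the outputs are byte-identical."""
    digests = set()
    for _ in range(runs):
        with tempfile.TemporaryDirectory() as out:
            build(Path(out))
            digests.add(tree_digest(Path(out)))
    return len(digests) == 1
```

Running the check on both a laptop and a container gives a rough signal for which of the two problems you have: differing outputs on the same machine point at non-determinism, while differing outputs only across machines point at a non-hermetic dependency on the host.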
| elktown wrote:
| - "A single-binary app behind a load-balancer might scale to
| far beyond our needs, but the promotion/resume trade-off
| can't be justified."
| adamdecaf wrote:
| We've been using a hundred repositories and a hundred Go services
| in a local docker-compose setup that's worked fairly well. CI
| runners can struggle if their disks can't keep up with Docker.
|
| It comes up that we should make a devprod environment for
| front-end folks that abstracts the backend away more.
|
| Overall a lot of people prefer local dev because it gives them
| access to the entire stack, lets them run branch images more
| easily, and has better performance than remote boxes.
|
| https://moov.io/blog/education/moovs-approach-to-setup-and-t...
| prasoonds wrote:
| I wonder if there's a devbox-as-a-service tool out there. I use a
| MacBook Air for most of my work and would occasionally
| benefit from using a beefier machine in the cloud. I just
| don't want to set up a machine, set up sync, etc.
| metachris wrote:
| You could just rent a beefy server for like $40/month at
| hetzner or OVH and use VS Code with the remote development
| extension.
| stealthybox wrote:
| This is an awesome writeup of the tools and culture issues you
| run into maintaining dev environments.
|
| From the post, the problems that justified central dev boxes
| are roughly:
| 1. dependency / config mgmt / env drift on laptops
| 2. collaboration / debugging between engineers
| 3. compute scaling + optimization
| 4. supporting devs with updates and infra changes
|
| The last one is particularly interesting to me, because
| supporting the dev env is a separate engineering role/task that
| starts small and grows into teams of engineers supporting the
| environment.
|
| I'm helping build Flox. We're working on these pain points by
| making environments (deps, vars, services, and builds) workable
| across all kinds of Mac/Linux laptops and servers:
| 1) a. Virtualize the pkg manager per-project
|    b. Nix packages can install across OS/arch pretty well
| 2) Imperative actions like `flox install`/`upgrade` always edit
|    a declarative env manifest.toml -- share it via git
| 3) Fewer Docker VMs -- get more out of devteam MacBooks
| 4) Reduce toil with versioned, shareable envs --> less sending
|    ad-hoc config and brew commands to people (as mentioned in
|    the post). Just `git pull && flox activate`.
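The pattern in point 2 (imperative commands that only ever edit a declarative, git-tracked manifest) and the drift check it enables can be sketched in a few lines. This is a minimal illustration of the idea, not Flox's code or its actual manifest.toml schema; the `[install]` table holding exact-version strings, the `install`/`drift` helpers, and the package names are simplified assumptions.

```python
from typing import Dict, Optional

Manifest = Dict[str, Dict[str, str]]


def install(manifest: Manifest, pkg: str, version: str = "*") -> None:
    """Imperative command whose only effect is an edit to the declarative manifest."""
    manifest.setdefault("install", {})[pkg] = version


def to_toml(manifest: Manifest) -> str:
    """Minimal TOML writer for tables of plain string values (enough for this sketch)."""
    lines = []
    for table, entries in manifest.items():
        lines.append(f"[{table}]")
        for key, value in sorted(entries.items()):
            lines.append(f'"{key}" = "{value}"')
        lines.append("")
    return "\n".join(lines)


def drift(manifest: Manifest, installed: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Packages whose installed version (None = absent) doesn't match the manifest."""
    report: Dict[str, Optional[str]] = {}
    for pkg, want in manifest.get("install", {}).items():
        have = installed.get(pkg)
        if have is None or (want != "*" and have != want):
            report[pkg] = have
    return report


m: Manifest = {}
install(m, "nodejs", "20")
install(m, "postgresql")
print(to_toml(m))
print(drift(m, {"nodejs": "18", "postgresql": "16"}))  # {'nodejs': '18'}
```

Because the manifest file is the single source of truth, sharing an environment reduces to sharing that file via git, and fixing a drifted laptop reduces to acting on the `drift` report instead of sending people ad-hoc commands.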
|
| On problem point #2, I think collab tools are advancing to the
| point where pairing on features, bugs, and env issues can be
| done without central SSH (e.g. tmate, VS Code Live Share,
| screensharing). However, that does sort of fall apart on
| laptops for async debugging of env issues (e.g. when devprod
| is in the US and eng is in London). Having universal telemetry
| on ephemeral cloud dev-boxes with a registry and all of the
| other DNS and SSH goodies could be the kind of infra to aspire
| to as your small teams run into more big-team problems.
|
| In the Stripe anecdote, adopting the centralized infra created
| new challenges that their devprod teams were dedicated to
| supporting:
| - international latency from central, US-based VMs
| - syncing code to the dev boxes
|   (https://facebook.github.io/watchman/)
| - linting, formatting, generating configs (run it locally or
|   server-side?)
| - a dev workflow CLI tool dedicated to dev-box workflows and
|   syncing with watchman's clock
| - IaaS, registry, config, glue for all the servers
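The "syncing with watchman's clock" idea is that the client remembers a clock value and asks only for files changed since then, so each sync transfers just the delta. A minimal sketch of that incremental pattern; it uses file mtimes as a stand-in clock, which is a swapped-in simplification and not watchman's API (watchman tracks changes through OS file-watching and returns its own clock strings, while this sketch still stats every file on each call).

```python
from pathlib import Path
from typing import List, Tuple


def changed_since(root: Path, clock: int) -> Tuple[List[str], int]:
    """Files modified after `clock` (an mtime in ns) plus the new clock to persist.

    Caveat: a file modified within the same mtime tick as `clock` is missed;
    real watchers avoid this by tracking change events rather than timestamps.
    """
    changed: List[str] = []
    new_clock = clock
    for p in root.rglob("*"):
        if p.is_file():
            mtime = p.stat().st_mtime_ns
            if mtime > clock:
                changed.append(p.relative_to(root).as_posix())
            new_clock = max(new_clock, mtime)
    return sorted(changed), new_clock
```

A sync loop would persist the returned clock and push only the listed paths to the dev box (for example with rsync), so that the editor on the laptop and the code on the box stay in step without full copies.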
|
| This is all very non-trivial work, but maybe there's a future
| where people can win some portability with Flox when they are
| small and grow into those new challenges when it's truly needed
| -- now their laptop environments just get a quick `flox activate`
| on some new, shiny servers or Cloud IDEs.
|
| I really like the notes from the author on how using the
| Language Server Protocol across a high-latency link has great
| optimizations that work alongside the watchman sync for
| real-time code editing.
| vfclists wrote:
| How does a payment service wind up with over 1000 engineers?
|
| I understand that "engineers" may not mean "developers"; it could
| be DevOps, site reliability, and all the bits and pieces that make
| up a large service provider, but over 1000?
|
| Can someone please enlighten me?
| itsjustjordan wrote:
| Surely in 2024 we can't be classifying Stripe as just "a
| payment service"
| ronef wrote:
| I love this. I believe I might have even interfaced with your
| team around that time. I was leading Facebook's (now Meta)
| Developer Products team and we were building against super
| similar areas internally.
|
| We ran back then a similar project that I coined "Developer On-
| Demand" to tackle that same problem space. It's also what
| eventually led me to discover the magic of Nix and then build Flox.
|
| I also agree with a lot of what was shared in other comments:
| while the problems we tackled were at large orgs such as
| Facebook, Shopify, Uber, Google (to name a few teams I remember
| working with) and obviously also Stripe, certain areas of the
| pain are 100% universal regardless of team size.
|
| On the Flox side, we're trying to help with a few of them today
| and hopefully many more in the near future; very open to
| thoughts! Things like simple-to-use Nix for each of your
| projects + keeping deps and config up to date across everyone's
| MacBooks and Linux boxes, etc. -- even if you don't have a full
| AWS team and Language Server team ready to support.
___________________________________________________________________
(page generated 2024-08-19 23:01 UTC)