[HN Gopher] Slow deployment causes meetings (2015)
       ___________________________________________________________________
        
       Slow deployment causes meetings (2015)
        
       Author : fagnerbrack
       Score  : 174 points
       Date   : 2024-12-22 03:12 UTC (19 hours ago)
        
 (HTM) web link (tidyfirst.substack.com)
 (TXT) w3m dump (tidyfirst.substack.com)
        
       | dang wrote:
       | Related:
       | 
       |  _Slow Deployment Causes Meetings_ -
       | https://news.ycombinator.com/item?id=10622834 - Nov 2015 (26
       | comments)
        
       | yarg wrote:
       | I had a boss who actually acknowledged that he was deliberately
       | holding up my development process - this was a man who refused to
       | allow me a four day working week.
        
       | Sparkyte wrote:
       | Sounds like a process problem. 2024 development cycles should be
       | able to handle multiple lanes of development and deployments.
        | That's also why things moved to microservices: you can deploy
        | with minimal impact as long as you don't tightly couple your
        | dependencies.
        
         | m00x wrote:
         | You don't need microservices to do this. It's actually easier
         | deploying a monolith with internal dependencies than deploying
         | microservices that depend on each other.
        
           | adrianpike wrote:
           | This is very accurate - microservices can be great as a
           | forcing function to revisit your architectural boundaries,
           | but if all you do is add a network hop and multiple
           | components to update when you tweak a data model, all you'll
           | get is headcount sprawl and deadlock to the moon.
           | 
           | I'm a huge fan of migrating to microservices as a secondary
           | outcome of revisiting your component boundaries, but just
           | moving to separate repos & artifacts so we can all deploy
           | independently is a recipe for pain.
        
             | jrs235 wrote:
             | and a recipe for "career" driven managers and directors to
             | grow department head count, budget oversight, and self
             | importance.
        
             | Sparkyte wrote:
              | A network hop isn't needed if you're deploying your
              | microservices correctly. You can make pod groups inside
              | of Kubernetes, and an application that depends on
              | another can call the lightweight container in the same
              | pod group. Containers in a pod share a network
              | namespace, so the call stays local instead of
              | traversing hardware.
        
           | Sparkyte wrote:
           | I know microservices and monoliths are a heated topic.
            | However, breaking up complicated code to preserve user
            | experience is sometimes essential. You can also have
            | machines that contain many services that interact with
            | each other for performance if needed. You would put them
            | into pod groups while deploying to Kubernetes and have
            | them call their service inside of the pod. This can
            | increase performance and throughput.
        
       | lizzas wrote:
        | Microservices let you horizontally scale deployment frequency
       | too.
        
         | theptip wrote:
         | Not a silver bullet; you increase api versioning overhead
         | between services for example.
        
           | whateveracct wrote:
           | True but your API won't be changing that rapidly especially
           | in a backwards-incompatible way.
        
             | dhfuuvyvtt wrote:
             | What's that got to do with microservices?
             | 
             | Edit, because you can avoid those things in a monolith.
        
           | motorest wrote:
           | > Not a silver bullet; you increase api versioning overhead
           | between services for example.
           | 
           | That's actually a good thing. That ensures clients remain
           | backwards compatible in case of a rollback. The only people
            | who don't notice the need for API versioning are those who are
           | oblivious to the outages they create.
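            | 
            | To make that concrete, a rough sketch (Python with a
            | hypothetical Flask service; routes and fields are made up):
            | the /v1 contract stays frozen once clients depend on it, so
            | rolling a client back never breaks it.
            | 
            |     from flask import Flask, jsonify
            | 
            |     app = Flask(__name__)
            | 
            |     @app.route("/v1/orders/<int:oid>")
            |     def order_v1(oid):
            |         # frozen contract: rollbacks rely on it
            |         return jsonify({"id": oid, "status": "shipped"})
            | 
            |     @app.route("/v2/orders/<int:oid>")
            |     def order_v2(oid):
            |         # v2 adds a field; the v1 shape is untouched
            |         d = {"id": oid, "status": "shipped"}
            |         d["carrier"] = "acme"   # new in v2
            |         return jsonify(d)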
        
         | fulafel wrote:
         | I think this was the meme before moduliths[1][2] where people
         | conflated the operational and code change aspects of
         | microservices. But it's just additional incidental complexity
         | that you should resist.
         | 
         | IOW you can do as many deploys without microservices if you
         | organize your monolithic app as independent modules, while
         | keeping out the main disadvantages of the microservice
         | (infra/cicd/etc complexity, and turning your app's function
          | calls into an unreliable distributed system communication
         | problem).
         | 
         | [1] https://www.fearofoblivion.com/build-a-modular-monolith-
         | firs...
         | 
         | [2] https://ardalis.com/introducing-modular-monoliths-
         | goldilocks...
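          | 
          | A rough sketch of the modulith idea (module and function names
          | invented): one deployable, two modules behind narrow
          | interfaces, and the cross-module call stays an in-process
          | function call rather than an RPC.
          | 
          |     # one deployable; Billing and Orders are separate
          |     # modules with a narrow seam between them
          |     class Billing:
          |         def charge(self, customer_id, cents):
          |             return True  # only billing-owned state
          | 
          |     class Orders:
          |         def __init__(self, billing):
          |             self.billing = billing  # not an RPC
          | 
          |         def place(self, customer_id, cents):
          |             ok = self.billing.charge(customer_id, cents)
          |             return "confirmed" if ok else "payment-failed"
          | 
          |     print(Orders(Billing()).place("c42", 1999))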
        
           | trog wrote:
           | An old monolithic PHP application I worked on for over a
           | decade wasn't set up with independent modules and the average
           | deploy probably took a couple seconds, because it was an svn
           | up which only updated changed files.
           | 
           | I frequently think about this when I watch my current
           | workplace's node application go through a huge build process,
           | spitting out a 70mb artifact which is then copied multiple
           | times around the entire universe as a whole chonk before
           | finally ending up where it needs to be several tens of
           | minutes later.
        
             | fulafel wrote:
             | Yeah, if something even simpler works, that's of course
             | even better.
             | 
             | I'd argue the difference between that PHP app and the Node
             | app wasn't the lack of modularity, you could have a
             | modulith with the same fast deploy.
             | 
              | (But of course a modulith too is just extra complexity if you
             | don't need it)
        
             | withinboredom wrote:
              | Even watching how PHP applications get deployed these days,
              | they go through this huge build and take about the same
              | amount of time to replace all the Docker containers.
        
               | trog wrote:
               | I avoid Docker for precisely that reason! I have one
               | system running on Docker across our whole org - Stirling-
               | PDF providing some basic PDF services for internal use.
               | Each time I update it I have to watch it download 700mb
               | of Docker stuff, instead of just doing an in-place
               | upgrade of a few files.
               | 
               | I get that there are advantages in shipping stuff like
               | this. But having seen PHP stuff work for decades with in-
               | place deploys and no build process I am just continually
               | disappointed with how much worse the experience has
               | become.
        
           | motorest wrote:
           | > I think this was the meme before moduliths[1][2] where
           | people conflated the operational and code change aspects of
           | microservices.
           | 
           | People conflate the operational and code change aspects of
           | microservices just like people conflate that the sky is blue
           | and water is wet. It's a statement of fact that doesn't go
           | away with buzzwords.
           | 
           | > IOW you can do as many deploys without microservices if you
           | organize your monolithic app as independent modules, while
           | keeping out the main disadvantages of the microservice
           | (infra/cicd/etc complexity, and turning your app's function
           | calls into a unreliable distributed system communication
           | problem).
           | 
           | This personal opinion is deep within "not even false"
           | territory. You can also deploy as many times as you'd like
           | with any monolith, regardless of what buzzwords you tack on
           | that.
           | 
           | What you're completely missing from your remark is the
           | loosely coupled nature of running things on a separate
           | service, how trivial it is to do blue-green deployments, and
           | how you can do gradual rollouts that you absolutely cannot do
           | with a patch to a monolith, no matter what buzzwords you tack
           | on it. That is the whole point of mentioning microservices:
           | you can do all that without a single meeting.
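            | 
            | To make the blue-green part concrete, a toy sketch (names
            | and the health check are invented, not any particular
            | tool): two copies run side by side, cutover is flipping a
            | pointer, and rollback is flipping it back.
            | 
            |     live = {"active": "blue"}   # what the router uses
            | 
            |     def cut_over_to_green(healthy):
            |         # verify the idle copy before it takes traffic
            |         if healthy("green"):
            |             live["active"] = "green"
            | 
            |     def roll_back():
            |         live["active"] = "blue"  # instant, no rebuild
            | 
            |     cut_over_to_green(lambda env: True)
            |     print(live["active"])       # -> green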
        
             | fulafel wrote:
             | I seem to have struck a nerve!
             | 
             | While there may be some things that can come for free with
             | microservices (and not moduliths), your mentioned ones
             | don't sound convincing. Blue-green deployments and gradual
              | rollouts can be done with a modulith, and I can't think of
              | any reason that would be harder than with microservices
              | (part of your running instances can run with a different
              | version of module X). The coupling can be just as loose
              | as with microservices.
        
             | jmulho wrote:
             | Blue-green deployments is a buzzword no matter what color
             | you tack on it.
        
         | faizshah wrote:
         | It's a monkey's paw solution, now you have 15 kinda slow
         | pipelines instead of 3 slow deployment pipelines. And you get
         | to have the fun new problem of deployment planning and
         | synchronizing feature deployments.
        
           | motorest wrote:
           | > It's a monkey's paw solution, now you have 15 kinda slow
           | pipelines instead of 3 slow deployment pipelines.
           | 
           | Not a problem. In fact, they are a solution to a problem.
           | 
           | > And you get to have the fun new problem of deployment
           | planning and synchronizing feature deployments.
           | 
            | Not a problem either. You don't need to synchronize anything
            | if you're consuming changes that are already deployed and
           | running. You also do not need to synchronize feature
           | deployment if you know the very basics of your job. Worst
           | case scenario, you have to move features behind a feature
           | flag, which requires zero synchronization.
           | 
           | This sort of discussion feels like people complaining about
            | perceived problems they never bothered to think about, let
           | alone tackle.
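            | 
            | On the feature flag point, a toy sketch (the flag store and
            | names are made up): the new path ships dark and is switched
            | on per tenant once whatever it depends on is already live,
            | so nothing has to be deployed in lockstep.
            | 
            |     FLAGS = {"new_flow": {"tenant_a"}}   # from config
            | 
            |     def flag_on(name, tenant):
            |         return tenant in FLAGS.get(name, set())
            | 
            |     def create_invoice(tenant):
            |         if flag_on("new_flow", tenant):
            |             return "v2 path"   # new, behind the flag
            |         return "v1 path"       # old, still the default
            | 
            |     print(create_invoice("tenant_a"))   # v2 path
            |     print(create_invoice("tenant_b"))   # v1 path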
        
         | punnerud wrote:
         | As long as every team managing the different APIs/services
         | don't have to be consulted for others to get access. You then
         | get both the problems of distributed data and even more levels
         | of complexity (more meetings than with a monolith)
        
           | motorest wrote:
           | > As long as every team managing the different APIs/services
           | don't have to be consulted for others to get access.
           | 
           | Worst-case scenario, those meetings take place only when a
           | new consumer starts consuming a producer managed by an
           | external team well outside your org.
           | 
           | Once that rolls out, you don't need any meeting anymore
           | beyond hypothetical SEVs.
        
         | devjab wrote:
         | You can do this with a monolith architecture as others point
         | out. It always comes down to governance. With monoliths you
         | risk slowing yourself down in a huge mess of SOLID, DRY and
         | other "clean code" nonsense which means nobody can change
         | anything without it breaking something. Not because any of the
         | OOP principles are wrong on face value, but because they are so
         | extremely vague that nobody ever gets them right. It's always
         | hilarious to watch Uncle Bob dismiss any criticism with a "they
         | misunderstood the principles" because he's always completely
         | right. Maybe the principles are just bad when so many people
         | get them wrong? Anyway, microservices don't protect you from
          | poor governance; it just shows up as different problems. I would
         | argue that it's both extremely easy and common to build a bunch
         | of micro services where nobody knows what effect a change has
         | on others. It comes down to team management, and this is where
         | our industry sucks the most in my experience. It'll be better
         | once the newer generations of "Team Topologies" enter, but
         | it'll be a struggle for decades to come if it'll ever really
         | end. Often it's completely out of the hands of whatever
         | digitalisation department you have because the organisation
         | views any "IT" as a cost center and never requests things in a
         | way that can be incorporated in any sort of SWE best practice
         | process.
         | 
         | One of the reasons I like Go as a general purpose language is
         | that it often leads to code bases which are easy to change by
         | its simplicity by design. I've seen an online bank and a couple
         | of landlord systems (sorry I can't find the English word for
         | asset and tenant management in a single platform) explode in
         | growth. Largely because switching to Go has made it possible
          | for them to actually deliver what the business needs.
          | Meanwhile their competition remains stuck with unruly Java or
          | C# code bases where they may be capable of rolling out buggy
         | additions every half year if their organisation is lucky. Which
         | has nothing to do with Go, Java or C# by the way, it has to do
         | with old fashioned OOP architecture and design being way too
         | easy to fuck up. In one shop I worked they had over a thousand
         | C# interfaces which were never consumed by more than one
         | class... Every single one of their tens of thousands of
         | interfaces was in the same folder and namespace... good luck
         | finding the one you need. You could do that with Go, or any
         | language, but chances are you won't do it if you're not rolling
         | with one of those older OOP clean code languages. Not doing it
         | with especially C# is harder because abstraction by default is
         | such an ingrained part of the culture around it.
         | 
         | Personally I have a secret affection for Python shops because
         | they are always fast to deliver and terrible in the code. Love
         | it!
        
       | qaq wrote:
       | A bit tangential but why is CloudFormation so slowww?
        
         | justin_oaks wrote:
         | I figure it's because AWS can get away with it.
        
           | shepherdjerred wrote:
           | AWS deploys using cfn internally
        
         | Aeolun wrote:
          | The reason my boss tends to give is that it's made by AWS, so
         | it cannot possibly be bad. Also, it's free. Which is never
         | given as anything more than a tangentially related reason,
         | but...
        
           | Uehreka wrote:
           | It... definitely isn't free. Have you ever looked at the
           | "Config" category of your AWS bill?
        
         | hk1337 wrote:
         | This is just anecdotal but I have found anytime a network
         | interface is involved, it can slow down the deployment. I had a
          | case where I was deleting lambdas in a VPC connected to EFS:
          | the deployment itself was rather quick, but it took ~20
          | minutes for CloudFormation to clean up and finish.
        
         | motorest wrote:
         | > A bit tangential but why is CloudFormation so slowww?
         | 
         | It's not that CloudFormation is slow. It's that the whole
          | concept of infrastructure-as-code is slow by nature.
         | 
         | Each time you deploy a change to a state as a transaction, you
         | need to assert preconditions and post-conditions at each step.
         | If you have to roll out a set of changes that have any
         | semblance of interdependence, you have no option other than to
         | deploy each change as sequential steps. Each step requires many
         | network calls to apply changes, go through auth, poll state,
         | each one taking somewhere between 50-200ms. That quickly adds
         | up.
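          | 
          | As a back-of-envelope illustration (all numbers are made up,
          | in line with the latencies above), even a modest chain of
          | interdependent changes lands in the minutes:
          | 
          |     api_call_s = 0.15     # ~50-200 ms per call, as above
          |     calls_per_step = 8    # auth, apply, read-back, ...
          |     poll_wait_s = 5       # wait between state polls
          |     polls_per_step = 2    # until post-conditions hold
          |     steps_in_order = 12   # interdependent, serialized
          | 
          |     per_step = (calls_per_step * api_call_s
          |                 + polls_per_step * poll_wait_s)
          |     print(f"{steps_in_order * per_step:.0f} s")   # ~134 s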
         | 
         | If you deploy the same app on a different cloud provider with
         | Terraform or Ansible, you get the same result. If you deploy
         | the same changes manually you turn a few minutes into a day-
         | long ordeal.
         | 
         | The biggest problem with IaC is that it is so high-level and
         | does so much under the hood that some people have no idea what
         | changes they are actually applying or what they are doing. Then
         | they complain it takes so long.
        
           | qaq wrote:
           | Thing is Terraform is faster
        
           | maccard wrote:
           | 50-200ms per poll is one thing, but realistically we're
           | talking 30+ seconds for the smallest of changes even on new
            | resources. Why does it take so long to spin up an EC2
            | instance, when Fargate can do it in seconds (assuming you're
            | not rate limited by the API) and Lambda can do it in
            | milliseconds? Those machines are already running, why does it
           | take 3 minutes to deploy Ubuntu or Debian from a blessed AMI?
        
             | ianburrell wrote:
              | Fargate runs containers, Lambda runs functions. They use
              | Firecracker microVMs while EC2 uses full VMs. EC2 instances
              | do a lot more setup, use bigger images, and run user setup.
              | My guess is Firecracker is designed for smaller VMs and
              | can't support the EC2 features that people need.
        
           | Uehreka wrote:
            | > It's that the whole concept of infrastructure-as-code is
            | slow by nature.
           | 
           | > If you deploy the same app on a different cloud provider
           | with Terraform or Ansible, you get the same result.
           | 
           | Nope, Terraform is way faster. Anyone who has switched
           | between them on the same project can attest to this.
           | 
           | Also, Terraform does not get into
           | "UPGRADE_ROLLBACK_FAILED"-style unrecoverable states nearly
           | as easily. This happens to me all the time with
           | Cloudformation/CDK. So my second question after "Why is
           | Cloudformation so slow?" would be "Why is Cloudformation more
           | error-prone when it's also slower?"
        
           | mlhpdx wrote:
           | FWIW, my approach to IaC has been to focus on the "I" with
           | CloudFormation -- the networking, storage, IAM, other AWS
            | primitives, etc. This stuff doesn't change as often, and
           | safe/reliable deployments are more valuable than quick ones.
           | 
           | The behavioral parts (aka. application, stuff running in a VM
           | of some kind or something declarative like EventBridge rules
           | or StepFunctions) I keep separate and prioritize quick turns.
           | CodeDeploy can, for example, update code on EC2s in single-
           | digit seconds.
           | 
           | I'm building systems that are a little more integrated in AWS
           | than most folks, perhaps, which makes this approach a good
           | fit. I do dozens of deployments a day (not an exaggeration --
           | 21 so far today on a light day), including a couple
           | infrastructure updates.
           | 
           | I think the secret here is not buying into meme-like
            | simplifications and instead deliberately designing an approach
           | that works for your goals.
        
       | jojobas wrote:
       | Fast deployment causes incident war rooms.
        
         | DougBTX wrote:
         | Maybe the opposite, slow rollbacks cause escalating incidents.
        
         | Trasmatta wrote:
         | In my experience, there's very little correlation. I've been on
         | projects with 1 deployment every six weeks, and there were just
         | as many production incidents as projects with daily
         | deployments.
        
         | boxed wrote:
         | I was on a team that went from every 3 weeks to multiple times
         | per day. The number of incidents in production dropped
         | drastically.
         | 
         | But much more important than that drop, was that when things
          | went wrong it was MUCH MUCH faster to find the problem. It was
         | also much safer and easier to roll back, since there were so
         | few changes that would be rolled back. No one wants to back off
         | 3 weeks of work. That's chaos.
        
         | wussboy wrote:
         | That is the opposite of my experience. Slow deploys mean bigger
         | deploys mean more complexity going live mean more nervousness
          | and more testing mean more hesitation mean more chance of
          | something unforeseen mean errors that no one understands mean
          | war rooms.
        
         | wasmitnetzen wrote:
         | Yeah, and slow ones as well.
        
       | sourceless wrote:
       | I think unfortunately the conclusion here is a bit backwards; de-
       | risking deployments by improving testing and organisational
       | properties is important, but is not the only approach that works.
       | 
       | The author notes that there appears to be a fixed number of
       | changes per deployment and that it is hard to increase - I think
       | the 'Reversie Thinkie' here (as the author puts it) is actually
       | to decrease the number of changes per deployment.
       | 
       | The reason those meetings exist is because of risk! The more
       | changes in a deployment, the higher the risk that one of them is
       | going to introduce a bug or operational issue. By deploying small
        | changes often, you get to deliver value much sooner and fail
       | smaller.
       | 
       | Combine this with techniques such as canarying and gradual
       | rollout, and you enter a world where deployments are no longer
       | flipping a switch and either breaking or not breaking - you get
       | to turn outages into degradations.
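        | 
        | As a toy illustration of that last point (thresholds and steps
        | invented): ramp traffic to the new version in stages and halt
        | the ramp when its error rate degrades, so a bad change costs a
        | slice of traffic rather than a full outage.
        | 
        |     def ramp(error_rate, steps=(1, 5, 25, 50, 100)):
        |         for pct in steps:
        |             if error_rate(pct) > 0.01:    # 1% error budget
        |                 return f"halted at {pct}% of traffic"
        |         return "fully rolled out"
        | 
        |     print(ramp(lambda pct: 0.002))                  # healthy
        |     print(ramp(lambda pct: 0.05 if pct >= 25 else 0.0))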
       | 
       | This approach is corroborated by the DORA research[0], and
       | covered well in Accelerate[1]. It also features centrally in The
       | Phoenix Project[2] and its spiritual ancestor, The Goal[3].
       | 
       | [0] https://dora.dev/
       | 
       | [1] https://www.amazon.co.uk/Accelerate-Software-Performing-
       | Tech...
       | 
       | [2] https://www.amazon.co.uk/Phoenix-Project-Helping-Business-
       | An...
       | 
       | [3] https://www.amazon.co.uk/Goal-Process-Ongoing-
       | Improvement/dp...
        
         | tomxor wrote:
         | I tend to agree. Whenever I've removed artificial technical
         | friction, or made a fundamental change to an approach, the
         | processes that grew around them tend to evaporate, and not be
         | replaced. I think many of these processes are a rational albeit
         | non-technical response to making the best of a bad situation in
         | the absence of a more fundamental solution.
         | 
         | But that doesn't mean they are entirely harmless. I've come
         | across some scenarios where the people driving decisions
         | _continued_ to reach for human processes as the solution rather
         | than a workaround, for both new projects and projects
         | designated specifically to remove existing inefficiencies. They
         | either lacked the technical imagination, or were too stuck in
         | the existing framing of the problem, and this is where people
         | who do have that imagination need to speak up and point out
         | that human processes need to be minimised with technical
         | changes where possible. Not all human processes can be obviated
          | through technical changes, but we don't want to spread
         | ourselves thin on unnecessary ones.
        
         | motorest wrote:
         | > The reason those meetings exist is because of risk! The more
         | changes in a deployment, the higher the risk that one of them
         | is going to introduce a bug or operational issue.
         | 
         | Having worked on projects that were perfectly full CD and also
         | projects that had biweekly releases with meetings with release
         | engineers, I can state with full confidence that risk
         | management is correlated but an indirect and secondary factor.
         | 
         | The main factor is quite clearly how much time and resources an
         | organization invests in automated testing. If an organization
         | has the misfortune of having test engineers who lack the
         | technical background to do automation, they risk never breaking
         | free of these meetings.
         | 
         | The reason why organizations need release meetings is that they
         | lack the infrastructure to test deployments before and after
         | rollouts, and they lack the infrastructure to roll back changes
          | that fail once deployed. So they make up for this lack of
         | investment by adding all these ad-hoc manual checks to
         | compensate for lack of automated checks. If QA teams lack any
         | technical skills, they will push for manual processes as self-
         | preservation.
         | 
         | To make matters worse, there is also the propensity to pretend
         | that having to go through these meetings is a sign of
         | excellence and best practices, because if you're paid to
         | mitigate a problem obviously you have absolutely no incentive
         | to fix it. If a bug leaks into production, that's a problem
         | introduced by the developer that wasn't caught by QAs because
         | reasons. If the organization has automated tests, it's even
         | hard to not catch it at the PR level.
         | 
         | Meetings exist not because of risk, but because organizations
         | employ a subset of roles that require risk to justify their
         | existence and lack skills to mitigate it. If a team organizes
          | its efforts to add the bare minimum checks to verify a change
         | runs and works once deployed, and can automatically roll back
         | if it doesn't, you do not need meetings anymore.
        
           | sourceless wrote:
           | I think we may be violently agreeing - I certainly agree with
           | everything you have said here.
        
           | vegetablepotpie wrote:
           | This is very well said and succinctly summarizes my
           | frustrations with QA. My experience has been that non-
           | technical staff in technical organizations create meetings to
           | justify their existence. I'm curious if you have advice on
           | how to shift non-technical QA towards adopting automated
           | testing and fewer meetings.
        
             | blackjack_ wrote:
             | Hi, senior SRE here who was a QA, then QA lead, then lead
             | automation / devops engineer.
             | 
             | QA engineers with little coding experience should be given
             | simple automation tasks with similar tests and
             | documentation/ people to ask questions to. I.e. setup a
             | pytest framework that has a few automated test examples,
             | and then have them write similar tests. The automated tests
             | are just TAC (tests as code) versions of the manual test
             | cases they should already write, so they should have some
             | idea of what they need to do, and then google / ChatGPT/
             | automation engineers should be able to help them start to
             | translate that to code.
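              | 
              | For example, a starter test in that style (all names
              | here are illustrative; the login helper stands in for
              | whatever client the real framework provides):
              | 
              |     # test_login.py
              |     import pytest
              | 
              |     def login(user, password):   # stand-in client
              |         return password == "correct-horse"
              | 
              |     @pytest.mark.parametrize("password,expected", [
              |         ("correct-horse", True),    # happy path
              |         ("wrong", False),           # rejected
              |     ])
              |     def test_login(password, expected):
              |         assert login("qa", password) == expected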
             | 
             | People with growth mindsets and ambitions will grow from
             | the support and being given the chance to do the things,
             | while some small number will balk and not want anything to
             | do with it. You can lead a horse to water and all that.
        
           | gavmor wrote:
           | > The main factor is quite clearly how much time and
           | resources an organization invests in automated testing.
           | 
           | For context, I think it's worth reflecting on Beck's
            | background, eg as the author of _XP Explained_. I suspect he's
            | taking even TDD for granted, and optimizing what's left. I
           | think even the name of his new blog--"Tidy First"--is in
           | reaction to a saturation, in his milieu, of the imperative to
           | "Test First".
        
         | ozim wrote:
          | I am really interested in organizations' capacity for absorbing
          | the changes.
         | 
          | I live in B2B SaaS space and as far as development goes we
         | could release daily. But on the receiving side we get pushback.
         | Of course there can be feature flags but then it would cause
         | "not enabled feature backlog".
         | 
         | In the end features are mostly consumed by people and people
         | need training on the changes.
        
           | ajmurmann wrote:
            | I think that really depends on the product. I worked on an
            | on-prem data product for years and it was crucial to
            | document all changes well and give customers time to
            | prepare. OTOH I also worked on a home inspection app, and
            | there users gave us pushback on training because the app
            | was seen as intuitive.
        
             | paulryanrogers wrote:
             | > ...there users gave us pushback on training because the
             | app was seen as intuitive
             | 
             | I would weep with joy to receive such feedback! Too often
             | the services I work on have long histories with accidental
             | UIs, built to address immediate needs over and over.
        
         | ricardobeat wrote:
          | > By deploying small changes often, you get to deliver value
          | much sooner and fail smaller.
         | 
         | Which increases the number of changes per deployment, feeding
         | the overhead cycle.
         | 
         | He is describing an emergent pattern here, not something that
         | requires intentional culture change (like writing smaller
         | changes). You're not disagreeing but paraphrasing the article's
         | conclusion:
         | 
         | > or the harder way, by increasing the number of changes per
         | deployment (better tests, better monitoring, better isolation
         | between elements, better social relationships on the team)
        
           | sourceless wrote:
           | I am disagreeing with the conclusion of the article, and
           | asserting that more and smaller deployments are the better
           | way to go.
        
             | ricardobeat wrote:
             | You are not. The conclusion of the article is the same, you
             | "need to expand the far end of the hose" by increasing
             | deployment rate or making more, smaller changes. What was
             | your interpretation?
        
               | sourceless wrote:
               | My reading was that there were two paths the author
               | highlights:
               | 
               | 1) Increase deployment capacity (which I'm reading as
               | frequency, and I fully agree with)
               | 
               | 2) Increase change capacity per deployment by making it
               | less likely that a set of changes will fail through
               | tests, monitoring, structural, and team changes
               | 
               | #2 is very much geared to "ship more changes in one
               | deployment" which is where my disagreement lies. I think
               | you should still do all those things, but that increasing
               | the size of the bundle is explicitly an anti-goal.
               | 
               | I think you're better off, as a rule of thumb, making
               | fewer changes per deployment if you want to reduce risk.
               | 
               | But -- that is my particular reading of it.
        
         | vasco wrote:
         | I agree entirely - I use the same references, I just think it's
         | bordering on sacrilege what you did to Mr. Goldratt. He has
         | been writing about flow and translating the Toyota Production
         | System principles and applying physics to business processes
         | way before someone decided to write The Phoenix Project.
         | 
         | I loved the Phoenix Project don't get me wrong, but compared to
          | The Goal it's like a cheaply produced adaptation of a "real"
         | book so that people in the IT industry don't get scared when
         | they read about production lines and run away saying "but I'm a
         | PrOgrAmmEr, and creATIVE woRK can't be OPtiMizEd like a
         | FactOry".
         | 
         | So The Phoenix Project if anything is the spiritual successor
         | to The Goal, not the other way around.
        
           | grncdr wrote:
           | That's exactly what the GP wrote: The Goal is the spiritual
           | _ancestor_ of The Phoenix Project.
        
             | vasco wrote:
             | Well now I can't tell if it was edited or if I just misread
             | and decided to correct my own mistake. I'll leave it be so
             | I remember next time, thanks.
        
               | mrbluecoat wrote:
               | I totally read it as successor as well. Interesting how
               | the brain fills in what we expect to see :)
        
               | sourceless wrote:
               | That's indeed how I wrote it, but I could have worded it
               | better. Very much agree that the insights in The Goal go
               | far beyond the scope of The Phoenix Project.
        
         | lifeisstillgood wrote:
         | So this seems quantifiable as well - there must be a number of
         | processes / components that a business is made up of, and those
         | presumably are also weighted (payment processing has weight
         | 100, HR holiday requests weight 5 etc).
         | 
         | I would conjecture that changing more than 2% of processes in
         | any given period is "too much" - but one can certainly adjust
         | that.
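          | 
          | As a toy version of that conjecture (the weights and the 2%
          | budget are invented): score a release by the weighted share
          | of business processes it touches and flag it when it goes
          | over budget.
          | 
          |     weights = {"payments": 100, "hr": 5, "reports": 20}
          |     changed = {"hr", "reports"}
          |     budget = 0.02                  # the conjectured 2%
          | 
          |     total = sum(weights.values())
          |     share = sum(weights[p] for p in changed) / total
          |     status = "over budget" if share > budget else "ok"
          |     print(f"{share:.1%} changed:", status)  # 20.0%: over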
         | 
         | And I suspect that this modifies based on area (ie the payment
         | processing code has a different team than the HR code) - so it
         | would be sensible to rotate releases (or possibly teams) - this
         | period this team is working on the hard stuff, but once that
         | goes live the team is rotated back out to tackle easier stuff -
         | either payment processing or HR
         | 
         | The same principle applies to attacking a trench, moving
         | battalions forward and combined arms operations.
         | 
         | Now that is of course a "management" problem - but one can
         | easily see how to automate a lot of it - and how other
         | "sensory" inputs are useful (ie which teams have committed code
          | to these sensitive modules recently)
         | 
         | One last point is it makes nonsense of "sprints" in Agile/Scrum
         | - we know you cannot sprint a whole marathon, so how do you
         | prepare the sprints for rotation?
        
           | gavmor wrote:
           | There are no sprints in agile. ;)
           | 
           | On the contrary, per the Manifesto:
           | 
           | > Agile processes promote sustainable development.
           | 
           | > The sponsors, developers, and users should be able to
           | maintain a constant pace indefinitely.
        
         | manvillej wrote:
        | This isn't even a software thing. It's any production process.
         | The greater amount of work in progress items, the longer the
         | work in progress items, the greater risk, the greater amount of
        | work. Shrink the batch, shorten the release window.
         | 
         | It infuriates me that software engineering has had to
         | rediscover these facts when the Toyota production system was
         | developed between 1948-1975 and knew all these things 50 years
         | ago.
        
       | andy_ppp wrote:
       | The organisation will actively prevent you from trying to improve
       | deployments though, they will say things like "Jenkins shouldn't
       | be near production" or "we can't possibly put things live without
       | QA being involved" or "we need this time to make sure the quality
       | of the software is high enough". All with a straight face while
       | having millions of production bugs and a product that barely
       | meets any user requirements (if there are any).
       | 
       | In the end fighting the bureaucracy is actually impossible in
       | most organisations, especially if you're not part of the 200
       | layers of management that create these meetings. I would sack
       | everyone but programmers and maybe two designers and let everyone
       | fight it out without any agile coaches and product owners and
       | scrum master and product experts.
       | 
       | Slow deployment is a problem but it's not _the_ problem.
        
         | gleenn wrote:
         | You sound very defeatist about fighting bureaucracy. If you
         | work at an org with too much management, you can slowly push to
         | move it in the direction you hope for or leave. If you keep
         | ending up at places that seem impossible to change, perhaps you
         | should ask more questions about this during the interview. I've
         | worked at many small companies where there wasn't crazy
         | bureaucracy because that's definitely what I preferred. I also
         | currently work at a megacorp and yes there is difficulty, but
          | being consistent and persuasive has led to many things slowly
         | heading in the right direction. Things take time. You have to
         | realize why people have made things some way and then find
         | convincing arguments to make things better. Sometimes places do
         | just suck so don't stick around. But being hopeless doesn't
         | seem helpful.
        
         | gavmor wrote:
         | > Jenkins shouldn't be near production
         | 
         | > we can't possibly put things live without QA being involved
         | 
         | > we need this time to make sure the quality of the software is
         | high enough
         | 
         | I've only developed software professionally since 2012, but in
         | that time not only have I never encountered such sentiments,
         | but (and, perhaps, because) it has always been a top priority
          | of leadership to emphatically insist on _the very opposite_:
         | day one of any initiative is Jenkins to production--often
         | _directly_ via trunk-based development--and quality is every
          | developer's responsibility.
         | 
         | At the IC level, there was no "fighting bureaucracy," although
         | I don't doubt leadership debated these things vigorously, from
         | time to time, especially as external partners and stakeholders
         | were often intimately involved.
         | 
         | > I would sack everyone but programmers and maybe two designers
         | and let everyone fight it out
         | 
         | That works for me! But it doesn't scale. We definitely have to
         | keep at least one product "owner" or "expert" or "manager" to
         | enqueue stakeholder priorities and, while this can be a "hat"
         | that devs and designers trade off, it's also a skill at which
         | some individuals uniquely excel.
         | 
         | All that being said, I don't want to come across as pearl-
         | clutching, shocked Pikachu face about this. I understand that
         | many organizations don't operate this way. The way I've helped
         | firms make this change is via the introduction of a single,
         | experimental team of _volunteers_ dedicated to these practices
         | --one protected (but not dictated to) by a mandate from on
         | high.
         | 
         | But, then again, this is California.
        
         | lifeisstillgood wrote:
         | This is more or less Musk's approach at Twitter - and ignoring
         | the enormous baggage any discussion with Musk brings (if
         | possible) - I would love to see a real academic case study on
         | the effects of that to Twitter - there will be a lot to unpick
         | but my bias is on your side here.
        
         | xorcist wrote:
         | > Jenkins shouldn't be near production
         | 
         | All of which sounds completely reasonable to me, in many
         | situations.
         | 
          | Jenkins is the WordPress of software development. It's a
          | gigantic state loop that runs plugins with no privilege
          | separation.
         | Giving your jenkins instance administrative credentials in
         | production might very well be equivalent to giving root keys to
         | that lone guy in Romania who authored that plugin you never
         | audited. I can understand perfectly why that might not be
         | desirable to everyone.
         | 
         | .. which neatly leads on to
         | 
         | > we can't possibly put things live without QA being involved
         | 
         | If you deploy stuff in production that never passes QA, why do
         | you even have QA? To fix stuff later?
         | 
         | If they are not empowered they will never have the chance to do
         | a good job or have any pride in their work.
        
       | austin-cheney wrote:
       | While this is mostly correct it's also just as irrelevant.
       | 
       | TLDR; software performance, thus human performance, is all that
       | matters.
       | 
       | Risk management/acceptance can be measured with numbers. In
       | software this is actually far more straightforward than in many
       | other careers, because software engineers can only accept risk
       | within the restrictions of their known operating constraints and
       | everything else is deferred.
       | 
       | If you want to go faster you need to maximize the frequency of
       | human iteration above absolutely everything else. If a person
       | cannot iterate, such as waiting on permissions, they are blocked.
       | If they are waiting on a build or screen refresh they are slowed.
       | This can also be measured with numbers.
       | 
       | If person A can iterate 100x faster than person B correctness
       | becomes irrelevant. Person B must maximize upon correctness
       | because they are slow. To be faster and more correct person A has
       | extreme flexibility to learn, fail, and improve beyond what
       | person B can deliver.
       | 
        | Part of iterating faster AND reducing risk is fast test
        | automation. If person A can execute 90+% test coverage in the
        | time of 4 of their own iterations, that automation is still 25x
        | faster than a single person B iteration, with a 90+% lower risk
        | of regression.
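        | 
        | Spelled out with the same assumed numbers:
        | 
        |     b_iteration = 100       # person B's cycle, in A-units
        |     a_iteration = 1         # person A is 100x faster
        |     test_pass = 4 * a_iteration   # 90+% coverage run
        | 
        |     print(b_iteration / test_pass)   # 25.0x faster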
        
       | vegetablepotpie wrote:
       | I have personal experience with this in my professional career.
       | Before Christmas break I had a big change, and there was fear. My
       | org responded by increasing testing (regression testing, which
       | increased overhead). This increased the risk that changes on dev
        | would break changes on my branch (not in a code merging way,
        | but in a _complex adaptive system_ way).
       | 
       | I responded to this risk by making a meeting. I presented our
       | project schedule, and told my colleagues about their
        | _expectations_, i.e. if they drop code style comments on the PRs
       | they will be deferred to a future PR (and then ignored and never
       | done).
       | 
        | What we needed _was_ fine grained testing with better isolation
        | between components. The problem is that our management is at a
        | high level; they don't see meetings as a means to an end, they
        | see meetings as a worthy goal in and of themselves to achieve.
       | More meetings means more collaboration, means good. I'd love to
       | see advice on how to lead technical changes with non-technical
       | management.
        
       | lifeisstillgood wrote:
       | I am trying to expound a concept I call "software literacy" -
       | where a business can be run via code just as much as today a
       | company can be run by English words (policy documents, emails
       | etc).
       | 
       | This leads to a few corollaries - things like "If GPUs do the
       | work then coders are the new managers" or we need whole-org-test-
        | rigs to be clear about the impacts of changes.
       | 
       | This seems directly related to this excellent article - to my
       | mind if all the decision makers are not looking at the code as
        | the first class object in a change process (as opposed to Jiras
       | or project plans) then not all decision makers are (software)
       | literate - and this comes up a lot in the threads here ("how do I
       | discuss with non-technical management") - the answer is you
       | cannot - that management must be changed. This is an enormous
       | generational road block that I thought was a problem thirty years
       | ago but naively assumed would disappear as coders grew up. Of
       | course the problem is that to "run" a company one does not need
       | to code - so until not coding is something embarrassing like not
       | writing is for a newspaper editor we won't get past it.
       | 
       | The main point is that we need companies that can be run with the
       | new set of self-reinforcing concepts - sops, testing, not
       | meetings but systems as communication.
       | 
       | I will try and rewrite this comment later - it needs work
        
       | braza wrote:
        | A marginally related point, but I do not know if others faced
        | the following situation: I worked in a place with a CI pipeline
        | of ~25 minutes, with the unit/integration tests (3000+) taking
        | 18 minutes.
        | 
        | When something happened in production we ended up adding more
        | tests; and of course when things went south, at least 50
        | minutes were necessary to recover.
        | 
        | After a lot of consideration we decided to relax and simplify
        | some tests and focus on recovery (i.e. have the full pipeline
        | run in less than 5 minutes), combined with a canary deployment
        | strategy (instead of rolling updates).
        | 
        | At least for us it was a refreshing experience, but it sounded
        | wrong in some ways.
        
         | wussboy wrote:
         | I've often said that it is the speed of deployment that
         | matters. If it takes you 50 minutes to deploy, it takes you 50
         | minutes to fix a problem. If it takes you 50 seconds to deploy,
         | it takes you 50 seconds to fix a problem.
         | 
         | Of course all kinds of things are rolled up in that speed to
         | deploy, but almost all of them are good.
        
       | tpoacher wrote:
       | Meetings (used right) are a great tool, in the same sense that
       | project planners (used right) are a great tool.
       | 
       | But then there's Jira.
       | 
       | /s
        
       ___________________________________________________________________
       (page generated 2024-12-22 23:00 UTC)