[HN Gopher] Slow deployment causes meetings (2015)
___________________________________________________________________
Slow deployment causes meetings (2015)
Author : fagnerbrack
Score : 174 points
Date : 2024-12-22 03:12 UTC (19 hours ago)
(HTM) web link (tidyfirst.substack.com)
(TXT) w3m dump (tidyfirst.substack.com)
| dang wrote:
| Related:
|
| _Slow Deployment Causes Meetings_ -
| https://news.ycombinator.com/item?id=10622834 - Nov 2015 (26
| comments)
| yarg wrote:
| I had a boss who actually acknowledged that he was deliberately
| holding up my development process - this was a man who refused to
| allow me a four day working week.
| Sparkyte wrote:
| Sounds like a process problem. 2024 development cycles should be
| able to handle multiple lanes of development and deployment.
| That's also why things moved to microservices: you can deploy with
| minimal impact as long as you don't tightly couple your
| dependencies.
| m00x wrote:
| You don't need microservices to do this. It's actually easier
| deploying a monolith with internal dependencies than deploying
| microservices that depend on each other.
| adrianpike wrote:
| This is very accurate - microservices can be great as a
| forcing function to revisit your architectural boundaries,
| but if all you do is add a network hop and multiple
| components to update when you tweak a data model, all you'll
| get is headcount sprawl and deadlock to the moon.
|
| I'm a huge fan of migrating to microservices as a secondary
| outcome of revisiting your component boundaries, but just
| moving to separate repos & artifacts so we can all deploy
| independently is a recipe for pain.
| jrs235 wrote:
| and a recipe for "career" driven managers and directors to
| grow department head count, budget oversight, and self
| importance.
| Sparkyte wrote:
| A network hop isn't needed if you're deploying your
| microservices correctly. You can make pod groups inside of
| Kubernetes, and an application that depends on another can
| call the lightweight container co-located in that pod group.
| Containers in a pod share a network namespace, so the call
| goes over localhost without traversing the hardware network.
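|
| A minimal sketch of that pattern (hypothetical names and ports):
| containers in the same pod share a network namespace, so the
| dependent service is reached over localhost instead of a
| cluster-wide DNS name:
|
|     import urllib.request
|
|     # Hypothetical: the pricing sidecar runs as a second container in
|     # the same pod and listens on port 9090.
|     SIDECAR_URL = "http://localhost:9090/price"  # same pod: loopback only
|     CLUSTER_URL = "http://pricing.default.svc.cluster.local/price"  # other pod: real hop
|
|     def get_price(sku: str) -> bytes:
|         # Call the co-located container; traffic stays on the loopback
|         # interface instead of crossing the node's network.
|         url = f"{SIDECAR_URL}?sku={sku}"
|         with urllib.request.urlopen(url, timeout=1) as resp:
|             return resp.read()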
| Sparkyte wrote:
| I know microservices and monoliths are a heated topic.
| Breaking up complicated code to preserve user
| experience is sometimes essential. However, you can have
| machines that contain many services that interact with
| each other for performance if needed. You would put them into pod
| groups while deploying to Kubernetes and have them call their
| service inside of the pod. This can increase performance and
| throughput.
| lizzas wrote:
| Microservices let you horizontally scale deployment frequency
| too.
| theptip wrote:
| Not a silver bullet; you increase API versioning overhead
| between services, for example.
| whateveracct wrote:
| True, but your API won't be changing that rapidly, especially
| in a backwards-incompatible way.
| dhfuuvyvtt wrote:
| What's that got to do with microservices?
|
| Edit, because you can avoid those things in a monolith.
| motorest wrote:
| > Not a silver bullet; you increase API versioning overhead
| between services, for example.
|
| That's actually a good thing. That ensures clients remain
| backwards compatible in case of a rollback. The only people
| who don't notice the need for API versioning are those who are
| oblivious to the outages they create.
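|
| For example, keeping the old route alive next to the new one is
| what makes the rollback safe. A minimal sketch with Flask
| (hypothetical endpoints and payloads):
|
|     from flask import Flask, jsonify
|
|     app = Flask(__name__)
|
|     # v1 keeps serving the shape old clients were built against.
|     @app.route("/v1/orders/<int:order_id>")
|     def get_order_v1(order_id):
|         return jsonify({"id": order_id, "total": "19.99"})
|
|     # v2 introduces the new shape; old clients never see it, and rolling
|     # back to a build that predates v2 leaves v1 consumers untouched.
|     @app.route("/v2/orders/<int:order_id>")
|     def get_order_v2(order_id):
|         return jsonify({"id": order_id,
|                         "total": {"amount": 1999, "currency": "USD"}})
|
|     if __name__ == "__main__":
|         app.run(port=8080)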
| fulafel wrote:
| I think this was the meme before moduliths[1][2] where people
| conflated the operational and code change aspects of
| microservices. But it's just additional incidental complexity
| that you should resist.
|
| IOW you can do as many deploys without microservices if you
| organize your monolithic app as independent modules, while
| keeping out the main disadvantages of microservices
| (infra/CI/CD/etc. complexity, and turning your app's function
| calls into an unreliable distributed-system communication
| problem).
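|
| A toy sketch of what "independent modules" can mean in practice
| (hypothetical names): one deployable artifact, but each module is
| only reached through a small explicit interface, so the call stays
| an in-process function call:
|
|     # billing.py -- the only surface other modules may depend on.
|     class BillingService:
|         def charge(self, customer_id: str, cents: int) -> bool:
|             # real implementation stays hidden behind this boundary
|             return cents > 0
|
|     # orders.py -- uses billing only through its public interface.
|     class OrderService:
|         def __init__(self, billing: BillingService):
|             self._billing = billing
|
|         def place_order(self, customer_id: str, cents: int) -> str:
|             # still a plain function call: no network, no serialization
|             if not self._billing.charge(customer_id, cents):
|                 return "payment-failed"
|             return "order-created"
|
|     # main.py -- one artifact, one deploy.
|     if __name__ == "__main__":
|         print(OrderService(BillingService()).place_order("c-42", 1999))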
|
| [1] https://www.fearofoblivion.com/build-a-modular-monolith-
| firs...
|
| [2] https://ardalis.com/introducing-modular-monoliths-
| goldilocks...
| trog wrote:
| An old monolithic PHP application I worked on for over a
| decade wasn't set up with independent modules and the average
| deploy probably took a couple seconds, because it was an svn
| up which only updated changed files.
|
| I frequently think about this when I watch my current
| workplace's node application go through a huge build process,
| spitting out a 70mb artifact which is then copied multiple
| times around the entire universe as a whole chonk before
| finally ending up where it needs to be several tens of
| minutes later.
| fulafel wrote:
| Yeah, if something even simpler works, that's of course
| even better.
|
| I'd argue the difference between that PHP app and the Node
| app wasn't the lack of modularity, you could have a
| modulith with the same fast deploy.
|
| (But of course a modulith, too, is just extra complexity if you
| don't need it)
| withinboredom wrote:
| It's the same even watching how PHP applications get deployed
| these days: they go through this huge build and take about the
| same amount of time to replace all the Docker containers.
| trog wrote:
| I avoid Docker for precisely that reason! I have one
| system running on Docker across our whole org - Stirling-
| PDF providing some basic PDF services for internal use.
| Each time I update it I have to watch it download 700mb
| of Docker stuff, instead of just doing an in-place
| upgrade of a few files.
|
| I get that there are advantages in shipping stuff like
| this. But having seen PHP stuff work for decades with in-
| place deploys and no build process I am just continually
| disappointed with how much worse the experience has
| become.
| motorest wrote:
| > I think this was the meme before moduliths[1][2] where
| people conflated the operational and code change aspects of
| microservices.
|
| People conflate the operational and code change aspects of
| microservices just like people conflate that the sky is blue
| and water is wet. It's a statement of fact that doesn't go
| away with buzzwords.
|
| > IOW you can do as many deploys without microservices if you
| organize your monolithic app as independent modules, while
| keeping out the main disadvantages of the microservice
| (infra/cicd/etc complexity, and turning your app's function
| calls into a unreliable distributed system communication
| problem).
|
| This personal opinion is deep within "not even false"
| territory. You can also deploy as many times as you'd like
| with any monolith, regardless of what buzzwords you tack on
| that.
|
| What you're completely missing from your remark is the
| loosely coupled nature of running things on a separate
| service, how trivial it is to do blue-green deployments, and
| how you can do gradual rollouts that you absolutely cannot do
| with a patch to a monolith, no matter what buzzwords you tack
| on it. That is the whole point of mentioning microservices:
| you can do all that without a single meeting.
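|
| A blue-green cutover is essentially an atomic pointer flip between
| two running copies (a rough sketch with made-up addresses; real
| setups flip a load-balancer target or DNS weight instead):
|
|     # Two complete copies of the service are running at all times.
|     POOLS = {
|         "blue":  ["10.0.0.11:8080", "10.0.0.12:8080"],   # current version
|         "green": ["10.0.1.11:8080", "10.0.1.12:8080"],   # new version, warmed up
|     }
|     active = "blue"
|
|     def cut_over() -> None:
|         """Point live traffic at the other pool; rollback is the same flip."""
|         global active
|         active = "green" if active == "blue" else "blue"
|
|     def backends() -> list:
|         return POOLS[active]
|
|     if __name__ == "__main__":
|         print("serving from", backends())
|         cut_over()
|         print("serving from", backends())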
| fulafel wrote:
| I seem to have struck a nerve!
|
| While there may be some things that can come for free with
| microservices (and not moduliths), your mentioned ones
| don't sound convincing. Blue-green deployments and gradual
| rollouts can be done with a modulith, and I can't think of any
| reason they would be harder than with microservices (part
| of your running instances can run with a different version
| of module X). The coupling can be just as loose as with
| microservices.
| jmulho wrote:
| Blue-green deployments is a buzzword no matter what color
| you tack on it.
| faizshah wrote:
| It's a monkey's paw solution: now you have 15 kinda slow
| pipelines instead of 3 slow deployment pipelines. And you get
| to have the fun new problem of deployment planning and
| synchronizing feature deployments.
| motorest wrote:
| > It's a monkey's paw solution: now you have 15 kinda slow
| pipelines instead of 3 slow deployment pipelines.
|
| Not a problem. In fact, they are a solution to a problem.
|
| > And you get to have the fun new problem of deployment
| planning and synchronizing feature deployments.
|
| Not a problem either. You don't need to synchronize anything if
| you're consuming changes that are already deployed and
| running. You also do not need to synchronize feature
| deployment if you know the very basics of your job. Worst
| case scenario, you have to move features behind a feature
| flag, which requires zero synchronization.
|
| This sort of discussion feels like people complaining about
| perceived problems they never bothered to think about, let
| alone tackle.
| punnerud wrote:
| As long as every team managing the different APIs/services
| doesn't have to be consulted for others to get access. Otherwise you
| get both the problems of distributed data and even more levels
| of complexity (more meetings than with a monolith).
| motorest wrote:
| > As long as every team managing the different APIs/services
| don't have to be consulted for others to get access.
|
| Worst-case scenario, those meetings take place only when a
| new consumer starts consuming a producer managed by an
| external team well outside your org.
|
| Once that rolls out, you don't need any meeting anymore
| beyond hypothetical SEVs.
| devjab wrote:
| You can do this with a monolith architecture as others point
| out. It always comes down to governance. With monoliths you
| risk slowing yourself down in a huge mess of SOLID, DRY and
| other "clean code" nonsense which means nobody can change
| anything without it breaking something. Not because any of the
| OOP principles are wrong on face value, but because they are so
| extremely vague that nobody ever gets them right. It's always
| hilarious to watch Uncle Bob dismiss any criticism with a "they
| misunderstood the principles" because he's always completely
| right. Maybe the principles are just bad when so many people
| get them wrong? Anyway, microservices don't protect you from
| poor governance it just shows up as different problems. I would
| argue that it's both extremely easy and common to build a bunch
| of micro services where nobody knows what effect a change has
| on others. It comes down to team management, and this is where
| our industry sucks the most in my experience. It'll be better
| once the newer generations of "Team Topologies" enter, but
| it'll be a struggle for decades to come if it'll ever really
| end. Often it's completely out of the hands of whatever
| digitalisation department you have because the organisation
| views any "IT" as a cost center and never requests things in a
| way that can be incorporated in any sort of SWE best practice
| process.
|
| One of the reasons I like Go as a general purpose language is
| that it often leads to code bases which are easy to change by
| its simplicity by design. I've seen an online bank and a couple
| of landlord systems (sorry I can't find the English word for
| asset and tenant management in a single platform) explode in
| growth. Largely because switching to Go has made it possible
| for them to actually deliver what the business needs.
| Meanwhile their competition remains stuck with unruly Java or C#
| code bases where they may be capable of rolling out buggy
| additions every half year if their organisation is lucky. Which
| has nothing to do with Go, Java or C# by the way, it has to do
| with old fashioned OOP architecture and design being way too
| easy to fuck up. In one shop I worked they had over a thousand
| C# interfaces which were never consumed by more than one
| class... Every single one of their tens of thousands of
| interfaces was in the same folder and namespace... good luck
| finding the one you need. You could do that with Go, or any
| language, but chances are you won't do it if you're not rolling
| with one of those older OOP clean code languages. Not doing it
| with especially C# is harder because abstraction by default is
| such an ingrained part of the culture around it.
|
| Personally I have a secret affection for Python shops because
| they are always fast to deliver and terrible in the code. Love
| it!
| qaq wrote:
| A bit tangential but why is CloudFormation so slowww?
| justin_oaks wrote:
| I figure it's because AWS can get away with it.
| shepherdjerred wrote:
| AWS deploys using cfn internally
| Aeolun wrote:
| The reason my boss tends to give is that it's made by AWS, so
| it cannot possibly be bad. Also, it's free. Which is never
| given as anything more than a tangentially related reason,
| but...
| Uehreka wrote:
| It... definitely isn't free. Have you ever looked at the
| "Config" category of your AWS bill?
| hk1337 wrote:
| This is just anecdotal, but I have found that anytime a network
| interface is involved, it can slow down the deployment. I had a
| case where I was deleting lambdas in a VPC, connected to
| EFS, where the deployment itself was rather quick but it took ~20
| minutes for CloudFormation to clean up and finish.
| motorest wrote:
| > A bit tangential but why is CloudFormation so slowww?
|
| It's not that CloudFormation is slow. It's that the whole
| concept of infrastructure-as-code is slow by nature.
|
| Each time you deploy a change to a state as a transaction, you
| need to assert preconditions and post-conditions at each step.
| If you have to roll out a set of changes that have any
| semblance of interdependence, you have no option other than to
| deploy each change as sequential steps. Each step requires many
| network calls to apply changes, go through auth, poll state,
| each one taking somewhere between 50-200ms. That quickly adds
| up.
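|
| A back-of-the-envelope sketch of why that adds up (made-up but
| plausible numbers):
|
|     # Rough model: each dependent resource needs a handful of sequential
|     # API round trips (auth, apply, then polling until state converges).
|     resources = 40          # resources touched by the change set
|     calls_per_resource = 6  # create/update plus status reads
|     latency_s = 0.15        # ~50-200 ms per call
|     polls_per_resource = 4  # "wait and check again" cycles
|     poll_interval_s = 5.0
|
|     total_s = resources * (calls_per_resource * latency_s
|                            + polls_per_resource * poll_interval_s)
|     print(f"~{total_s / 60:.1f} minutes if every step waits on the last")
|     # -> ~13.9 minutes, before any slow resources (databases, CDNs) join in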
|
| If you deploy the same app on a different cloud provider with
| Terraform or Ansible, you get the same result. If you deploy
| the same changes manually you turn a few minutes into a day-
| long ordeal.
|
| The biggest problem with IaC is that it is so high-level and
| does so much under the hood that some people have no idea what
| changes they are actually applying or what they are doing. Then
| they complain it takes so long.
| qaq wrote:
| Thing is, Terraform is faster.
| maccard wrote:
| 50-200ms per poll is one thing, but realistically we're
| talking 30+ seconds for the smallest of changes even on new
| resources. Why does it take so long to spin up an EC2
| instance, when Fargate can do it in seconds (assuming you're
| not rate limited by the API) and Lambda can do it in
| milliseconds? Those machines are already running, so why does it
| take 3 minutes to deploy Ubuntu or Debian from a blessed AMI?
| ianburrell wrote:
| Fargate runs containers, Lambda runs functions. They use
| Firecracker microVMs while EC2 uses full VMs. EC2 instances
| do a lot more setup, use bigger images, and run user setup. My
| guess is Firecracker is designed for smaller VMs and can't
| support EC2 features that people need.
| Uehreka wrote:
| > It's that the whole concept of infrastructure-as-code is
| slow by nature.
|
| > If you deploy the same app on a different cloud provider
| with Terraform or Ansible, you get the same result.
|
| Nope, Terraform is way faster. Anyone who has switched
| between them on the same project can attest to this.
|
| Also, Terraform does not get into
| "UPGRADE_ROLLBACK_FAILED"-style unrecoverable states nearly
| as easily. This happens to me all the time with
| Cloudformation/CDK. So my second question after "Why is
| Cloudformation so slow?" would be "Why is Cloudformation more
| error-prone when it's also slower?"
| mlhpdx wrote:
| FWIW, my approach to IaC has been to focus on the "I" with
| CloudFormation -- the networking, storage, IAM, other AWS
| primitives and etc. This stuff doesn't change as often, and
| safe/reliable deployments are more valuable than quick ones.
|
| The behavioral parts (aka. application, stuff running in a VM
| of some kind or something declarative like EventBridge rules
| or StepFunctions) I keep separate and prioritize quick turns.
| CodeDeploy can, for example, update code on EC2s in single-
| digit seconds.
|
| I'm building systems that are a little more integrated in AWS
| than most folks, perhaps, which makes this approach a good
| fit. I do dozens of deployments a day (not an exaggeration --
| 21 so far today on a light day), including a couple
| infrastructure updates.
|
| I think the secret here is not buying into meme-like
| simplifications and instead deliberately designing an approach
| that works for your goals.
| jojobas wrote:
| Fast deployment causes incident war rooms.
| DougBTX wrote:
| Maybe the opposite, slow rollbacks cause escalating incidents.
| Trasmatta wrote:
| In my experience, there's very little correlation. I've been on
| projects with 1 deployment every six weeks, and there were just
| as many production incidents as projects with daily
| deployments.
| boxed wrote:
| I was on a team that went from every 3 weeks to multiple times
| per day. The number of incidents in production dropped
| drastically.
|
| But much more important than that drop, was that when things
| went wrong it was MUCH MUCH faster to find the problem. It was
| also much safer and easier to roll back, since there were so
| few changes that would be rolled back. No one wants to back off
| 3 weeks of work. That's chaos.
| wussboy wrote:
| That is the opposite of my experience. Slow deploys mean bigger
| deploys mean more complexity going live mean more nervousness
| and more testing mean more hesitation mean more chance that
| something unforeseen happens mean errors that no one understands
| mean war rooms.
| wasmitnetzen wrote:
| Yeah, and slow ones as well.
| sourceless wrote:
| I think unfortunately the conclusion here is a bit backwards; de-
| risking deployments by improving testing and organisational
| properties is important, but is not the only approach that works.
|
| The author notes that there appears to be a fixed number of
| changes per deployment and that it is hard to increase - I think
| the 'Reversie Thinkie' here (as the author puts it) is actually
| to decrease the number of changes per deployment.
|
| The reason those meetings exist is because of risk! The more
| changes in a deployment, the higher the risk that one of them is
| going to introduce a bug or operational issue. By deploying small
| changes often, you get to deliver value much sooner and fail
| smaller.
|
| Combine this with techniques such as canarying and gradual
| rollout, and you enter a world where deployments are no longer
| flipping a switch and either breaking or not breaking - you get
| to turn outages into degradations.
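|
| A gradual rollout can be as simple as deterministically bucketing
| users and raising a percentage as the canary stays healthy (a toy
| sketch, hypothetical names):
|
|     import hashlib
|
|     ROLLOUT_PERCENT = 5  # raise this as canary metrics stay healthy
|
|     def bucket(user_id: str) -> int:
|         # Stable 0-99 bucket so a given user always sees the same version.
|         return int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
|
|     def use_new_version(user_id: str) -> bool:
|         return bucket(user_id) < ROLLOUT_PERCENT
|
|     if __name__ == "__main__":
|         users = [f"user-{i}" for i in range(1000)]
|         print(sum(use_new_version(u) for u in users), "of 1000 on the canary")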
|
| This approach is corroborated by the DORA research[0], and
| covered well in Accelerate[1]. It also features centrally in The
| Phoenix Project[2] and its spiritual ancestor, The Goal[3].
|
| [0] https://dora.dev/
|
| [1] https://www.amazon.co.uk/Accelerate-Software-Performing-
| Tech...
|
| [2] https://www.amazon.co.uk/Phoenix-Project-Helping-Business-
| An...
|
| [3] https://www.amazon.co.uk/Goal-Process-Ongoing-
| Improvement/dp...
| tomxor wrote:
| I tend to agree. Whenever I've removed artificial technical
| friction, or made a fundamental change to an approach, the
| processes that grew around them tend to evaporate, and not be
| replaced. I think many of these processes are a rational albeit
| non-technical response to making the best of a bad situation in
| the absence of a more fundamental solution.
|
| But that doesn't mean they are entirely harmless. I've come
| across some scenarios where the people driving decisions
| _continued_ to reach for human processes as the solution rather
| than a workaround, for both new projects and projects
| designated specifically to remove existing inefficiencies. They
| either lacked the technical imagination, or were too stuck in
| the existing framing of the problem, and this is where people
| who do have that imagination need to speak up and point out
| that human processes need to be minimised with technical
| changes where possible. Not all human processes can be obviated
| through technical changes, but we don't want to spread
| ourselves thin on unnecessary ones.
| motorest wrote:
| > The reason those meetings exist is because of risk! The more
| changes in a deployment, the higher the risk that one of them
| is going to introduce a bug or operational issue.
|
| Having worked on projects that were perfectly full CD and also
| projects that had biweekly releases with meetings with release
| engineers, I can state with full confidence that risk
| management is correlated, but it is an indirect and secondary factor.
|
| The main factor is quite clearly how much time and resources an
| organization invests in automated testing. If an organization
| has the misfortune of having test engineers who lack the
| technical background to do automation, they risk never breaking
| free of these meetings.
|
| The reason why organizations need release meetings is that they
| lack the infrastructure to test deployments before and after
| rollouts, and they lack the infrastructure to roll back changes
| that fail once deployed. So they make up for this lack of
| investment by adding all these ad-hoc manual checks to
| compensate for lack of automated checks. If QA teams lack any
| technical skills, they will push for manual processes as self-
| preservation.
|
| To make matters worse, there is also the propensity to pretend
| that having to go through these meetings is a sign of
| excellence and best practices, because if you're paid to
| mitigate a problem obviously you have absolutely no incentive
| to fix it. If a bug leaks into production, that's a problem
| introduced by the developer that wasn't caught by QAs because
| reasons. If the organization has automated tests, it's even
| hard to not catch it at the PR level.
|
| Meetings exist not because of risk, but because organizations
| employ a subset of roles that require risk to justify their
| existence and lack skills to mitigate it. If a team organizes
| its efforts to add the bare minimum checks to verify a change
| runs and works once deployed, and can automatically roll back
| if it doesn't, you do not need meetings anymore.
| sourceless wrote:
| I think we may be violently agreeing - I certainly agree with
| everything you have said here.
| vegetablepotpie wrote:
| This is very well said and succinctly summarizes my
| frustrations with QA. My experience has been that non-
| technical staff in technical organizations create meetings to
| justify their existence. I'm curious if you have advice on
| how to shift non-technical QA towards adopting automated
| testing and fewer meetings.
| blackjack_ wrote:
| Hi, senior SRE here who was a QA, then QA lead, then lead
| automation / devops engineer.
|
| QA engineers with little coding experience should be given
| simple automation tasks with similar tests and
| documentation/people to ask questions to. I.e., set up a
| pytest framework that has a few automated test examples,
| and then have them write similar tests. The automated tests
| are just TAC (tests as code) versions of the manual test
| cases they should already write, so they should have some
| idea of what they need to do, and then google / ChatGPT/
| automation engineers should be able to help them start to
| translate that to code.
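|
| Something like this is what I mean by a TAC version of a manual
| test case (a toy, self-contained example; real ones would exercise
| your app's API instead of a local function):
|
|     # test_discounts.py -- run with: pytest test_discounts.py
|     import pytest
|
|     def apply_discount(total_cents: int) -> int:
|         """Toy stand-in for the code under test: 10% off orders over $100."""
|         return int(total_cents * 0.9) if total_cents > 10_000 else total_cents
|
|     # The manual case "orders over $100 get 10% off" written as code once;
|     # new cases are just more rows in the table.
|     @pytest.mark.parametrize("total,expected", [
|         (5_000, 5_000),      # under the threshold: unchanged
|         (10_000, 10_000),    # exactly at the threshold: unchanged
|         (20_000, 18_000),    # over the threshold: 10% off
|     ])
|     def test_discount_threshold(total, expected):
|         assert apply_discount(total) == expected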
|
| People with growth mindsets and ambitions will grow from
| the support and being given the chance to do the things,
| while some small number will balk and not want anything to
| do with it. You can lead a horse to water and all that.
| gavmor wrote:
| > The main factor is quite clearly how much time and
| resources an organization invests in automated testing.
|
| For context, I think it's worth reflecting on Beck's
| background, e.g. as the author of _XP Explained_. I suspect
| he's taking even TDD for granted, and optimizing what's left. I
| think even the name of his new blog--"Tidy First"--is in
| reaction to a saturation, in his milieu, of the imperative to
| "Test First".
| ozim wrote:
| I am really interested in organizations' capacity for absorbing
| changes.
|
| I live in the B2B SaaS space, and as far as development goes we
| could release daily. But on the receiving side we get pushback.
| Of course there can be feature flags, but then you get a backlog
| of not-yet-enabled features.
|
| In the end features are mostly consumed by people and people
| need training on the changes.
| ajmurmann wrote:
| I think that really depends on the product. I worked on an on-
| prem data product for years and it was crucial to document
| all changes well and give customers time to prepare. OTOH I
| also worked on a home inspection app and there users gave us
| pushback on training because the app was seen as intuitive
| paulryanrogers wrote:
| > ...there users gave us pushback on training because the
| app was seen as intuitive
|
| I would weep with joy to receive such feedback! Too often
| the services I work on have long histories with accidental
| UIs, built to address immediate needs over and over.
| ricardobeat wrote:
| > By deploying small changes often, you get to deliver value much
| sooner and fail smaller.
|
| Which increases the number of changes per deployment, feeding
| the overhead cycle.
|
| He is describing an emergent pattern here, not something that
| requires intentional culture change (like writing smaller
| changes). You're not disagreeing but paraphrasing the article's
| conclusion:
|
| > or the harder way, by increasing the number of changes per
| deployment (better tests, better monitoring, better isolation
| between elements, better social relationships on the team)
| sourceless wrote:
| I am disagreeing with the conclusion of the article, and
| asserting that more and smaller deployments are the better
| way to go.
| ricardobeat wrote:
| You are not. The conclusion of the article is the same: you
| "need to expand the far end of the hose" by increasing
| deployment rate or making more, smaller changes. What was
| your interpretation?
| sourceless wrote:
| My reading was that there were two paths the author
| highlights:
|
| 1) Increase deployment capacity (which I'm reading as
| frequency, and I fully agree with)
|
| 2) Increase change capacity per deployment by making it
| less likely that a set of changes will fail through
| tests, monitoring, structural, and team changes
|
| #2 is very much geared to "ship more changes in one
| deployment" which is where my disagreement lies. I think
| you should still do all those things, but that increasing
| the size of the bundle is explicitly an anti-goal.
|
| I think you're better off, as a rule of thumb, making
| fewer changes per deployment if you want to reduce risk.
|
| But -- that is my particular reading of it.
| vasco wrote:
| I agree entirely - I use the same references, I just think it's
| bordering on sacrilege what you did to Mr. Goldratt. He has
| been writing about flow and translating the Toyota Production
| System principles and applying physics to business processes
| way before someone decided to write The Phoenix Project.
|
| I loved the Phoenix Project don't get me wrong, but compared to
| The Goal it's like a cheaply produced adaptation of a "real"
| book so that people in the IT industry don't get scared when
| they read about production lines and run away saying "but I'm a
| PrOgrAmmEr, and creATIVE woRK can't be OPtiMizEd like a
| FactOry".
|
| So The Phoenix Project if anything is the spiritual successor
| to The Goal, not the other way around.
| grncdr wrote:
| That's exactly what the GP wrote: The Goal is the spiritual
| _ancestor_ of The Phoenix Project.
| vasco wrote:
| Well now I can't tell if it was edited or if I just misread
| and decided to correct my own mistake. I'll leave it be so
| I remember next time, thanks.
| mrbluecoat wrote:
| I totally read it as successor as well. Interesting how
| the brain fills in what we expect to see :)
| sourceless wrote:
| That's indeed how I wrote it, but I could have worded it
| better. Very much agree that the insights in The Goal go
| far beyond the scope of The Phoenix Project.
| lifeisstillgood wrote:
| So this seems quantifiable as well - there must be a number of
| processes / components that a business is made up of, and those
| presumably are also weighted (payment processing has weight
| 100, HR holiday requests weight 5 etc).
|
| I would conjecture that changing more than 2% of processes in
| any given period is "too much" - but one can certainly adjust
| that.
|
| And I suspect that this modifies based on area (ie the payment
| processing code has a different team than the HR code) - so it
| would be sensible to rotate releases (or possibly teams) - this
| period this team is working on the hard stuff, but once that
| goes live the team is rotated back out to tackle easier stuff -
| either payment processing or HR
|
| The same principle applies to attacking a trench, moving
| battalions forward and combined arms operations.
|
| Now that is of course a "management" problem - but one can
| easily see how to automate a lot of it - and how other
| "sensory" inputs are useful (ie which teams have committed code
| to these sensitive modules recently).
|
| One last point is it makes nonsense of "sprints" in Agile/Scrum
| - we know you cannot sprint a whole marathon, so how do you
| prepare the sprints for rotation?
| gavmor wrote:
| There are no sprints in agile. ;)
|
| On the contrary, per the Manifesto:
|
| > Agile processes promote sustainable development.
|
| > The sponsors, developers, and users should be able to
| maintain a constant pace indefinitely.
| manvillej wrote:
| This isn't even a software thing. It's any production process.
| The greater the amount of work-in-progress items, and the longer
| they stay in progress, the greater the risk and the greater the
| amount of work. Shrink the batch, shorten the release window.
|
| It infuriates me that software engineering has had to
| rediscover these facts when the Toyota production system was
| developed between 1948-1975 and knew all these things 50 years
| ago.
| andy_ppp wrote:
| The organisation will actively prevent you from trying to improve
| deployments, though; they will say things like "Jenkins shouldn't
| be near production" or "we can't possibly put things live without
| QA being involved" or "we need this time to make sure the quality
| of the software is high enough". All with a straight face while
| having millions of production bugs and a product that barely
| meets any user requirements (if there are any).
|
| In the end fighting the bureaucracy is actually impossible in
| most organisations, especially if you're not part of the 200
| layers of management that create these meetings. I would sack
| everyone but programmers and maybe two designers and let everyone
| fight it out without any agile coaches and product owners and
| scrum master and product experts.
|
| Slow deployment is a problem but it's not _the_ problem.
| gleenn wrote:
| You sound very defeatist about fighting bureaucracy. If you
| work at an org with too much management, you can slowly push to
| move it in the direction you hope for or leave. If you keep
| ending up at places that seem impossible to change, perhaps you
| should ask more questions about this during the interview. I've
| worked at many small companies where there wasn't crazy
| bureaucracy because that's definitely what I preferred. I also
| currently work at a megacorp and yes there is difficulty, but
| being consistent and persuasive has led to many things slowly
| heading in the right direction. Things take time. You have to
| realize why people have made things some way and then find
| convincing arguments to make things better. Sometimes places do
| just suck so don't stick around. But being hopeless doesn't
| seem helpful.
| gavmor wrote:
| > Jenkins shouldn't be near production
|
| > we can't possibly put things live without QA being involved
|
| > we need this time to make sure the quality of the software is
| high enough
|
| I've only developed software professionally since 2012, but in
| that time not only have I never encountered such sentiments,
| but (and, perhaps, because) it has always been a top priority
| of leadership to emphatically insist on _the very opposite_:
| day one of any initiative is Jenkins to production--often
| _directly_ via trunk-based development--and quality is every
| developer's responsibility.
|
| At the IC level, there was no "fighting bureaucracy," although
| I don't doubt leadership debated these things vigorously, from
| time to time, especially as external partners and stakeholders
| were often intimately involved.
|
| > I would sack everyone but programmers and maybe two designers
| and let everyone fight it out
|
| That works for me! But it doesn't scale. We definitely have to
| keep at least one product "owner" or "expert" or "manager" to
| enqueue stakeholder priorities and, while this can be a "hat"
| that devs and designers trade off, it's also a skill at which
| some individuals uniquely excel.
|
| All that being said, I don't want to come across as pearl-
| clutching, shocked Pikachu face about this. I understand that
| many organizations don't operate this way. The way I've helped
| firms make this change is via the introduction of a single,
| experimental team of _volunteers_ dedicated to these practices
| --one protected (but not dictated to) by a mandate from on
| high.
|
| But, then again, this is California.
| lifeisstillgood wrote:
| This is more or less Musk's approach at Twitter - and ignoring
| the enormous baggage any discussion with Musk brings (if
| possible) - I would love to see a real academic case study on
| the effects of that to Twitter - there will be a lot to unpick
| but my bias is on your side here.
| xorcist wrote:
| > Jenkins shouldn't be near production
|
| All of which sounds completely reasonable to me, in many
| situations.
|
| Jenkins is the WordPress of software development. It's a gigantic
| state loop that runs plugins with no privilege separation.
| Giving your jenkins instance administrative credentials in
| production might very well be equivalent to giving root keys to
| that lone guy in Romania who authored that plugin you never
| audited. I can understand perfectly why that might not be
| desirable to everyone.
|
| .. which neatly leads on to
|
| > we can't possibly put things live without QA being involved
|
| If you deploy stuff in production that never passes QA, why do
| you even have QA? To fix stuff later?
|
| If they are not empowered they will never have the chance to do
| a good job or have any pride in their work.
| austin-cheney wrote:
| While this is mostly correct it's also just as irrelevant.
|
| TL;DR: software performance, thus human performance, is all that
| matters.
|
| Risk management/acceptance can be measured with numbers. In
| software this is actually far more straightforward than in many
| other careers, because software engineers can only accept risk
| within the restrictions of their known operating constraints and
| everything else is deferred.
|
| If you want to go faster you need to maximize the frequency of
| human iteration above absolutely everything else. If a person
| cannot iterate, such as waiting on permissions, they are blocked.
| If they are waiting on a build or screen refresh they are slowed.
| This can also be measured with numbers.
|
| If person A can iterate 100x faster than person B, correctness
| becomes irrelevant. Person B must maximize correctness
| because they are slow. To be faster and more correct person A has
| extreme flexibility to learn, fail, and improve beyond what
| person B can deliver.
|
| Part of iterating faster AND reducing risk is fast test
| automation. If person A can execute 90+% test coverage in the time
| of 4 human iterations, then that test automation is still 25x faster
| than one person B iteration with a 90+% lower risk of regression.
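|
| Spelling out that arithmetic with the numbers above:
|
|     # Person A iterates 100x faster than person B, so one B iteration
|     # takes the time of 100 A iterations. A full automated test run
|     # costs about 4 of A's iterations.
|     b_iteration = 100   # measured in A-iterations of wall-clock time
|     test_run = 4        # also measured in A-iterations
|
|     print(f"test run is {b_iteration / test_run:.0f}x faster than one B iteration")
|     # -> 25x, which is where the "still 25x faster" figure comes from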
| vegetablepotpie wrote:
| I have personal experience with this in my professional career.
| Before Christmas break I had a big change, and there was fear. My
| org responded by increasing testing (regression testing, which
| increased overhead). This increased the risk that changes on dev
| would break changes on my branch (not in a code-merging way, but in
| a _complex adaptive system_ way).
|
| I responded to this risk by making a meeting. I presented our
| project schedule, and told my colleagues about their
| _expectations_, i.e. that if they drop code style comments on the PRs
| they will be deferred to a future PR (and then ignored and never
| done).
|
| What we needed _was_ fine-grained testing with better isolation
| between components. The problem is that our management is at a
| high level; they don't see meetings as a means to an end, they
| see meetings as a worthy goal in and of themselves.
| More meetings means more collaboration, means good. I'd love to
| see advice on how to lead technical changes with non-technical
| management.
| lifeisstillgood wrote:
| I am trying to expound a concept I call "software literacy" -
| where a business can be run via code just as much as today a
| company can be run by English words (policy documents, emails
| etc).
|
| This leads to a few corollaries - things like "If GPUs do the
| work then coders are the new managers" or we need whole-org-test-
| rigs to be clear about the impacts of changes.
|
| This seems directly related to this excellent article - to my
| mind if all the decision makers are not looking at the code as
| the first class object in a chnage process (is opposed to Jiras
| or project plans) then not all decision makers are (software)
| literate - and this comes up a lot in the threads here ("how do I
| discuss with non-technical management") - the answer is you
| cannot - that management must be changed. This is an enormous
| generational road block that I thought was a problem thirty years
| ago but naively assumed would disappear as coders grew up. Of
| course the problem is that to "run" a company one does not need
| to code - so until not coding is something embarrassing like not
| writing is for a newspaper editor we won't get past it.
|
| The main point is that we need companies that can be run with the
| new set of self-reinforcing concepts - SOPs, testing, not
| meetings but systems as communication.
|
| I will try and rewrite this comment later - it needs work
| braza wrote:
| A marginally related point but I do not know if others faced the
| following situation: I worked in a place with a CI pipeline of
| roughly 25 minutes, with the unit/integration tests (3000+) taking 18
| minutes.
|
| When something happened in production we ended up adding more
| tests; and of course when things went south at least 50 minutes
| were necessary to recover.
|
| After a lot of consideration we decided to relax and simplify
| some tests and focus on recovery instead (i.e. have the full
| thing in less than 5 minutes), combined with canary as the
| deployment strategy (instead of rolling updates).
|
| At least for us it was a refreshing experience, but it felt wrong
| in some ways.
| wussboy wrote:
| I've often said that it is the speed of deployment that
| matters. If it takes you 50 minutes to deploy, it takes you 50
| minutes to fix a problem. If it takes you 50 seconds to deploy,
| it takes you 50 seconds to fix a problem.
|
| Of course all kinds of things are rolled up in that speed to
| deploy, but almost all of them are good.
| tpoacher wrote:
| Meetings (used right) are a great tool, in the same sense that
| project planners (used right) are a great tool.
|
| But then there's Jira.
|
| /s
___________________________________________________________________
(page generated 2024-12-22 23:00 UTC)