hngopher.com

       [HN Gopher] Ask HN: How do you balance support and sprint tickets?
       ___________________________________________________________________
        
       Ask HN: How do you balance support and sprint tickets?
        
       Our DevOps/SRE team has been trying to do Scrum for over a year but
       the sprints are always derailed by support work.  They've tried to
       add support tickets to the sprint as they come in, but that changes
       the sprint scope and renders the "velocity" metrics useless.
       They've tried to exclude support tickets from the sprint and keep
       the scope very small. Some weeks it works, other weeks they
       overdeliver by a lot (because there were fewer support tickets and
       they got a lot of tickets from the backlog).  I'm thinking Scrum is
       useless for that team but management insists all engineering teams
       must use scrum.  Do you have any advice for managing tasks in a
       team that has support/adhoc and project tickets?
        
       Author : throwawayops123
       Score  : 102 points
       Date   : 2023-01-23 14:15 UTC
        
       | taubek wrote:
       | Is it an option not to introduce support tickets into sprints?
        
       | throwawaysleep wrote:
       | Go through the motions of Scrum and just reduce the amount of
       | stuff in the sprint to compensate for the lack of work. If they
       | are looking for points consistency, fiddle with the estimates to
       | get the right numbers.
       | 
       | Scrum is an accounting tool. Apply all the shenanigans that one
       | might find in financial reports to it.
       | 
       | At my first job, we just edited estimates after the fact to hit
       | the velocity target management wanted.
        
       | djbusby wrote:
       | Support Tickets are like fires. Typically, there are folk waiting
       | around in case of fire to render aid.
       | 
       | Rotate who the "fireman" is on the team; they are out of the
       | sprint and just fix things. It spreads internal knowledge around, too.
        
       | code_runner wrote:
       | TLDR: support should absolutely tank velocity, make the org feel
       | your pain. If they want better velocity they need to invest in
       | shoring up issues.
       | 
       | One thing to consider... support tanking your velocity is a
       | feature, not a bug.
       | 
       | Sounds like you need investment in support tooling etc that can
       | lessen that burden. Velocity will improve when support is less
       | burdensome.
       | 
       | This is obviously not one-size-fits-all advice, but it worked well
       | for a previous org I was at. If the sprint work will have a side
       | effect of fixing support, or support is so intense it requires a
       | majority of the team, you HAVE to fix that first, regardless of
       | whether you use sprints, Scrum, Kanban, etc. If velocity was super
       | high for a few sprints but introduced regressions/bugs that could
       | impact future velocity, those high-velocity sprints weren't as
       | productive as you thought. It's never just one number.
       | 
       | The org I worked at with the absolute worst support/sprint
       | structure also had the only code base I considered
       | "unsalvageable". They refused to do anything to improve existing
       | processes and spent half of every team's time on the same support
       | issues on an endless loop. They never had the measurements needed
       | to actually figure out where to improve clients' experiences.
        
       | tjpnz wrote:
       | I'm a dev on an MLOps team, we also do Scrum like you. We keep
       | support tickets out of the sprint and also have a weekly rotation
       | for support work. It's far from perfect, some weeks are just
       | plain rotten and someone has to go off their ticket work to help.
       | But it has worked well enough for us in the year since we
       | introduced it.
        
         | stevekemp wrote:
         | That's how I've seen it done in a few places:
         | 
         | * The team does short sprints.
         | 
         | * One lucky soul handles all support requests for the duration
         | of the sprint, having no officially planned work for the
         | sprint.
         | 
         | The staff take it in turns to be the "support-person", and if
         | they have a lot of tickets they work a lot. If they have few
         | then they might sneakily do some work, or help colleagues with
         | their own stories/tickets.
        
           | maayank wrote:
           | > * The team does short sprints.
           | 
           | How short is short?
        
             | stevekemp wrote:
             | Sorry! Two weeks.
        
         | tecleandor wrote:
         | More or less like us, yep.
        
       | nradov wrote:
       | Split the team and have a few members dedicated to support
       | tickets on a rotating basis. Or just abandon Scrum sprint
       | planning cycles and run Kanban. You may have to escalate the
       | issue with management. In most organizations if you go up high
       | enough you'll eventually find someone willing to make an
       | exception based on proper justification.
        
       | SkyPuncher wrote:
       | Velocity metrics are mostly useless.
       | 
       | We rotate the support engineer role among our team members. When
       | someone is on support duty, we just assign them half as many
       | sprint points. If they have a light support week, they take on
       | extra work. A nice bonus for our team.
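
A small sketch of the half-load rule described above. The 10-point baseline per person and the 0.5 factor are assumptions for illustration, and the function and member names are hypothetical.

```python
def sprint_points(members: list[str], on_support: str,
                  points_per_person: int = 10,
                  support_factor: float = 0.5) -> dict[str, int]:
    """Planned sprint points per person; whoever is on support gets a reduced load."""
    if on_support not in members:
        raise ValueError(f"{on_support!r} is not on the team")
    plan = {}
    for member in members:
        # Only the person on support duty has their load scaled down.
        factor = support_factor if member == on_support else 1.0
        plan[member] = round(points_per_person * factor)
    return plan
```

`sprint_points(["ana", "ben", "cy"], "ben")` plans 10, 5 and 10 points. Extra work picked up during a light support week would come from the backlog rather than being planned here.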
        
       | pengo wrote:
       | In our environment support trumps sprint every time, regardless
       | of whether the support task is legitimate or not. By legitimate,
       | I mean that many of the "support tasks" are things clients can do
       | for themselves, but rather than point the client to the
       | (excellent) documentation, the development team ends up carrying
       | the baby.
        
       | aik wrote:
       | To deal with this we have multiple devsupport devs that sit with
       | support and handle urgent issues, anything larger/multi-day goes
       | into the sprint queue/backlog to be triaged for the next sprint.
        
       | LambdaComplex wrote:
       | > management insists all engineering teams must use scrum
       | 
       | You could always point out that the Agile Manifesto specifically
       | says "Individuals and interactions over processes and tools," and
       | therefore you aren't doing real capital-A-Agile if teams aren't
       | empowered to use the processes that work best for them. (It
       | probably wouldn't do you any good, but it might feel good to say)
       | 
       | Why does management insist all engineering teams use scrum,
       | though? I hope they're not trying to compare different teams'
       | story points, because that would imply that management has a
       | gross misunderstanding of how story points are supposed to work.
        
       | 300bps wrote:
       | Most doctors' offices set aside x% of time for scheduled visits
       | and y% of time for acute visits each day.
       | 
       | We do it similarly. We generally reserve 20% of our time for
       | production support and 80% of time for new development. If we
       | have fewer production issues that week, we get more new
       | development done. If we have more production issues that week,
       | we get less new development done.
       | 
       | We have one generic production issue story where we put small
       | things and we'll create a fully fleshed out story for larger
       | production issues.
       | 
       | If we have a trend of having too many production issues then we
       | raise that as an issue itself and get to the bottom of what is
       | causing it.
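
A minimal Python sketch of the fixed-reservation idea above, assuming capacity is tracked in hours. The 20% default mirrors the comment; the `Week` type, the function names, and the "three consecutive over-budget weeks" rule for spotting a trend are illustrative assumptions, not something the commenter specified.

```python
from dataclasses import dataclass

# Fraction of capacity held back for production support (the 20/80 split).
SUPPORT_RESERVE = 0.20


@dataclass
class Week:
    capacity_hours: float  # total hours the team has available this week
    support_hours: float   # hours actually spent on production support


def planned_dev_hours(capacity_hours: float, reserve: float = SUPPORT_RESERVE) -> float:
    """Hours committed to new development when the week is planned."""
    return capacity_hours * (1 - reserve)


def actual_dev_hours(week: Week) -> float:
    """Whatever support did not consume is available for new development."""
    return max(week.capacity_hours - week.support_hours, 0.0)


def support_trend_alert(weeks: list[Week], reserve: float = SUPPORT_RESERVE,
                        run: int = 3) -> bool:
    """True if support overran its reservation for `run` consecutive weeks."""
    streak = 0
    for week in weeks:
        if week.support_hours > week.capacity_hours * reserve:
            streak += 1
            if streak >= run:
                return True
        else:
            streak = 0
    return False
```

For a team of five at 40 hours each (200 hours), `planned_dev_hours(200)` commits about 160 hours to new work; a light week with 20 support hours leaves 180 hours, and a heavy week with 70 leaves 130. Passing a different `reserve` covers other splits, such as 30%.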
        
       | [deleted]
        
       | raptorraver wrote:
       | We have a designated support person, rotated weekly. He is the
       | first line of contact in our team and reacts to support tickets.
       | Of course he can't resolve them all by himself but he delegates
       | them further if help is needed. He doesn't even try to do any
       | sprint tickets. This arrangement has so far worked quite well.
        
       | OJFord wrote:
       | > I'm thinking Scrum is useless for that team but management
       | insists all engineering teams must use scrum.
       | 
       | Then all tickets created by 'support' go into the backlog, and the
       | earliest they can possibly be addressed is the next sprint. (That's
       | assuming you want the rule to change, want it to be someone else's
       | problem, or want a proposed solution.)
       | 
       | As a bit of an aside, I'm curious about your role and the org
       | structure: you seem to have oversight over SRE as well as other
       | teams' working processes, but 'management' also imposes roughly
       | what those processes look like from above?
        
       | ddalex wrote:
       | I allocate 30% of the working time to support, 60% to development,
       | and 10% to slack, reserved for unexpected contingencies.
        
       | jrowdy wrote:
       | First, it's right to expect a mix of new dev with
       | support/maintenance tasks, and you're not alone. Second, if
       | you're overwhelmed by support work, then I'd ask why?
       | 
       | We run Scrum with a 2-week sprint cadence and mix new dev with
       | support/maintenance. The allocation differs based on season and
       | events. Any support tickets we do don't count towards velocity
       | and are mixed in with story point tasks and other
       | maintenance/debt tasks. We track velocity as points delivered to
       | a client, which helps with estimation for delivery of products /
       | features to clients. This velocity is an average, and that's when
       | the numbers work in your favor as "all estimates are wrong."
       | 
       | In a scenario where velocity is constantly decreased by
       | support/maintenance/bugs, that tells a story, either of quality
       | of work done previously, impatient management, or lack of
       | discipline by the team. If you can't go a sprint without having
       | to put out fires, that's indicative of a system that should be
       | mitigated on a larger scale than continually applying bandaids,
       | otherwise you'll constantly be bitten.
       | 
       | In my mind, your choices are: change your cadence/strategy,
       | change your values/philosophy, change management style, change
       | your budget/velocity, invest heavily now to replace troubled
       | systems, invest in manpower to put out fires, or simply stay the
       | course with a shift in perspective.
       | 
       | Only you can choose.
        
       | swagtricker wrote:
       | Scrum is training wheels. The time boxes are supposed to make you
       | reflect and improve. Ditch Scrum. Go to Kanban, scope your
       | stories smaller and prioritize bug fixes over all new work until
       | you're not writing bugs. Can't ship code without writing bugs?
       | Improve your skills as a team. Start pair or mob programming. Do
       | TDD to improve your design, testability & maintainability of your
       | code. Start moving towards trunk based development and use
       | feature flags to do REAL CI/CD (hint: using a build server
       | doesn't mean you're doing CI). Code deployment != feature
       | deployment! Strive to "roll forward" your code as much as
       | possible and disable/enable features if you run into problems -
       | learn to not "rollback".
       | 
       | Now of course, since this is HN somebody's going to sneer
       | divisively at everything I just said and tell me it's not
       | possible (despite the fact that I've done this, repeatedly in
       | different organizations in a developer and coaching capacity for
       | almost two decades now). Here's my preemptive caveat/STFU for
       | detractors: the above method only works if you, as developers,
       | have full control & ownership over your application code and data,
       | and do your own deployments or are partnered strongly with an OPS
       | team that gives you full monitoring & Read Only access. If your
       | team is working in an environment where you don't actually get
       | ownership over your code and data, this won't work. If
       | management, architects, or egotistical prima donna staff/senior
       | developers "won't let you" pair/mob, do TDD, do trunk based
       | development, or do proper CI/CD, this won't work.
       | 
       | P.S. If you're in an environment where "this won't work" - QUIT!
       | Life's too short to put up with being expected to build software
       | with one hand tied behind your back. These things are often
       | easier to do in medium to small sized companies. These things are
       | often easier to do on greenfield (or at least recent) projects.
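
The roll-forward approach in the comment relies on deploying code whose new behaviour is switched on or off by a flag, so a problem is handled by flipping the flag rather than redeploying an older build. A toy, in-memory sketch of that pattern follows; `FeatureFlags`, `new_checkout`, and the discount rule are hypothetical names for illustration, and a production system would load flags from configuration or a dedicated flag service.

```python
class FeatureFlags:
    """In-memory flag store; a real setup would read flags from config or a flag service."""

    def __init__(self, defaults: dict[str, bool]) -> None:
        self._flags = dict(defaults)

    def is_enabled(self, name: str) -> bool:
        return self._flags.get(name, False)  # unknown flags default to off

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled


flags = FeatureFlags({"new_checkout": True})


def legacy_checkout(cart: list[float]) -> float:
    return sum(cart)


def new_checkout(cart: list[float]) -> float:
    # Hypothetical new behaviour: a 10% discount on orders over 100.
    total = sum(cart)
    return total * 0.9 if total > 100 else total


def checkout(cart: list[float]) -> float:
    """Both code paths are deployed; the flag decides which one users get."""
    if flags.is_enabled("new_checkout"):
        return new_checkout(cart)
    return legacy_checkout(cart)
```

If the new path misbehaves, `flags.set("new_checkout", False)` sends users back to the legacy path without a rollback, because both paths are already deployed.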
        
       | codingdave wrote:
       | Don't do Scrum when your work priorities change more often than
       | your sprint length. If they want to do Agile, do Kanban. Or
       | Scrumban. Something where the process fully supports popping a
       | new ticket to the top of the priority stack.
       | 
       | If there is any one thing teams new to Agile need to learn, it is
       | that Scrum is just one choice among many. And nowhere near as
       | universally useful as people seem to believe.
        
         | rileymat2 wrote:
         | Your point cannot be overstated, I find scrum great for new
         | product work, but pointless for maintenance that comes in
         | ticket by ticket.
         | 
         | HOWEVER, the counterpoint would be that if you are doing new product
         | work and your priorities are changing faster than every week or two,
         | you have either an exceptional situation, or this should be a
         | canary that you have something really wrong with your priority
         | directions.
         | 
         | In the latter case scrum is being a useful tool in bringing
         | this to your attention.
        
         | throwawaysleep wrote:
         | Scrum usually arrives the same way grain arrives in the
         | stomachs of foie gras ducks, so unfortunately I suspect the
         | team has no choice in the matter.
        
         | duringwork12 wrote:
         | Or do Scrum with one-week sprints. If the priorities are
         | changing daily, that requires reorganizing the entire team. Maybe
         | it's not the time to do feature work.
        
           | oweiler wrote:
           | Scrum with 1-week sprints means you spend a considerable
           | amount of time in useless meetings (retro, planning 1 + 2,
           | and so on).
        
             | alfons_foobar wrote:
             | Not necessarily, those meetings should be much shorter if
             | you only plan for one week.
        
         | themadturk wrote:
         | When I was on a support team that was part of a development
         | team, the support people always used Kanban. We were fortunate
         | that management never had a problem with this. We still had
         | daily standups, just to keep in touch with each other, but we
         | did no sprints.
        
         | brightball wrote:
         | I've always found that Scrum is suited for 2 environments only:
         | 
         | 1. Pre-customer development
         | 
         | 2. Contract development shops where you bill by the sprint
         | 
         | I've never seen good outcomes in a live product environment.
         | 
         | Usually the answer is Kanban for smaller teams or Scaled Agile
         | for larger teams IMO... but the situation the OP is dealing with
         | looks like they need to prioritize preventative work to stop
         | these support tickets from coming in in the first place.
         | 
         | Prevention measures for unplanned work are critical.
        
           | wintogreen74 wrote:
           | >> or Scaled Agile for larger teams IMO
           | 
           | I STRONGLY disagree with this, conceptually, logically and in
           | practice. Scaled agile approaches make you plan your sprints
           | way further in advance and account for emerging (or
           | sustainment) work by reserving a percentage of
           | capacity. This works about as well as riding out a run of bad
           | luck at the casino while you wait for the odds to balance
           | out. Scaling "up" the existing practices to a department or
           | company level will not help an interrupt-driven team. Scaled
           | agile doesn't work for pure development teams; it's even less
           | likely to work for an SRE team.
        
             | bradwood wrote:
             | LeSS beats Scaled Agile hands down IMHO
        
               | brightball wrote:
               | Reading about LeSS, it looks like it's basically the same
               | thing? As long as you're doing PI Planning you're a step
               | above virtually every other org out there. It looks like
               | it could be a good fit for some groups I know who aren't
               | quite big enough for SAFe though, while still drilling in
               | the same core ideas.
               | 
               | IMO, the biggest issue with Scaled Agile always comes down
               | to leadership. It's not supposed to be rigid. It's
               | supposed to be adapted to your organization and put the
               | plans in the hands of the developers.
               | 
               | The horror stories about SAFe that I've seen on here
               | usually end up with me asking a few follow-up questions
               | and those questions make it clear that something was
               | terribly wrong. One guy on here told me they did PI
               | Planning for 2 full weeks, which is abject insanity.
        
             | brightball wrote:
             | Again, this will depend on the size of the team. Planning
             | on a longer time window while giving dev teams time to
             | coordinate, get everybody on the same page about what
             | you're doing and why...works. It works extremely well.
             | 
             | It gets the plan in front of everybody so you can discuss
             | the tradeoffs and risks too. There's an entire open session
             | as part of it called R.O.A.M. where risks to the ability to
             | deliver the plan are discussed, not just with the tech
             | people but with the business folks too.
             | 
             | That portion of PI planning is where you call out things
             | that could break the plan. Emerging (or sustainment) work
             | is one of those risks and you have to call it out, then
             | once it's called out you have to discuss what the
             | organization is going to do to prevent it. Capacity isn't
             | going to just spring forth out of nowhere, so what are we
             | pushing off if there's more of it than expected? Do we need
             | to prioritize preventative work to make sure we don't have
             | problems? Do we need some type of infrastructure,
             | monitoring system or automation in our devops process?
             | 
             | Everywhere I've been, the PI Planning process has led to
             | less "we have to do this now" and more "we need to see if
             | this can be included in the next PI", which creates
             | significantly fewer disruptions to the development team's
             | focus. The only remaining interruptions should be
             | production issues (preventable) or significant market
             | shifts (rare).
             | 
             | It doesn't work when people try to adhere to that plan so
             | rigidly that it becomes a waterfall system.
        
           | bobkazamakis wrote:
           | "Scaled Agile" is another way of saying Waterfall but
           | allowing you to expense 20% of your team overhead for scrum
           | masters.
        
         | Juliate wrote:
         | This. Scrum is not adequate for this.
         | 
         | What we used to do (while growing the SRE team from ~8 to 30)
         | was having dedicated teams on specific projects. Each had their
         | own scheduling system as they saw fit. AND a weekly rotating
         | support duty which took precedence: teams had to allocate
         | someone from their own capacity to this duty.
         | 
         | The important thing was to rotate everyone fairly, to pair a
         | senior and a junior in the support team, and to be available to
         | them to debrief, or to reallocate high-priority/high-criticality
         | tasks to more expert members, if required.
         | 
         | Also, if the support queue was empty, they could do
         | maintenance/exploratory work, but NOT their own team project
         | work.
        
       | mike503 wrote:
       | I've found it to be a square peg in a round hole in every
       | organization I've worked with. I'm always in a "support" team,
       | which is more of a horizontal capability, something akin to
       | legal, accounting, etc. We exist to support vertical teams that
       | can align with SAFe/Scrum stuff.
       | 
       | We should only have scrum tickets tied to us that require sprint-
       | specific work from a timing perspective. Otherwise, it's
       | retroactive "tracking" points later on or preallocating points
       | ahead of time for "support work" which is absolutely pointless.
        
       | detaro wrote:
       | I've never seen velocity metrics be anything but useless, so I'd
       | sacrifice those. Or people need to realize that yes, a team that
       | also does support will have widely varying progress towards
       | project goals depending on support workload, and that's not a
       | problem. Who really cares if a team "overdelivers" because they
       | got fewer interruptions than expected?
        
       | jrochkind1 wrote:
       | > Some weeks it works, other weeks they overdeliver by a lot
       | (because there were fewer support tickets and they got a lot of
       | tickets from the backlog).
       | 
       | OK, this is a problem because why? Some weeks it works, and other
       | weeks they accomplish _more_ than expected... and this is a
       | problem why??
       | 
       | Like, what's the actual problem here, if good progress is being
       | made on features and bugfixes?
       | 
       | My guess is it's a problem if management is trying to use "scrum" to
       | treat developers like sweatshop workers who they wring the most
       | possible production out of. How can we know what the most
       | possible production is, if they have a different amount of time
       | to work on it every week?!?
       | 
       | We could come up with different arithmetic ways to solve this.
       | Assign "points" to support tickets too (even after the fact, how
       | much time they actually took?) to include them in your
       | "velocity". Or keep track of how much time is being spent on
       | support too, to "normalize" the velocity of the other stuff
       | accordingly (different way of doing the same thing).
       | 
       | But personally I do not really want to help companies become
       | better at treating developers like sweatshop workers to wring
       | maximal productivity out of. I'd rather realign to "agile" as it was
       | originally intended, to empower developers to bring value, not to
       | treat them as commodified interchangeable widgets to exploit and
       | burn out.
       | 
       | but, I mean, exploitive companies gonna company I guess.
       | 
       | > I'm thinking Scrum is useless for that team but management
       | insists all engineering teams must use scrum.
       | 
       | Which they also insist means relying on those "velocity" metrics?
       | Can you do "scrum" without "velocity" at all?
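
One way to do the "normalize" arithmetic suggested above: divide completed points by the person-days actually available for planned work. A minimal sketch, assuming support time is logged in person-days; the `Sprint` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sprint:
    points_done: int     # story points completed on planned (non-support) work
    team_days: float     # person-days available in the sprint
    support_days: float  # person-days spent on support tickets


def normalized_velocity(sprint: Sprint) -> float:
    """Points per person-day actually spent on planned work."""
    dev_days = sprint.team_days - sprint.support_days
    if dev_days <= 0:
        raise ValueError("no time left for planned work this sprint")
    return sprint.points_done / dev_days
```

A sprint with 30 points in 50 person-days, 20 of them on support, and a sprint with 45 points and only 5 support days both come out at 1.0 point per available day, so the apparent "overdelivery" disappears once interruptions are accounted for.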
        
       | otikik wrote:
       | The "cover your ass" strategy involves tracking 3 things:
       | 
       | * Estimated time to complete each project task (hopefully
       | estimated by the person who is going to do the job. Then add 20%)
       | 
       | * Status of each project task (from 0% to 100%)
       | 
       | * Time devoted to "support interruptions" each week. Doesn't need
       | to be super detailed - something like "week 4: 5 man-days on
       | support tickets x1, y4 and z6"
       | 
       | That way at the very least you are equipped to answer the
       | inevitable questions "why isn't feature C done?". The answer
       | would be something like "we still need around 2 more weeks of
       | full-time work in order to complete that task, but given the
       | current rate of interruptions we will realistically have it ready
       | in one month".
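
The projection in the example answer can be computed from the three tracked quantities. A rough sketch, assuming estimates and the support log are kept in person-days for the people assigned to the task; the function names and the simple average over logged weeks are assumptions.

```python
import math


def remaining_days(estimate_days: float, percent_done: float,
                   padding: float = 0.20) -> float:
    """Padded estimate times the fraction of the task still unfinished."""
    return estimate_days * (1 + padding) * (1 - percent_done / 100)


def weeks_to_finish(remaining: float, days_per_week: float,
                    support_log: list[float]) -> int:
    """Calendar weeks until done, given the average weekly support interruption."""
    if not support_log:
        raise ValueError("log at least one week of support time")
    avg_support = sum(support_log) / len(support_log)
    project_days = days_per_week - avg_support
    if project_days <= 0:
        raise ValueError("support is consuming all available days")
    return math.ceil(remaining / project_days)
```

`weeks_to_finish(10, 5, [2, 3, 2, 3])` turns two weeks of full-time work (10 days) into 4 calendar weeks, because interruptions average 2.5 days a week; that matches the "realistically a month" answer in the comment. `remaining_days(25, 60)` gives about 12 days for a 25-day task that is 60% done, after the 20% padding.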
        
       | markus_zhang wrote:
       | We have an on-call rotation in which whoever is on call is not
       | expected to work on anything else.
        
       | sandreas wrote:
       | My advice is to first define the importance of support and how
       | many tickets have to be solved within a specific time period in %
       | (with your manager). Then define a part of the team that works on
       | tickets for one sprint (alternating). This worked pretty well for
       | me.
       | 
       | Example:
       | 
       | * Your team has 6 members.
       | 
       | * Your management would like to put 15% of the workload on support.
       | 
       | * Every sprint, 1 alternating team member does just support tickets.
       | 
       | * You keep track of the support days of every member to equalize
       | the workload (support is no fun).
       | 
       | * Because working alone is not the best situation, you could put 2
       | members on support every two sprints, but keep in mind that this
       | means NO support ticket is solved for 1 sprint.
       | 
       | This is the best solution I could find being forced to practise
       | SCRUM - even if it does not match the rules exactly.
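
The "keep track of support days to equalize the workload" step can be automated by always giving the next sprint to whoever has done the least support so far. A sketch under stated assumptions: two-week sprints of 10 working days, one person per sprint, and hypothetical member names.

```python
from collections import Counter
from typing import Optional


def next_support_person(members: list[str], support_days: Counter,
                        sprint_days: int) -> str:
    """Pick whoever has the fewest support days so far (ties go to list order)."""
    person = min(members, key=lambda m: support_days[m])
    support_days[person] += sprint_days  # credit them with this sprint
    return person


def plan_rotation(members: list[str], sprints: int, sprint_days: int = 10,
                  history: Optional[dict[str, int]] = None) -> list[str]:
    """Support person for each upcoming sprint, balancing cumulative support days."""
    days = Counter(history or {})
    return [next_support_person(members, days, sprint_days) for _ in range(sprints)]
```

With six members and one person on support each sprint, support takes 1/6 of the team, about 16.7%, close to the 15% target in the example. Passing `history` lets the rotation pick up where an earlier plan left off.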
        
       | roberthahn wrote:
       | I see a few people advocating for putting one person on support
       | each sprint, rotating through the team.
       | 
       | This works fine for small teams but for larger (and with large,
       | older codebases), the ramp-up time for being an effective support
       | dev becomes a drag because it can be months between support
       | stints. We don't all have eidetic memories and forget the tricks
       | we use to diagnose production issues.
       | 
       | To counteract that we've recently put a developer on a 6 month
       | rotation, and support them with a rotating backup. This allows
       | the primary support developer to not only stay productive fixing
       | issues but also surface intelligence around the problem areas in
       | our app (i.e., what always needs support) and construct tools
       | to make resolution easier or, better yet, convert it to self-serve.
       | 
       | I infer from your comment that you're a smaller team so perhaps
       | you won't need to put someone on a long stint, but you might want
       | to consider doing so anyway in order to have a resource pave the
       | proverbial cow-paths and make it easier for the team in the
       | future.
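
A sketch of the long-stint arrangement described above: one primary for roughly six months (assumed here to be 26 weeks) and a backup that rotates weekly through the rest of the team. The weekly backup cadence and the function name are assumptions; the comment only says the backup rotates.

```python
def support_roster(team: list[str], weeks: int,
                   primary_stint_weeks: int = 26) -> list[tuple[int, str, str]]:
    """(week, primary, backup) for each week: long primary stints, rotating backup."""
    if len(team) < 2:
        raise ValueError("need at least two people: a primary and a backup")
    roster = []
    for week in range(weeks):
        # The primary changes only once per stint.
        primary = team[(week // primary_stint_weeks) % len(team)]
        # Everyone else takes turns as backup, one week at a time.
        backups = [m for m in team if m != primary]
        roster.append((week, primary, backups[week % len(backups)]))
    return roster
```

The long stint lets the primary build up the diagnostic knowledge the comment describes, while the rotating backup keeps the rest of the team from losing touch with support entirely.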
        
       | comprev wrote:
       | I've worked on a team which rotated the "daily support" tasks,
       | which included on-call duty during those 7 days.
       | 
       | They were in effect excluded from the sprint during that week.
       | 
       | It was also a company that was trying to shoehorn everything into
       | Scrum because a consultant told them to.
        
       | ineptech wrote:
       | Just stop making sprint commitments. If your priorities change
       | from day to day, what use is a list of priorities from two weeks
       | ago?
        
       | grepLeigh wrote:
       | What's an example of a support ticket that might come in? Do you
       | have a priority matrix for support tickets?
       | 
       | I find it helpful to rank support tasks by:
       | 
       | * Likelihood. Does this issue impact 10%, 25%, 50%, or 100% of
       | all users?
       | 
       | * Pain. Where does this issue rank on a scale of 1 (minor
       | nuisance) to 4 (product usage is impossible)?
       | 
       | Adding categories about the "type" of work is also helpful. This
       | lets you stay current on "crash" support issues while also giving
       | you the ability to save lesser-priority tasks for dedicated
       | "localization sprints."
       | 
       | Check out this excellent article for more examples:
       | https://lostgarden.home.blog/2008/05/20/improving-bug-triage...
       | 
       | For some DevOps teams, I've found that "support ticket" is
       | equivalent to "hold someone's hand while doing ____." This
       | happens because DevOps people tend to be jack-of-all-trades and
       | know their way around a wide variety of systems. If you're
       | swamped by these kinds of tasks, start collecting metrics about
       | the types of tasks coming in and build a self-service knowledge-
       | base. Another common way to deal with this situation is to dedicate
       | 1 person to "triage" each sprint cycle. This person's workload is
       | expected to be 100% support/routing.
        
         | grepLeigh wrote:
         | The whole point of this labeling task, by the way, is to give
         | your team the evidence to say "no" and "not right now" to
         | incoming work.
         | 
         | If there are true emergencies / outages derailing your work, it
         | sounds like the escalation path should be an on-call pager (not
         | support tickets).
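
A minimal sketch of the likelihood-times-pain ranking, using the buckets from the comment (10/25/50/100% of users, pain from 1 to 4). Multiplying the two ranks into a 1 to 16 score and the cut-off of 12 for "handle now" are assumptions for illustration, not taken from the linked article; tickets below the cut-off form the documented "not right now" list.

```python
from dataclasses import dataclass

# Share of users affected, using the buckets from the comment above.
LIKELIHOOD_BUCKETS = (10, 25, 50, 100)


@dataclass
class SupportTicket:
    title: str
    users_affected_pct: int  # one of LIKELIHOOD_BUCKETS
    pain: int                # 1 = minor nuisance ... 4 = product usage is impossible
    category: str = "general"

    def score(self) -> int:
        """Likelihood rank (1-4) times pain (1-4), giving a priority from 1 to 16."""
        if self.users_affected_pct not in LIKELIHOOD_BUCKETS:
            raise ValueError(f"unknown likelihood bucket: {self.users_affected_pct}")
        if not 1 <= self.pain <= 4:
            raise ValueError(f"pain must be 1-4, got {self.pain}")
        return (LIKELIHOOD_BUCKETS.index(self.users_affected_pct) + 1) * self.pain


def triage(tickets: list[SupportTicket],
           interrupt_at: int = 12) -> tuple[list[SupportTicket], list[SupportTicket]]:
    """Split tickets into 'handle now' and 'not right now', highest score first."""
    ranked = sorted(tickets, key=lambda t: t.score(), reverse=True)
    now = [t for t in ranked if t.score() >= interrupt_at]
    later = [t for t in ranked if t.score() < interrupt_at]
    return now, later
```

A login outage hitting every user scores 16 and interrupts the sprint; a typo seen by 10% of users scores 1 and waits, and the ranked "later" list is the evidence for saying "not right now".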
        
       | mbostleman wrote:
       | Like many others here, rather than answering how to make it work,
       | I would say the premise is not valid - scrum isn't the right tool
       | for devops. Probably something more akin to Kanban should be
       | looked at.
        
       | 4ndrewl wrote:
       | By value. Don't overthink it, don't arbitrarily call them 'bugs',
       | 'support', 'features' etc.
       | 
       | Just do the most valuable thing first. Ship. Repeat.
        
       | rileymat2 wrote:
       | Isn't the classic answer to take urgent tickets and allow that to
       | show up in decreased velocity?
       | 
       | I have never worked anywhere, including using scrum, where I
       | could avoid incoming issues for 2 weeks at a time.
       | 
       | At the end of the day, better systems, better code, less
       | technical debt means less support. Issues with previous work
       | might indicate that past velocity was false, achieved by releasing
       | low-quality work.
       | 
       | There are long term maintenance exceptions that happen to all
       | code bases, that have nothing to do with quality, like supporting
       | new platforms or unpredictable platform changes, those can go in
       | as backlog tasks.
       | 
       | The real question is what you are using "velocity" for? Is it to
       | help get a feel for work you can complete, or is it a metric to
       | be judged by management to evaluate you?
        
         | gedy wrote:
         | > Isn't the classic answer to take urgent tickets and allow
         | that to show up in decreased velocity?
         | 
         | It should, but I've been surprised by some companies'
         | interpretation of "product velocity" to just mean "how many
         | tickets does each developer complete per 'sprint'".
         | 
         | That's _not_ product velocity, that 's just some
         | micromanagement nonsense so they can measure/push devs. No
         | wonder many devs hate "agile" like this.
        
       | LanceH wrote:
       | You triage which will get done and your velocity goes down if you
       | have a lot of support tickets. This will make your velocity a
       | meaningful metric over time, reflecting your rate of production
       | toward new features.
       | 
       | Orrrr.....you assign points to the support work as well, your
       | team hits their 50 points every week or else they get hit over
       | the head by management as to why they weren't on track.
        
         | deathanatos wrote:
         | It sounds like OP is doing this. The point would be that even
         | doing that, you're not making progress on team goals. (Your
         | goals are instead being set by the interrupting tickets.) There
         | might be 200 pts/wk or whatever, but it isn't going to be going
         | into the project your PM is wanting it to.
        
       | orev wrote:
       | > management insists all engineering teams must use scrum.
       | 
       | DevOps/SRE are actually System Admins (always have been, always
       | will be), and they are NOT engineers. They are operations. You
       | cannot run Operations using scrum/agile.
        
       | tartieret wrote:
       | We leave one person on "support duty" outside of the sprint. One
       | of my friends working at Amazon in a team of 8 did the same with 2
       | people out of the sprint (one being "on call" and dealing with
       | support & operational issues, the other one working on
       | "improvements" voted on by the dev team, like better tooling,
       | better monitoring...)
       | 
       | On my side, working in a small team of 5, we just leave one
       | person out of the sprint to work on support requests, bug fixes
       | and, if time allows, improvements. Large-scale handling of
       | technical debt (ex: transitioning from VueJS 2 to 3, good luck
       | expressing this as a "product increment" in a sprint goal and
       | good luck if you don't tackle it ... ) is part of sprint work.
       | 
       | Over time, I realized that Scrum is a framework, not a fixed set
       | of rules, and as per Agile, the goal is to maximize business
       | value. So if we are swamped by support requests and our systems
       | are not operating properly, it's the role of the person on
       | "support duty" to warn the rest of the team and ask for help.
       | Sure it may impact some velocity metrics and we may not hit our
       | sprint goal that week, but why would that matter? There is no
       | point writing more code and building more features if what we
       | have is not working properly and not satisfying to users.
       | 
       | The main issue I have seen with this system is that the developer
       | on support duty may not have the right skills to perform all the
       | work that is needed. Scrum relies on the assumption that anyone
       | can do anything, and that's definitely a flaw when working on a
       | larger codebase with a mix of frontend, backend and devops with
       | more junior developers. It does force us to train everybody and
       | learn to delegate so that everybody can be exposed to all parts
       | of the system but this takes many months / years.
       | 
       | We also learnt to define smaller Scrum goals. It's better to
       | achieve a small goal and then decide what to do next than to
       | always feel like we are running behind. The sequence of "two
       | weeks" sprints is often seen as a "fixed" schedule with deadlines
       | to hold at all costs, but that's stupid. The core idea of Agile
       | is to follow an incremental approach along the path of max
       | business value delivery, and periodically reflect on how
       | things are going. If the amount of support requests and
       | operational issues is such that there are no more resources
       | available for new feature development, then it's time to
       | prioritize tackling some technical debt and improving monitoring
       | and automation to prevent the most common issues.
        
       | tecleandor wrote:
       | We designate a support person every week (like a rotating
       | position) and don't count their time (or count only a small
       | percentage of it) toward the sprint.
       | 
       | This also helps avoid constant interruptions or context
       | switching for the whole team, since those are now handled by the
       | designated person of the week.
        
         | Phelinofist wrote:
         | That requires (mostly) homogeneous knowledge of everyone in the
         | team though
        
           | rednerrus wrote:
           | It encourages it as well.
        
         | Hermitian909 wrote:
         | What's really great about this system is that if you're
         | consistently seeing more support work than the designated
         | support person can handle, it's a highly legible signal to
         | management that support work for your team requires >1 full-time
         | engineer. This makes it significantly easier to advocate for
         | paying down tech debt or creating self-service workflows for
         | people filing tickets.
         | 
         | As an added bonus, this incentivizes better documentation of
         | your system via runbooks (I'd rather point someone at a doc
         | than walk them through a problem when it's not my week).
        
           | atomicnumber3 wrote:
           | The breakdown of this though is that sometimes management
           | thinks "oh all (internal, I assume?) support is solved with
           | the cost of only 1 headcount" and then they just let that
           | rotating week become more and more hellish. Then never
           | prioritize anything to make it better because the maximum
           | "impact" of that work is taking the footprint from "1 eng per
           | week" to "1 eng per week" (zero perceived "real" impact. And
           | if you provide alternate metrics to correctly show the load,
           | they'll just say, well clearly it still fits into 1 eng week,
           | good job, keep it up).
           | 
           | And eventually employees burn out and start mailing in that
           | week and now suddenly nobody cares except the other teams
           | having trouble getting support, but apparently that is a
           | diffuse enough complaint that nothing ever came of it.
           | 
           | Not that I speak from experience...
        
             | Hermitian909 wrote:
             | All systems fail in the face of sufficiently bad
             | management, at which point your only recourse is to
             | convince someone higher up the chain that there's a
             | problem. Legibility can make convincing people easier, but
             | nothing is foolproof.
        
             | IanCal wrote:
             | I'm not sure I understand - you'd have a constantly growing
             | backlog of support tickets and the time for things to be
             | resolved would grow dramatically. More and more people
             | internally would be complaining nothing was getting sorted.
        
               | kqr wrote:
               | Under bad management the growing backlog is translated
               | into ever more overtime and hurried low-quality band-
               | aids.
        
             | kqr wrote:
             | The problem here is bad management. If it wasn't for this
             | failure mode they would find another way to mess people up.
             | No process works well under bad management, except perhaps
             | some variant of anarchy/direct democracy.
        
         | trynewideas wrote:
         | Do you track the designated support person's work? If so, what
         | does that look like?
        
           | kayodelycaon wrote:
           | Not OP, but my team uses tickets to track support work. We
           | use JIRA. We have a service project (SWS) and a software
           | project (SWE). If someone is working on support, you'll see
           | their names on SWS tickets.
        
         | maayank wrote:
         | How many people in your team?
        
           | Snoddas wrote:
           | We do the same in a 6 member team
        
           | tecleandor wrote:
           | 6 now. 10 a year ago.
        
           | ajford wrote:
           | We do the same with a 3-person team. The on-call person
           | spends one week of the two-week sprint focused on support,
           | and usually picks up a few smaller tasks/backlog items to
           | fill in the time if it's a slow support week.
           | 
           | That works out to leaving one team member a full sprint for a
           | deeper heads-down task.
           | 
           | To alleviate some of the extra load on the on-call rotation,
           | we dedicated a decent portion of our sprints to automation
           | for a few quarters, which allowed us to reduce some of the
           | recurring manual tasks and ease the load on the on-call role.
           | Now the on-call person usually sees an average of 15hrs of
           | support on a regular basis (out of a 40hr week), down from
           | damn near 35hrs.
           | 
           | That is to say there were about 20hrs of automatable tasks
           | that had been shouldered by one team member for years. When
           | that person left, the tasks were placed onto the on-call
           | rotation as duties, as a way to ensure every team member
           | became familiar with these tasks instead of being siloed into
           | a single person (there was a scramble to learn those tasks as
           | this person was offboarded, and a rough couple of months as
           | we skilled up on these).
        
       | ernest0r wrote:
       | In my previous team we used a single kanban board for dev and
       | support. That worked well, and did not require much effort to
       | communicate possible delays.
        
         | VidasV wrote:
         | We also use a single Kanban board for all current tasks. They
         | get pulled from support and the backlog. A similar idea is
         | described here:
         | https://teamhood.com/kanban/kanban-for-software-development/
        
       | theptip wrote:
       | I agree with sibling comments saying Scrum isn't optimal for
       | interrupt-dominated work like SRE. Kanban or Scrumban are good
       | candidates.
       | 
       | Even for normal dev teams with customer-facing products (and
       | therefore non-zero support load) I have found a need to carve out
       | some time for support work from the Scrum team velocity.
       | 
       | As others have suggested, having a dedicated engineer as primary
       | support contact works well. You can figure out what the 95th %ile
       | support burden is and remove that from the velocity. If you have
       | a quiet week of support, the expectation is you spend the support
       | budget on tools and docs, maybe improve automation on some
       | existing scripts, knock out lower-priority incident remediation
       | WIBNIs, etc.
       | 
       | If you have a >95%ile bad week, then the support engineer does
       | less sprint work or the rest of the team pitches in as needed.
       | But most of the time support doesn't impact your velocity with
       | this approach.
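A minimal Python sketch of the percentile reservation theptip describes, assuming support load is logged as hours per week and team capacity is counted in hours rather than story points. The function names, the nearest-rank percentile method, and the sample numbers are all hypothetical illustrations, not anyone's actual tooling.

```python
import math


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile (0 < pct <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def plan_capacity(team_hours, past_support_hours, pct=95):
    """Reserve the pct-th percentile of historical support load; the rest goes to the sprint."""
    reserve = nearest_rank_percentile(past_support_hours, pct)
    return team_hours - reserve, reserve


def spare_support_budget(reserve, actual_support_hours):
    """Hours left over in a quiet week, to spend on tools, docs and automation."""
    return max(0, reserve - actual_support_hours)


# Hypothetical weekly support hours and a 5-person, 40-hour team.
history = [10, 12, 8, 15, 30, 11, 9, 14, 13, 20]
sprint_hours, reserve = plan_capacity(team_hours=200, past_support_hours=history)
print(sprint_hours, reserve)  # 170 30
print(spare_support_budget(reserve, actual_support_hours=12))  # 18
```

With only ten weeks of history the 95th percentile is simply the worst week, which is a deliberately conservative reservation; a longer history makes the percentile more meaningful.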
        
       | fd111 wrote:
       | A couple of times in my career, I've had the extreme pleasure of
       | working as a _dedicated_ support/backstop engineer within a
       | small-ish (~20 person) development group buried deep inside a
       | giant multinational corp. (I'm one of those seemingly rare
       | weirdos who strongly prefers troubleshooting, bug-fixing, etc.)
       | 
       | My priorities were inverted: customer escalations first, and if
       | you have time left over, then go work on bug backlogs that no one
       | else wants to address. Everyone else sprinted while I bumped
       | along at my own pace -- subject to escalation priorities, of
       | course.
       | 
       | Those were dream jobs for me, and the rest of the teams seemed to
       | appreciate having someone -- anyone, just not them! -- dedicated
       | to the support role.
       | 
       | Is it just my imagination that this kind of (I would say
       | "enlightened") management/organization is rare in the industry at
       | large? Or do lots of dev teams do this sort of thing? And where
       | can I find them? :-)
        
       | [deleted]
        
       | jedberg wrote:
       | Scrum is useless for that team, you are right. You can't do scrum
       | if more than 1/2 your work is ad hoc support tickets.
       | 
       | When I ran SRE at Netflix, our general trick was one person is on
       | support for the week. They were the primary on-call, they handled
       | any emergency support. Everyone else was primarily building, or
       | at least doing longer-term follow up work from support tickets.
       | 
       | In major emergencies of course everyone would drop what they were
       | doing and help, or if a support ticket came through for a tool
       | someone else built, it was fine to ask them for help. But at
       | least the mindset was "jedberg is on call this week, so don't
       | expect him to be building anything this week".
       | 
       | We also had good management support so that when we went to
       | review our quarterly goals, and 75% of them were red, they
       | trusted us to know that it was because we had a lot of support
       | load that month. Everyone who depended on our tools knew that
       | they would be ready when they are ready.
        
       | timmahoney wrote:
       | DevOps lead here; technically we have 2 week sprints, but we use
       | kanban and basically add support tickets for each item that the
       | team needs to work on during the sprint. This means that
       | sometimes our "burndown" is more of a level line, but when that
       | happens my manager understands, and we either determine the type
       | of work that would reduce it or discuss adding more capacity. For
       | tracking types of support tickets that come in, we try to
       | determine which ones we can automate or heal completely so that
       | those types of tickets dry up. Those items get roadmapped and
       | worked down... eventually. It works OK, not perfect but I haven't
       | found a better method so far, and everyone's happy with my team's
       | work.
        
       | jasonlotito wrote:
       | Just do kanban. It's made for this.
       | 
       | > management insists all engineering teams must use scrum.
       | 
       | They aren't doing scrum. At the very least, just "do scrum"
       | externally and internally do kanban. Management is clearly dumb
       | enough to believe it.
        
       | dboreham wrote:
       | You have discovered a couple of things:
       | 
       | Scrum doesn't survive "reality testing" in the context of actual
       | s/w development projects.
       | 
       | Management (at least in your case) doesn't care about reality.
       | 
       | The usual solution is to participate in "software process
       | theater" where you have the meetings and maintain the project
       | plans for scrum, but actually use an alternative reality-driven
       | management process to run your team.
       | 
       | Not perfect, but it's how software development has been done in
       | at least 1/2 the places I've worked for the past 30 years. Before
       | Scrum there were other crazy things, and there will be more crazy
       | things to come. It's just the nature of humans + the endeavor of
       | developing software.
        
       | raldi wrote:
       | Step one is to have Product/Sales define an SLO for support
       | ticket resolution (and for all your key services, if you haven't
       | already). Give more sprint time to whatever's falling behind.
       | 
       | If both are falling behind, you need to slow down the velocity at
       | which you're demanding features be shipped, or loosen the SLOs,
       | or hire more programmers.
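A rough Python sketch of raldi's decision rule, assuming a hypothetical support SLO of "90% of tickets resolved within 24 hours" and assuming the health of the key services is already summarized as a boolean elsewhere. The thresholds, names and sample data are illustrative only.

```python
def slo_compliance(resolution_hours, target_hours):
    """Fraction of tickets resolved within the target time."""
    if not resolution_hours:
        return 1.0
    met = sum(1 for h in resolution_hours if h <= target_hours)
    return met / len(resolution_hours)


def next_sprint_focus(support_ok, services_ok):
    """Map SLO status to the options described in the comment above."""
    if support_ok and services_ok:
        return "keep the current split"
    if not support_ok and not services_ok:
        return "slow feature work, loosen the SLOs, or hire"
    return "give more sprint time to " + ("support" if not support_ok else "service reliability")


# Hypothetical objective: 90% of support tickets resolved within 24 hours.
resolved = [2, 5, 30, 4, 50]
support_ok = slo_compliance(resolved, target_hours=24) >= 0.9
print(next_sprint_focus(support_ok, services_ok=True))  # give more sprint time to support
```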
        
       | rado_stankov wrote:
       | Every sprint we have 1 person in rotation for bug duty. It works
       | quite well. Over time, the role has taken on some refactoring and
       | tech debt as part of bug duty.
       | 
       | I wrote a blog post on the subject a while back:
       | https://blog.rstankov.com/bug-duty-process/
        
         | code_runner wrote:
         | The only sane approach. If someone is on support they also work
         | to lessen the support load permanently. Recurring issues with
         | no root cause analysis and fixes are deadly to a team.
        
         | kevincox wrote:
         | I think this is what you are saying but to reword it for how we
         | do it.
         | 
         | We have one "on duty" person each week/half-week/two-weeks.
         | Their priorities are:
         | 
         | 1. Support tickets.
         | 
         | 2. Whatever they want (usually code cleanup).
         | 
         | They aren't expected to get any project work done. We also
         | track the amount of time that they are spending on support
         | tickets. If it is getting high we move more cleanup tasks into
         | the regular project work cycle.
         | 
         | If you have oncall you can often co-locate these as well
         | (oncall is priority 0).
         | 
         | The main risk is that if your support load gets high then there
         | is no room for cleanup/automation which can become a feedback
         | loop causing more support load. I haven't seen a good solution
         | for this other than keeping track of the load and scheduling
         | proper "project work" to address it when it gets too high.
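A small Python sketch of the on-duty priority order kevincox lists (on-call first, then support tickets, then self-chosen cleanup), plus the load tracking used to decide when to schedule cleanup as project work. The category names and the 80% threshold are hypothetical assumptions, not values from the comment.

```python
import heapq
import itertools

# Lower number = handled first; on-call pages outrank support tickets.
PRIORITY = {"oncall": 0, "support": 1, "cleanup": 2}


class DutyQueue:
    """Work queue for the on-duty engineer, ordered by kind, then arrival."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker keeps FIFO order within a kind

    def add(self, kind, title):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._arrival), title))

    def pop(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


def support_load_too_high(support_hours, duty_hours, threshold=0.8):
    """Flag weeks where support leaves little room for cleanup/automation."""
    return support_hours / duty_hours > threshold


queue = DutyQueue()
queue.add("cleanup", "tidy deploy scripts")
queue.add("support", "ticket A")
queue.add("oncall", "page: API latency")
print(queue.pop())  # page: API latency
print(support_load_too_high(support_hours=36, duty_hours=40))  # True
```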
        
       | freedomben wrote:
       | I've been working on and/or managing both dev teams and
       | devops/ops teams, and it is my very strong opinion that _Scrum is
       | terrible for devops/ops teams_. No managers (besides me) want to
       | hear it, but it's true. It provides the wrong incentives to the
       | team and leads to pain and sadness all around, and eventually
       | production outages and employee turnover. The Scrum game
       | (heavily) incentivizes delaying support work at least until the
       | next sprint, which frustrates the hell out of both teams. If
       | you're doing the whole roadmap thing with goals and stuff, it's
       | even worse because then the incentive is to either #WONTFIX or
       | half-ass it or something else to make it go away so you don't
       | attract the eye of Sauron from the spreadsheet overseers by
       | slipping on your "already pretty aggressive" goals.
       | 
       | What to use then? Kanban, with little to no deadlines (estimates
       | are ok but be careful as it's very easy for an "estimate" to turn
       | into a "this work is late because we thought it would be done by
       | X.")
        
         | tmountain wrote:
         | Time to market tends to dominate most conversations around
         | planning, but as I look back on a 10 year stint that I recently
         | finished with a company, I am firmly convinced that this is the
         | wrong strategy.
         | 
         | Time and time again, I saw teams making compromises and taking
         | shortcuts to shave a week off of a project timeline--only to
         | have a crummy and bug laden release create negative first
         | impressions with our customers.
         | 
         | Many features were poorly validated for market fit and were
         | sunset just a year or two after release, as the maintenance
         | cost outweighed value.
         | 
         | My (admittedly biased) opinion is that the best way to deliver
         | value to a business is to spend a lot of time and energy
         | figuring out what your customers actually want, which is
         | usually just a few key things, and then take the time to build
         | those features the right way (less time pressure because the
         | surface area is smaller).
         | 
         | This lowers the ongoing maintenance cost, as the team has the
         | time necessary to do the work properly, and increases CSAT and
         | retention.
         | 
         | When it is time to build something new, the team isn't so
         | bogged down with maintenance and debt, so they can focus more
         | directly on the problem in front of them.
         | 
         | Sadly, it rarely seems to go this way.
        
           | kqr wrote:
           | > My (admittedly biased) opinion is that the best way to
           | deliver value to a business is to spend a lot of time and
           | energy figuring out what your customers actually want, which
           | is usually just a few key things, and then take the time to
           | build those features the right way (less time pressure
           | because the surface area is smaller).
           | 
           | I'm of the exact opposite opinion, but we agree on almost
           | everything, funnily enough!
           | 
           | I think the best way to deliver value is to quickly write
           | small, fast, rock-solid but relatively un-featured prototypes
           | and see in which direction production feedback indicates you
           | should evolve those.
           | 
           | The above is based on two things:
           | 
           | 1. You can't know what your customers want because not even
           | your customers know what they want. They only know what they
           | don't want once they see it. You can spend a lot of time and
           | money on research but the MVP is cheaper and a stronger
           | signal.
           | 
           | 2. The same idea you put out there: most things are bad
           | market fits and should be sunset fairly quickly. By building
           | small things, you optimise for this case. By also making them
           | reliable, you don't make it too painful to evolve them later
           | on. (Easier to add features to quality software than add
           | quality to featureful software.)
           | 
           | This, of course, also leads to less maintenance and higher
           | pace of innovation down the road!
        
             | arminiusreturns wrote:
             | This is why many of us ops types are annoyed at the current
             | popular fad of rejecting the unix philosophy of keeping
             | programs small and task oriented, but able to pipe and be
             | piped as part of a greater workflow putting those things
             | together.
             | 
             | This is exactly it. I see whole stacks, and there always
             | seems to be a key app or microservice or two written by a
             | particular team, and it usually: has bad log formatting,
             | doesn't multi-thread or multi-task well (concurrency and
             | parallelism), has some non-standard or extraneous db or
             | rpc/api queries that impact the entire flow with locking,
             | etc...
             | 
             | If the focus were on making sure the services were rock
             | solid, it would save so much future pain and tech-debt
             | digging.
             | 
             | On the note of what the customers want: I think the approach
             | I prefer is to not spend much time on it, and make it fast
             | and intuitive; quick brainstorming often brings rawer truths
             | to the surface if done frankly.
             | 
             | Alas, all that said, one can't always manage upwards. The
             | c-suite and the boards are so often high on their own airs
             | they stop listening to engineers or have let the wrong ones
             | become filters, and there is nothing you can do to change
             | that unless you are on the board and can fight at that
             | level, but I don't know many engineers that can. This is
             | the main irony of SV to me.
        
         | marcosdumay wrote:
         | I'm still looking for something that Scrum isn't terrible for.
        
       | kodah wrote:
       | You need to use styles of managing work that reflect the nature
       | of the work and the responsibilities of the people doing the
       | work.
       | 
       | "Sprints" work okay for product work. I use them a lot less
       | rigidly than most people do with success. On-call activities
       | should be recorded in a kanban-style project, these things are
       | made for rapid reprioritization. Whoever works that board should
       | be dedicated to _only_ that board; making members of a team work
       | in multiple project management paradigms at the same time usually
       | has bad and confusing outcomes. Lastly, the "support" tickets
       | can be varied. If they're requests that are gated by your team,
       | I'd add them to the on-call kanban board. If they're just
       | question/answer I wouldn't bother tracking them, you'll more or
       | less end up recreating ITIL.
       | 
       | Some additional context would be to dedicate people to these
       | boards for periods of time. Have a rotation that incorporates
       | everyone doing a set amount of on-call on a recurring schedule.
       | Build up a handoff procedure for the kanban board since context
       | will be fresh every week due to the nature of the work.
        
       | Galxeagle wrote:
       | From an individual perspective, we've really valued Google's SRE
       | whitepapers on interrupts and incident handling. The key message
       | is that setting up your team so some people handle 'jumping on
       | things' while others do 'focused delivery work' makes both
       | people happier and more productive than asking two people to do
       | both.
       | https://sre.google/sre-book/dealing-with-interrupts/
       | 
       | From a PM perspective, that also lets management pre-allocate
       | capacity for the sprint. '2 people on prod support, and 3 people
       | on development/project work' is a decision that can be made and
       | pointed back to. And being allocated to 'jumping on things' means
       | you don't have to justify your productivity/velocity/ticket count
       | beyond responsiveness and issue cycle time.
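If the people allocated to "jumping on things" are measured by issue cycle time rather than velocity, the metric itself is simple. A minimal Python sketch, assuming each ticket is recorded as an (opened, resolved) timestamp pair; the ticket data below is hypothetical.

```python
from datetime import datetime
from statistics import median


def cycle_times_hours(tickets):
    """Hours from opened to resolved for each (opened, resolved) pair."""
    return [(resolved - opened).total_seconds() / 3600 for opened, resolved in tickets]


# Hypothetical tickets handled by the people allocated to prod support.
tickets = [
    (datetime(2023, 1, 23, 9, 0), datetime(2023, 1, 23, 11, 0)),
    (datetime(2023, 1, 23, 10, 0), datetime(2023, 1, 23, 14, 0)),
    (datetime(2023, 1, 23, 13, 0), datetime(2023, 1, 23, 23, 0)),
]
print(median(cycle_times_hours(tickets)))  # 4.0
```

The median is used here because a single long-running ticket would skew a mean; tracking time to first response alongside it would cover the "responsiveness" half of the comment.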
        
       | tboyd47 wrote:
       | Combining both of these into one team is a terrible idea rotten
       | to its core.
       | 
       | This is because the pathway to excellence of one leads to
       | mediocrity in the other. Sprint tickets require an internal
       | orientation, focused work, and blocked-out time. Support tickets
       | require customer orientation, fast response time, and high
       | availability.
       | 
       | If you require your developers to be responsive to customers'
       | needs, they will never have time to do sprint work. If you
       | require them to have excellence in sprint work, they will not be
       | responsive to customers.
       | 
       | There is no way to "time box" these matters, i.e. devote a
       | certain number of hours to this and a certain number of hours
       | to that, because excellence cannot be time boxed. Excellence
       | requires you to do whatever is necessary to achieve it. And these
       | two types of excellence cancel each other out.
        
       | madrox wrote:
       | I love the way my team does it.
       | 
       | - All support tickets go into a Kanban board separate from the
       | scrum board
       | 
       | - Whoever is primary oncall works exclusively in this queue and
       | is excluded from the sprint for that week. Depending on volume,
       | you can add a second goalie.
       | 
       | - If the support queue is empty, the oncall works on lower
       | priority debt or other OE.
       | 
       | - Oncall changes weekly for us on Monday
       | 
       | As a manager, I love it because it's predictable for planning.
       | The team loves it because they're not constantly jerked around.
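One way to keep a weekly rotation like madrox's predictable is to derive the on-call person from the date instead of editing a schedule by hand. A minimal Python sketch, assuming the rotation advances every Monday from a fixed start date; the roster names are hypothetical.

```python
from datetime import date, timedelta


def on_call_for(day, roster, rotation_start):
    """Return who is on call on `day`, with the rotation advancing every Monday."""
    if rotation_start.weekday() != 0:
        raise ValueError("rotation_start must be a Monday")
    week_start = day - timedelta(days=day.weekday())  # Monday of day's week
    weeks_elapsed = (week_start - rotation_start).days // 7
    return roster[weeks_elapsed % len(roster)]


roster = ["ana", "ben", "cho"]  # hypothetical names
start = date(2023, 1, 2)  # a Monday
print(on_call_for(date(2023, 1, 8), roster, start))  # ana (Sunday, same week)
print(on_call_for(date(2023, 1, 9), roster, start))  # ben (next Monday)
```

Python's floor division and modulo also make dates before `rotation_start` map cleanly onto the roster.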
        
         | kwanbix wrote:
         | That is exactly what I did on my previous team.
        
         | rednerrus wrote:
         | We're doing something similar and it seems to work pretty well.
        
         | bottlepalm wrote:
         | +1, same for us
        
         | michaelsmanley wrote:
         | This is how we manage things as well. Support is a totally
         | different stream of work from product development.
         | 
         | I've often had a difficult time getting non-engineering
         | management to understand the difference between proactive work
         | that can be planned and reactive work that is unplannable by
         | its nature. All of it looks like deficiencies in the product to
         | a certain cohort of stakeholders. Eventually, once they see
         | things smooth out by separating those workstreams, they catch
         | on.
         | 
         | YMMV.
        
           | madrox wrote:
           | This is, in my experience, the core differentiator between a
           | tech company and a non-tech company.
        
         | domk wrote:
         | This is how our team does it as well. We are 5 engineers at a
         | startup, so inbound support tickets that need to be turned
         | around fairly urgently are common. Having one person dedicated
         | to on call and dealing with support saves everyone else from
         | constant context switching and leaves them to focus on planned
         | work.
        
       | gardenhedge wrote:
       | I've seen scrum failing to work too many times. It can work of
       | course, but generally doesn't.
        
       | wintogreen74 wrote:
       | I'd strongly suggest you look at Kanban over sprint for your
       | work, if you deal with a lot of emerging priorities. The value of
       | sprint is the shorter-but-fixed work windows. If you need to
       | consistently manage priorities that change work within this
       | timeframe, sprint is a worse approach. You probably have a good
       | idea of how much of your work (over time) is interrupt-driven, so
       | if you need to do some project-oriented work you can split
       | capacity across these streams and then run them differently.
        
       | Strongbad536 wrote:
       | Do Kanban if you're a team that is relied on by other teams for
       | support
        
       | CuriousSkeptic wrote:
       | If you do 1-day sprints and merge the daily stand-up meeting with
       | the demo and sprint-planning and retro you may get away with
       | calling it Scrum and still have something that works
        
         | slim wrote:
         | you may get something that works, but you don't know when
        
           | CuriousSkeptic wrote:
           | You could do a bi-weekly retro to catch issues not really
           | appropriate for stand-up.
           | 
           | Point was that if commitments can't be made for weeks at a
           | time just shorten the iteration time.
           | 
           | The core part of Scrum is to regularly collaborate as a team
           | on assessing the situation and organise the next action to
           | improve it. Can happen hourly if needed.
        
       | mancerayder wrote:
       | By finding a new job! Sprints are completely inappropriate for
       | infrastructure teams. It's impossible to predict support work,
       | and there's a ton of unpredictable engineering effort even for
       | scheduled projects.
       | 
       | If Scrum is imposed anyway, it'll either be heavily gamed or
       | you'll have people working ridiculous hours.
       | 
       | Kanban makes more sense.
        
       | brightball wrote:
       | Do you do any type of Root Cause Analysis tracking?
       | 
       | If that many tickets are coming in and disrupting everything, I'd
       | use that data to prioritize fixing the underlying causes so the
       | disruptions stop.
       | 
       | If you've ever read The Phoenix Project, this is well illustrated
       | where they talk about prioritizing preventive measures against
       | "unplanned work" and map out in detail why it's such a huge
       | problem.
        
       | maayank wrote:
       | I'm in a similar boat. People also tend to overlook that many new
       | issues mean a lot of routing effort (e.g. reading each ticket to
       | see which team it best fits, and then which person), which
       | further cripples mid- and low-level management.
        
       | gorjusborg wrote:
       | What works well will probably vary wildly between companies, due
       | to the way that internal-tier support has to interface with the
       | more customer-facing tiers.
       | 
       | That said, you have to be brutal during triage. Have a bucket of
       | time set aside to handle support. That time is used for
       | operational issues that are important, but not more important
       | than any product work you are doing. On a healthy system, most
       | issues can usually wait to be fixed, due to low severity, a low
       | number of customers impacted, or both.
       | 
       | For very high urgency issues, you drop everything. The sprint
       | doesn't matter anymore, you are keeping the lights on. If your
       | team is constantly dropping everything and blowing out sprints
       | (or whatever duration measure you use), you need to look into
       | why. Most likely, there is a quality problem somewhere that
       | should be your top priority for sprint work. If you can't
       | prioritize fixing that sort of thing, you need to find someone
       | who will listen and explain how much money is being lost spinning
       | plates on something that could be fixed at the root.
       | 
       | If you still can't get it prioritized, find another company.
       | You'll burn out and quit at some point anyway ;)
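The triage rule above can be sketched as a single pass over the support queue. This is a hypothetical illustration: the `Ticket` fields, the 1-to-4 severity scale, treating severity 1 as "drop everything", the strict-priority bucket fill, and the ticket keys are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    key: str                  # hypothetical ticket ID
    severity: int             # assumed scale: 1 = critical ... 4 = cosmetic
    customers_impacted: int
    estimate_hours: float


def triage(tickets: list[Ticket], bucket_hours: float) -> dict[str, list[str]]:
    """Sort tickets into drop-everything work, a fixed support bucket, and a backlog."""
    plan: dict[str, list[str]] = {"drop_everything": [], "support_bucket": [], "backlog": []}
    remaining = bucket_hours
    bucket_open = True
    # Most severe first; among equal severity, widest customer impact first.
    for t in sorted(tickets, key=lambda tk: (tk.severity, -tk.customers_impacted)):
        if t.severity == 1:
            # Critical: preempts the sprint regardless of the bucket.
            plan["drop_everything"].append(t.key)
        elif bucket_open and t.estimate_hours <= remaining:
            plan["support_bucket"].append(t.key)
            remaining -= t.estimate_hours
        else:
            # Strict priority: once the next ticket doesn't fit, the bucket closes
            # and everything less important waits in the backlog.
            bucket_open = False
            plan["backlog"].append(t.key)
    return plan


if __name__ == "__main__":
    queue = [
        Ticket("OPS-1", severity=3, customers_impacted=2, estimate_hours=4),
        Ticket("OPS-2", severity=1, customers_impacted=500, estimate_hours=6),
        Ticket("OPS-3", severity=2, customers_impacted=40, estimate_hours=5),
        Ticket("OPS-4", severity=4, customers_impacted=1, estimate_hours=3),
    ]
    print(triage(queue, bucket_hours=8))
```

Counting how often `drop_everything` is non-empty per sprint gives a rough signal for the "look into why" step the comment recommends.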
        
        
       ___________________________________________________________________
       (page generated 2023-01-23 23:02 UTC)