[HN Gopher] Splitting engineering teams into defense and offense
___________________________________________________________________
Splitting engineering teams into defense and offense
Author : dakshgupta
Score : 53 points
Date : 2024-10-14 20:07 UTC (2 hours ago)
(HTM) web link (www.greptile.com)
(TXT) w3m dump (www.greptile.com)
| bradarner wrote:
| Don't do this to yourself.
|
| There are 2 fundamental aspects of software engineering:
|
| Get it right
|
| Keep it right
|
| You have only 4 engineers on your team. That is a tiny team. The
| entire team SHOULD be playing "offense" and "defense" because you
| are all responsible for getting it right and keeping it right.
| Part of the challenge sounds like poor engineering practices and
| shipping junk into production. That is NOT fixed by splitting
| your small team's cognitive load. If you have warts in your
| product, then all 4 of you should be aware of it, bothered by it
| and working to fix it.
|
| Or, if it isn't slowing growth and core metrics, just ignore it.
|
| You've got to be comfortable with painful imperfections early in
| a product's life.
|
| Product scope is a prioritization activity, not a team
| organization question. In fact, splitting up your efforts will
| negatively impact your product scope because you are dividing
| your time and creating more slack than by moving as a small unit
| in sync.
|
| You've got to get comfortable telling users: "that thing that
| annoys you, isn't valuable right now for the broader user base.
| We've got 3 other things that will create WAY MORE value for you
| and everyone else. So we're going to work on that first."
| ramesh31 wrote:
| To add to this, ego is always a thing among developers. Your
| defensive players will inevitably end up resenting the offense
| for 1. leaving so many loose ends to pick up and 2. not getting
| the opportunity for greenfield themselves. You could try to
| "fix" that by rotating, but then you're losing context and
| headed down the road toward man-monthing.
| CooCooCaCha wrote:
| Interesting that you describe it as ego. I don't think a team
| shoveling shit onto your plate and disliking it is ego.
|
| I feel similar things about the product and business side, it
| often feels like people are trying to pass their job off to
| you and if you push back then you're the asshole. For
| example, sending us unfinished designs and requirements that
| haven't been fully thought through.
|
| I imagine this is exactly how splitting teams into offense
| and defense will go.
| dakshgupta wrote:
| To add - I personally enjoy defense more because the quick
| dopamine hits of user requests fix -> fix issue -> tell
| user -> user is delighted is pretty addictive. Does get old
| after a few weeks.
| FridgeSeal wrote:
| > For example, sending us unfinished designs and
| requirements that haven't been fully thought through
|
| Oh man. Once had a founder who did this to the dev team:
| blurry, pixelated screenshots with 2 or 3 arrows and vague
| "do something like <massively under specified statement>".
|
| The team _requested_ that we have a bit more detail and
| clarity in the designs, because it was causing us
| significant slowdown and we were told "be quiet, stop
| complaining, it's a 'team effort' so you're just as at
| fault too".
|
| Unsurprisingly, morale was low and all the good people left
| quickly.
| dakshgupta wrote:
| All of these are great points. I do want to add we rotate
| offense and defense every 2-3 weeks, and the act of doing
| defense which is usually customer facing gives that half of the
| team a ton of data to base the next move on.
| bradarner wrote:
| The challenge is that you actually want your entire team to
| benefit from the feedback. The 4 of you are going to benefit
| IMMENSELY from directly experiencing every single pain point-
| together.
|
| As developers we like to focus. But there is a vast difference
| between "manager time" and "builder time" and what you are
| experiencing.
|
| You are creating immense value with every single customer
| interaction!
|
| CUSTOMER FACING FIXES ARE NOT 'MANAGER TIME'!!!!!!
|
| They are builder time!!!!
|
| The only reason I'm insisting is because I've lived through
| it before and made every mistake in the book...it was painful
| scaling an engineering and product team to >200 people the
| first time I did it. I made so many mistakes. But at 4 people
| you are NOT yet facing any real scaling pain. You don't have
| the team size where you should be solving things with
| organizational techniques.
|
| I would advise that you have a couple of columns in a kanban
| board: Now, Next, Later, Done & Rejected. And communicate it
| to customers. Pull up the board and say: "here is what we are
| working on." When you lay out the priorities to customers
| you'd be surprised how supportive they are and if they
| aren't...tough luck.
|
| Plus, 2-3 weeks feels like an eternity when you are on
| defense. You start to dread defense.
|
| And, it also divorces the core business value into 2 separate
| outcomes rather than a single outcome. If a bug helps advance
| your customers to their outcome, then it isn't "defense" it
| is "offense". If it doesn't advance your customer, why are
| you doing it? If you succeed, all of your ugly, monkey
| patched code will be thrown away or phased out within a
| couple of years anyway.
| MattPalmer1086 wrote:
| I have worked in a small team that did exactly this, and it
| works well.
|
| It's just a support rota at the end of the day. Everyone does
| it, but not all the time, freeing you up to focus on more
| challenging things for a period without interruption.
|
| This was an established business (although small), with some
| big customers, and responsive support was necessary. There was
| no way we could just say "that thing that annoys you, tough, we
| are working on something way more exciting." Maybe that works
| for startups.
| eschneider wrote:
| If the event-driven 'fixing problems' part of development gets
| separated from the long-term 'feature development', you're
| building a disaster for yourself. Nothing more soul-sucking than
| fixing other people's bugs while they happily go along and make
| more of them.
| dakshgupta wrote:
| There is certainly some razor applied on whether a request is
| unique to one user or is widely requested/likely to improve the
| experience for many users
| fryz wrote:
| Neat article - I know the author mentioned this in the post, but
| I only see this working as long as a few assumptions hold:
|
| * avg tenure / skill level of team is relatively uniform
|
| * team is small with high-touch comms (eg: same/near timezone)
|
| * most importantly - everyone feels accountable and has agency
| for work others do (eg: codebase is small, relatively simple,
| etc)
|
| Where I would expect to see this fall apart is when these
| assumptions drift and holding accountability becomes harder. When
| folks start to specialize, something becomes complex, or work
| quality is sacrificed for short-term deliverables, the folks that
| feel the pain are the defense folks and they don't have agency to
| drive the improvements.
|
| The incentives for folks on defense are completely different than
| folks on offense, which can make conversations about what to
| prioritize difficult in the long term.
| dakshgupta wrote:
| These assumptions are most likely important and true in our
| case, we work out of the same room (in fact we also all live
| together) and 3/4 are equally skilled (I am not as technical)
| jedberg wrote:
| > this is also a very specific and usually ephemeral situation -
| a small team running a disproportionately fast growing product in
| a hyper-competitive and fast-evolving space.
|
| This is basically how we ran things for the reliability team at
| Netflix. One person was on call for a week at a time. They had to
| deal with tickets and issues. Everyone else was on backup and
| only called for a big issue.
|
| The week after you were on call was spent following up on
| incidents and remediation. But the remaining weeks were for deep
| work, building new reliability tools.
|
| The tools that allowed us to be resilient enough that being on
| call for one week straight didn't kill you. :)
| dakshgupta wrote:
| I am surprised and impressed a company at that scale functions
| like this. We often internally discuss if we can still do
| this when we're 7-8 engineers.
| jedberg wrote:
| I think you're looking at it backwards. We were only able to
| do it because we had so many engineers that we had time to
| write tools to make the system reliable enough.
|
| On call for a week at a time only really works if you only
| get paged at night once a week max. If you get paged every
| night, you will die from sleep deprivation.
| stronglikedan wrote:
| Everyone on every team should have something to "own" and feel
| proud of. You don't "own" anything if you're always on team
| defense. Following this advice is a surefire way to have a high
| churn rate.
| FireBeyond wrote:
| Yup, last place I was at I had engineers _begging_ me (PM) to
| advocate against this, because leadership was all "We're going
| to form a SEAL team to blaze out [exciting, interesting, new,
| fun idea/s]. Another team will be on bug fixes."
|
| My team had a bunch of stability work, and bug fixes (and there
| were a lot of bugs and a lot of tech debt, and very little
| organizational enthusiasm to fix the latter).
|
| Guess where their morale was, compared to some of the other
| teams?
| LatticeAnimal wrote:
| From the post:
|
| > At the end of the cycle, we swap.
|
| They swap teams every 2-4 weeks so nobody will always be on
| team defense.
| ninininino wrote:
| You didn't read the article did you, they swap every 2 weeks
| between being on offense and defense.
| jph wrote:
| Small teams shouldn't split like this IMHO. It's
| better/smarter/faster IMHO to do "all hands on deck" to get
| things done.
|
| For prioritization, use a triage queue because it aims the whole
| team at the most valuable work. This needs to be the mission-
| critical MVP & PMF work, rather than what the article describes
| as "event driven" customer requests i.e. interruptions.
| dakshgupta wrote:
| A triage queue makes a lot of sense, only downside being the
| challenge of getting a lot done without interruption.
| bvirb wrote:
| In a similar boat (small team, have to balance new stuff,
| maintenance, customer requests, bugs, etc).
|
| We ended up with a system where we break work up into things
| that take about a day. If someone thinks something is going
| to take a long time then we try to break it down until some
| part of it can be done in about a day. So we kinda side-step
| the problem of having people able to focus on something for
| weeks by not letting anything take weeks. The same person
| will probably end up working on the smaller tasks, but they
| can more easily jump between things as priorities change, and
| pretty often after doing a few of the smaller tasks either
| more of us can jump in or we realize we don't actually need
| to do the rest of it.
|
| It also helps keep PRs reasonably sized (if you do PRs).
| joshhart wrote:
| TLDR: The author basically re-invented oncall rotations.
| dakshgupta wrote:
| This makes me want to delete the post.
| stopachka wrote:
| I resonated with your post Daksh. Keep up the good work
| thesandlord wrote:
| Don't do that! This was a great post with a lot to learn
| from.
|
| The fact you came to a very similar solution from first
| principles is very interesting (assuming you didn't know
| about this before!)
| Xeamek wrote:
| Please don't.
|
| I personally found the idea inspiring and the article itself
| is explaining it succinctly. Even if it's not completely
| revolutionary, it's a small, self-contained concept that's
| actionable.
|
| Lowkey surprised that there are so many harsh voices in this
| thread, but the article definitely has merit, even if it
| won't be useful/possible to implement for everyone
| candiddevmike wrote:
| Or the idea of an "interrupt handler". OP may find other SRE
| concepts insightful, like error budgets.
| cgearhart wrote:
| This is often harder at large companies because you very rarely
| make career progress playing defense, so it becomes very tricky
| to do it fairly. It can work wonders if you have the right
| teammates, but it's almost a prisoner's dilemma game that falls
| apart as soon as one person opts out.
| dakshgupta wrote:
| Good point, we will usually only rotate when the long running
| task is done but eventually we'll arrive at some feature that
| takes more than a few weeks to build so will need to
| restructure our methods then.
| dakiol wrote:
| I once worked for a company that required from each engineer in
| the team to do what they called "firefighting" during working
| hours (so not exactly on-call). So for one week, I was triaging
| bug tickets and trying to resolve them. These bugs belonged to
| the area my team was part of, so it affected the same product but
| a vast amount of micro services, most of which I didn't know much
| about (besides how to use their APIs). It didn't make much sense
| to me. So you have Joe punching code like there's no tomorrow and
| introducing bugs because features must go live asap. And then
| it's me the one fixing stuff. So unproductive. I always advocated
| for a slower pace of feature delivery (so more testing and fewer
| bugs in production) but everyone was like "are you from the 80s
| or something? We gotta move fast man!"
| dakshgupta wrote:
| This is interesting because it's what I imagine would happen if
| we scaled this system to a larger team - offense engineers
| would get sloppy, defensive engineers would get overwhelmed,
| even with the rotation cycles.
|
| Small, in-person, high-trust teams have the advantage of not
| falling into bad offense habits.
|
| Additionally, a slower shipping pace simply isn't an option,
| seeing as the only advantage we have over our giant competitors
| is speed.
| jedberg wrote:
| > offense engineers would get sloppy
|
| Wouldn't they be incentivized to maintain discipline because
| they will be the defensive engineers next week when their own
| code breaks?
| dakshgupta wrote:
| I suspect as the company gets larger time between defensive
| sprints will get longer, but yes, for smaller teams this is
| what keeps quality high, you'll have to clean up your own
| mess next week.
| DJBunnies wrote:
| I think we've worked for the same org
| shalmanese wrote:
| To the people pooh poohing this, do y'all really work with such
| terrible coworkers that you can't imagine an effective version of
| this?
|
| You need trust in your team to make this work but you also need
| trust in your team to make any high velocity system work.
| Personally, I find the ideas here extremely compelling and
| optimizing for distraction minimization sounds like a really
| interesting framework to view engineering from.
| shermantanktop wrote:
| "The core functionality may remain largely intact but the
| periphery is often buggy, something we expect will improve only
| as our engineering headcount catches up to our product scope."
|
| Oh you sweet summer child...
| stopachka wrote:
| > While this is flattering, the truth is that our product is
| covered in warts, and our "lean" team is more a product of our
| inability to identify and hire great engineers, rather than an
| insistence on superhuman efficiency.
|
| > The result is that our product breaks more often than we'd
| like. The core functionality may remain largely intact but the
| periphery is often buggy, something we expect will improve only
| as our engineering headcount catches up to our product scope.
|
| I really resonate with this problem. It was fun to read. We've
| tried different methods to balance customers and long-term
| projects too.
|
| Some more ideas that can be useful:
|
| * Make quality projects an explicit monthly goal.
|
| For example, when we noticed the edges in our surface area
| got too buggy, we started a 'Make X great' goal for the month.
| This way you don't only have to react to users reporting bugs,
| but can be proactive
|
| * Reduce Scope
|
| Sometimes it can help to reduce scope; for example, before adding
| a new 'nice to have feature', focus on making the core experience
| really great. We also considered pausing larger enterprise
| contracts, mainly because it would take away from the core
| experience.
|
| ---
|
| All this to say, I like your approach; I would also consider a
| few others (make quality projects a goal, and cut scope)
| madeofpalk wrote:
| Somewhat random side note - I find it so fascinating that
| developers invented this myth that they're the only people who
| have 'concentration' when this is so obviously wrong. Ask any
| 'knowledge worker' or hell, even physical labourer and I'm sure
| they'll tell you about the productivity of being "in the zone"
| and lack of interruptions. Back in early 2010s they called it
| 'flow'.
| svilen_dobrev wrote:
| IMO the split, although good (the pattern is "sacrifice one
| person" as per Coplien/Harrison's Organisational patterns book
| [0]), is too drastic. It should be not defense vs offense 100%
| with a wall in between, but for each and every issue (defense)
| and/or feature (offense), someone has to pick it and become the
| responsible (which may or may not mean completely doing it by
| hirself). Fixing a bug for an hour-or-two sometimes has been
| exactly the break i needed in order to continue digging some big
| feature when i feel stuck.
|
| And the team should check the balances once in a while, and maybe
| rethink the strategy, to avoid overworking someone and
| underworking someone else, thus creating bottlenecks and vacuums.
|
| At least this is the way i have worked and organised such teams -
| 2-5 ppl covering everything. Frankly, we never had many customers
| :/ but even one is enough to generate plenty of "noise" - which
| sometimes is just noise, but if good customer, will be mostly
| real defects and generally under-tended parts. Also, good
| customers accept a NO as answer. So, do say more NOs.. there is
| some psychological phenomenon in software engineering in saying
| yes and promising moonshots when one knows it cannot happen NOW,
| but looks good..
|
| have fun!
|
| [0] https://svilendobrev.com/rabota/orgpat/OrgPatterns-
| patlets.h...
___________________________________________________________________
(page generated 2024-10-14 23:01 UTC)