[HN Gopher] Lucky Lotto, chaos engineering but for teams
       ___________________________________________________________________
        
       Lucky Lotto, chaos engineering but for teams
        
       Author : delebe
       Score  : 128 points
       Date   : 2021-07-01 06:58 UTC (16 hours ago)
        
 (HTM) web link (danlebrero.com)
 (TXT) w3m dump (danlebrero.com)
        
       | physicsgraph wrote:
       | I have a different approach to create a similar outcome:
       | everyone rotates projects on a periodic basis. The period of
       | change is longer than a sprint duration so as to give people time
       | to acclimate to the new-to-them project.
       | 
       | [0] https://graphthinking.blogspot.com/2021/06/periodic-
       | rotation...
        
         | GuB-42 wrote:
         | Your approach is lawful, the one in the article is chaotic.
         | 
         | Your approach is planned, with well defined onboarding and
         | offboarding and with a report in the end. Personally, I hate it
         | just for the idea of having to write a report.
         | 
         | The article's approach is unpredictable. At the last moment, you
         | know you are not part of the team anymore, and all
         | communications are cut. People have to take over even if you
         | are in the middle of something. It also involves managers. I
         | prefer this, if anything, just because there is no report to
         | write except if things go wrong (i.e. you break the "no
         | communication" rule).
         | 
         | The chaotic version of your solution would involve periodically
         | drawing two team members, including managers, and having them
         | switch teams immediately. For each individual, the period of
         | change would follow an average but be random. And no reports :)
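The chaotic rotation described above can be sketched in a few lines. This is only an illustration under assumptions: the team names, the weekly tick, and the `swap_probability` parameter are hypothetical and come from neither the article nor the comment. Each week a swap happens with a fixed probability, so the gap between swaps is random with a mean of 1 / swap_probability weeks; how often any one person moves also depends on team sizes.

```python
import random


def swap_two_members(teams, rng):
    """Pick two different teams and one random member of each, then swap them.

    `teams` maps a team name to a list of member names (managers included).
    The dict is modified in place; the swapped (team, member) pairs are returned.
    """
    team_a, team_b = rng.sample(sorted(teams), 2)
    i = rng.randrange(len(teams[team_a]))
    j = rng.randrange(len(teams[team_b]))
    person_a, person_b = teams[team_a][i], teams[team_b][j]
    teams[team_a][i], teams[team_b][j] = person_b, person_a
    return (team_a, person_a), (team_b, person_b)


def run_weeks(teams, weeks, swap_probability, seed=None):
    """Simulate `weeks` weekly ticks and return a log of (week, swap) entries."""
    rng = random.Random(seed)
    log = []
    for week in range(1, weeks + 1):
        # A swap fires with fixed probability, so there is no schedule to plan around.
        if rng.random() < swap_probability:
            log.append((week, swap_two_members(teams, rng)))
    return log
```

Calling `run_weeks(teams, 52, 0.25)` would give roughly one swap a month on average, with no way to know in advance which week or which people.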
        
       | cableshaft wrote:
       | This is extremely clever, and something I've never even heard of
       | before, bravo. The closest I've ever seen to this sort of
       | practice in the wild is periodically shifting people around so
       | that more than one person knows how a system works. But more
       | often I've seen no attempts at this, and a mad scramble with KT
       | meetings when someone gives their two weeks' notice.
       | 
       | I'm expecting a similar mad scramble when I put in my two weeks
       | at my current company (hopefully within the next two months,
       | interviewing with various companies now).
        
       | dchoi315 wrote:
       | This is a great idea! It sounds similar to controlled chaos
       | and/or engineering serendipity, where you intentionally create
       | chaos in order to uncover some sort of new skill or increase
       | productivity.
        
       | jgerrish wrote:
       | Could you imagine being part of this? Told the company was
       | running DiRT testing or whatever, and then getting fired or
       | punished for listening and following instructions?
       | 
       | Could you imagine afterwards being lectured to use common sense?
       | Also, we have secret corporate policies you need to know about
       | and follow. Resolve that catch-22. Repeat for a decade.
       | 
       | You'd be justified in never trusting them with your life or a
       | loved one's again.
        
         | detaro wrote:
         | ... did you intend to reply here, or on some other post?
        
       | jgerrish wrote:
       | Oh, also, really clever! I love studying resiliency in complex
       | systems!
        
       | mingusrude wrote:
       | I have used similar discussion points as in this article to argue
       | that parental leaves, long vacations and other similar employee
       | benefits build resilient teams and products. If you have to plan
       | for people going missing (without the possibility of immediate
       | replacement) you are forced to spread knowledge in the
       | organization and you can't be too dependent on individuals.
        
       | insomniacity wrote:
       | I like this - we've had "reduce our bus factor" on the to-do list
       | for a while, but not found a way to do it yet.
        
         | Cthulhu_ wrote:
         | The challenge there is that you need to hire and train more
         | people than you have right now, and hiring is difficult. I mean
         | probably not so much if you have a lot of money.
         | 
         | I'm currently in a high bus factor job, I'm the only developer
         | on the UI - our CTO can do some small jobs here and there, but
         | nobody's touched or even looked at the new UI I'm building (Go
         | + React).
         | 
         | I want to be able to leave, but the way things are going - and
         | the way recruiters are spinning up again - I'm afraid I'll have
         | to bring them the bad news that I got an offer I can't refuse
         | (and that they can't match; we're talking up to 150% pay rise /
         | benefits).
         | 
         | What my company needs is a big bag of money so we can hire
         | contractors to fast forward this project. Which will be a short
         | term solution, but still. But this company isn't eager, it's
         | run by mid-late career veterans who are happy with things
         | bumbling along and a decent 15%/year growth. I can kinda
         | respect that, but for my project that's not good enough. And
         | it's an unsexy company, so they struggle to hire anyone.
        
           | ckdarby wrote:
           | They'll survive. It is common to overvalue yourself when
           | leaving a company, but the reality is that the company will
           | adapt just like it always has. If it doesn't, it would have
           | died even with you there, from choices that didn't allow it
           | to adapt.
           | 
           | Just accept the offer and move on. It is the best thing you
           | can do for a company like this.
        
           | eru wrote:
           | > I'm currently in a high bus factor job, [...]
           | 
           | That's actually a low bus factor, isn't it?
           | 
           | https://en.wikipedia.org/wiki/Bus_factor
        
             | retzkek wrote:
             | From the article you linked: "There is a rare alternative
             | definition for the bus factor, namely: the number of people
             | who are indispensable for the project. In other words, it
             | is the minimum number of people who are a single point of
             | failure. If using this definition, then a high bus factor
             | is considered a bad thing (since the loss of any person
             | included destroys the project), and zero is considered the
             | ideal bus factor."
             | 
             | Perhaps "bus risk" would be a better term for this usage?
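The two definitions being contrasted in this subthread can be made concrete with a small sketch. It assumes a deliberately simple model that is not from the Wikipedia article: the project stalls as soon as any one task has nobody left who can do it. The `knowledge` mapping and the names in the usage note are hypothetical.

```python
def bus_factors(knowledge):
    """Return (common_bus_factor, single_points_of_failure).

    `knowledge` maps each task to the set of people able to do it.
    """
    # Common definition: the fewest people who must disappear before some
    # task has nobody left. Under this model that is the size of the
    # smallest group covering any task. Higher is better.
    common = min(len(people) for people in knowledge.values())

    # Alternative definition: people who are the only one able to do at
    # least one task. The size of this set is the "bus risk"; zero is ideal.
    single_points = {
        next(iter(people)) for people in knowledge.values() if len(people) == 1
    }
    return common, single_points
```

For a team where only `dev_a` knows the UI, `bus_factors({"ui": {"dev_a"}, "api": {"ann", "bob"}})` returns `(1, {"dev_a"})`: a low bus factor under the common definition, and one indispensable person under the alternative one.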
        
         | layoutIfNeeded wrote:
         | Ummm, _reducing_ the bus factor is probably the opposite of
         | what you would want.
        
           | jweather wrote:
           | Reducing the impact of the bus factor, then. Kind of like
           | turning down the air conditioning... which way does the
           | thermostat go?
        
           | metters wrote:
           | I guess he/she is using the alternative definition of
           | the bus factor:
           | 
           | > There is a rare alternative definition for the bus factor,
           | namely: the number of people who are indispensable for the
           | project. In other words, it is the minimum number of people
           | who are a single point of failure. If using this definition,
           | then a high bus factor is considered a bad thing (since the
           | loss of any person included destroys the project), and zero
           | is considered the ideal bus factor.
           | 
           | Source: https://en.wikipedia.org/wiki/Bus_factor
        
         | kriro wrote:
         | The most practical solution: Hire new people. Let the
         | low-bus-factor people mentor the new people to become them 2.0
         | (long term).
         | 
         | Downside: Takes time (and removes mentor from tasks deemed more
         | important short term), costs money, people might need to get
         | used to becoming mentors (I found most people enjoy this even
         | if they cannot imagine doing it the first time around)
         | 
         | Upsides: Long term thinking and investing in employees is
         | rarely wrong (imo)
        
         | newswasboring wrote:
         | I don't think this is a mechanism to reduce the bus factor; to
         | me it seems like a mechanism to make people realize what the
         | bus factor is for the team. The actual solutions were not
         | described in the article but I guess that is very team
         | specific.
        
           | GauntletWizard wrote:
           | Aligning the pain with the people who can solve it is the
           | first rule of getting things done. Make the developers who
           | will have to take on the bus-factor workload realize what the
           | bus factor is, and they'll figure out how to reduce their
           | dependencies real quickly.
           | 
           | Another good take on this is the "Wheel of Misfortune": take
           | a real incident (or synthesize one in testing) that paged
           | someone inconveniently. They've already solved it - They're
           | running the exercise. Now have everyone else on the team,
           | individually or as a group, figure out how to solve it. Not
           | from the debugging steps or postmortem of the incident, but
           | before that's been shared - Have them all suggest what they'd
           | look for, and how to fix it. Their most important resource is
           | their instructor/adversary for this, so give them all the
           | data, but no map to it. Further, have the instructor figure
           | out how to respond to all the unknowns - What else was broken
           | because of the incident? What would have happened if they
           | tried different debugging steps? Builds team knowledge real
           | fast.
        
           | insomniacity wrote:
           | Yes, good point, I guess the first step is realising you have
           | a problem and proving it to management!
        
           | tharkun__ wrote:
           | Maybe you didn't read the whole article then. It totally
           | describes how to do it. The bottom has a nice summary but
           | I'll quote the solution parts of it here:
           | 
           | > The winner will work on some side project. Still work.
           | 
           | Not a solution to the bus-factor of people but a good
           | solution to the "we never get to work on tech debt / platform
           | work because features" problem. This alone is awesome about
           | this.
           | 
           | > Everybody, including product managers, gets one ticket every
           | week, even if you don't want it.
           | 
           | This, if done right, will result in Product Managers
           | providing a vision and consistent answers to similar type
           | questions. Thus the team can learn to anticipate their
           | answers for minor things (which helps even in weeks where the
           | PM isn't the winner) and in weeks in which he is on vacation
           | (or wins again) the team isn't stuck waiting for a week for
           | an important question that blocks development.
           | 
           | > Team should avoid delaying the work for a week.
           | 
           | > Try to bring one of your colleagues to do the task with you
           | or under your supervision.
           | 
           | This is what to do, when you need to break "rule 3" (which
           | states you have to be completely unavailable). It's a soft
           | rule and I think the point is actually to break it a lot in
           | the beginning. The "under supervision" part means, you are
           | teaching another team member part of what made you the one
           | with the bad bus factor. As time goes on, having to break
           | this rule will become less frequent (which is why they wanted
           | everyone to write down when they have to break rule 3 if you
           | ask me)
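Going only by the rules quoted in this comment, the weekly draw and the rule-3 log could look roughly like the sketch below. The function names, the `(week, winner, topic)` log format, and the idea of tallying by topic are assumptions for illustration, not something the article specifies.

```python
import random
from collections import Counter


def draw_winner(members, rng):
    """One ticket per person, product managers included: a uniform draw."""
    return rng.choice(members)


def breaks_by_topic(break_log):
    """Count how often the "unavailable" rule was broken, grouped by topic.

    `break_log` is a list of (week, winner, topic) tuples written down
    whenever the team had to pull the winner back in.
    """
    return Counter(topic for _week, _winner, topic in break_log)
```

Topics that keep showing up in the tally are where knowledge sits with one person, which is where pairing "under supervision" would pay off first.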
        
             | newswasboring wrote:
             | Keeping aside the insulting nature of suggesting someone
             | did not read the article, here is my response.
             | 
             | I like to differentiate between mechanism to make people
             | realize a problem and actual solutions to the problem. It's
             | very easy to say mentor someone to do your job but if that
             | was doable and easy they would be already doing it. The
             | problem is precisely that there is no real good way to do
             | KTs. Forcing someone to solve the problem is one way to KT,
             | but is that the most efficient way? I would rather gather
             | data about what breaks down and come up with a more
             | efficient KT mechanism.
        
               | tharkun__ wrote:
               | FWIW I operate under the assumption that HN is just as
               | bad as Slashdot with regards to commenting without
               | reading the article :) and the solutions were pretty
               | clear to me, even without the nice summary at the end.
               | Sorry if it wasn't worded softly enough, no insult meant,
               | more an observation.
               | 
               | The parts I mentioned are not the 'realization' part.
               | They are the solution parts. They aren't the
               | 'implementation details' of the solution, I would agree,
               | but they are the solution.
               | 
               | If you ask me the problem is not usually that there is no
               | good way to do KT or that it just isn't doable at all.
               | The problem is that in most businesses due to their
               | 'culture' (for lack of a better word - not to start
               | "that" culture discussion) you do not actually get to do
               | it. A good way to mentor and transfer knowledge for
               | example is to do pair programming. Not many places allow
               | for that and will look at you funny for even suggesting
               | it (and I'm not even personally on the extreme end of
               | that like some companies, where everyone literally pair
               | programs 100% of the time - I like a sort of hybrid
               | model, where people pair for as long as it makes sense to
               | them, which could be sitting there "designing" for a
               | couple of hours together, maybe dividing things up after
               | a basic structure is in place and then working on their
               | own for the rest of the day w/ some quick 5 minute sync
               | ups and questions going back and forth from time to
               | time). Another very doable way to do knowledge transfer
               | is to specifically not let the one guy that wrote
               | that part of the code and knows it inside and out work on
               | the next ticket that will need changes to that part. But
               | many Product Managers/businesses/team leads will not
               | allow that because it would mean that the task will be
               | delivered slower.
               | 
               | The beauty of this approach is that you don't need to
               | actively gather data, make a decision etc. Gathering this
               | data is usually very error prone in that you can fill out
               | forms and skills matrices and such all you want (been
               | there done that), you always forget about something or it
               | doesn't really tell you the whole story (skills matrices
               | are particularly bad)
               | 
               | With this, it just happens! It's the self organizing way
               | of dealing with the problem. I think you might be putting
               | too much emphasis on the "completely unavailable" part,
               | whereas I see the "it's a soft rule" part bigger. Instead
               | of waiting for someone to go on vacation and then you
               | find out that he was the only one that can do X (there's
               | your data point that you missed during collection and
               | analysis) and now you're in deep trouble, because he's
               | touring the Amazon rain forest (i.e. definitely no cell
               | reception there), you get to figure it out because he won
               | the lottery and he's actually at work and can tutor
               | someone through it all.
        
       | jedberg wrote:
       | I heard a good presentation about this. They did it for a team at
       | Google. But it was daily, and you weren't allowed to tell anyone
       | else. You just didn't respond to any requests for that day.
       | 
       | But even worse, you could also be assigned the "liar" task. In
       | which case you were supposed to reply to emails but give
       | intentionally wrong information some of the time. But in that
       | case you would tell people that you were the liar and that your
       | answers aren't to be trusted.
       | 
       | It seemed like a good way to make sure at least two people could
       | do everything and that documentation was solid enough that you
       | could recognize when someone was wrong.
        
         | geoduck14 wrote:
         | > But it was daily, and you weren't allowed to tell anyone
         | else. You just didn't respond to any requests for that day.
         | 
         | I'm 100% certain I've worked with people that did this. I never
         | realized they were DR visionaries
        
         | eru wrote:
         | When at Google, we did the simpler version (of not replying)
         | for our DiRT week exercises. DiRT stands for disaster recovery
         | testing.
        
         | edenhyacinth wrote:
         | I wonder if this encouraged people to be more open with their
         | ideas also.
         | 
         | If I could say things knowing that if I made a mistake, someone
         | might friendly correct me with "Ah, you're the liar today, S3
         | doesn't support that format", I'd be happier to make them.
        
         | treeman79 wrote:
         | How are people afterwards?
         | 
         | I refuse to use email at work anymore because of the constant
         | phishing tests. Sick of the gotcha mentality.
        
           | welcome_dragon wrote:
           | I agree. We have a security score at work and mine had been
           | zero for a long time. I realized that you need to report the
           | emails as spam or phishing and not just ignore/delete like I
           | usually do.
        
           | newswasboring wrote:
           | I am genuinely fascinated how they managed to piss you off so
           | much with phishing tests. (For me email is the backbone of
           | all permanent office communication). Was the frequency too
           | high (reducing the signal to noise ratio too much), or just
           | the fact that they doubt your ability to avoid falling for
           | such a thing?
        
             | treeman79 wrote:
             | 2-3 a month. They often spoof other coworkers, so I need to be
             | cautious with all emails.
             | 
             | I use it now only to see if notifications came in from
             | calendar or such. Even then it's just looking at subject
             | lines.
             | 
             | I then go straight to the app and check for actual message
             | / event.
        
             | noneeeed wrote:
             | I was wondering that too. I'm guessing it's frequency. We
             | get them, but they are once every few months. The only
             | annoying thing about them is I have to open outlook to
             | actually report them as I normally use Airmail.
        
       | madhadron wrote:
       | Banks require a planned version of this: you have to take so many
       | contiguous days off at least once a year. But it's an anti-fraud
       | measure, as most of the frauds you can run internally require you
       | to be there to juggle things, and you make the number of
       | contiguous days long enough that such a scheme will come
       | crashing down.
        
         | thedougd wrote:
         | Some stopped in the last few years, or at least for certain job
         | categories. It was a wonderful thing when it was around because
         | you could take a vacation with absolutely no remote access.
        
       | toss1 wrote:
       | Great idea, similar to what I understand is done in accounting.
       | The idea is to force everyone in the actg dept to take regular
       | vacations of several weeks, requiring others to cover their
       | tasks. This is so that others can uncover financial sleight-of-
       | hand to prevent embezzling (or at least structure it such that
       | successful embezzling requires a larger conspiracy, increasing
       | the likelihood of getting caught).
        
       ___________________________________________________________________
       (page generated 2021-07-01 23:02 UTC)