[HN Gopher] Operations is not Developer IT
___________________________________________________________________
Operations is not Developer IT
Author : todsacerdoti
Score : 287 points
Date : 2021-09-03 10:42 UTC (12 hours ago)
(HTM) web link (matduggan.com)
(TXT) w3m dump (matduggan.com)
| jon_north wrote:
| When developers see ops as IT for them, it's because ops (and
| overall management) is doing a poor job laying out the actual
| responsibilities of each role in the org.
| cbushko wrote:
| I'd also add that Ops is not automating these problems away.
| markus_zhang wrote:
| One thing that I realized as a Business Analyst and then Data
| Engineer is that communication is the key but pretty much no one
| can do it properly. I also realized that communication should be
| part of the job and leads and managers should hire people who can
| communicate things effectively.
| danielovichdk wrote:
| This is a great showcase of the silo mentality and split.
|
| One should not build silos where experts sit.
|
| One should participate in a team of many different experts.
|
| If you still have to call "whatever department" for fixing your
| slow SQL query, your disk space, your repo access, production
| deployments, etc., then in most cases I feel you should get out and
| find a place where silos are not being exercised.
|
| This is a thing of the past.
| greedo wrote:
| Try avoiding silos if you work in any financial industry, or
| have any regulations around your business.
| SpaceL10n wrote:
| > This is a thing of the past.
|
| Is it? In highly regulated environments, silos are practically
| a requirement. Security access controls are intentionally put
| in place to limit access to systems and their respective
| pieces. If you need repo access, or access to the CI pipeline,
| or access to the database, you have to go to the appropriate
| channel.
| dsr_ wrote:
| The scale of security necessities in organizations runs from:
|
| - no need for anything but minimal security, perhaps because
| the business is trying to surf the margin between their AWS
| bill and their Google ad revenue.
|
| - there is only a need for security in the part of the
| business that deals with money
|
| - some stuff that the users do, they would prefer to maintain
| integrity but they don't care a lot about confidentiality
|
| - the users want reasonable confidentiality, too
|
| - everything about the business is money or secrets
|
| Where your business is on that scale determines how much you
| are regulated and how many internal gatekeepers are
| necessary.
| WesolyKubeczek wrote:
| So I remember the times when programmers were power users and
| were poking fun at "lamers" and "lusers" who always went like "I
| clicked something and a message appeared, what do I do?"
|
| And now people who call themselves programmers are like this
| themselves.
|
| Now can I have some other kind of future please?
| [deleted]
| sgarland wrote:
| My biggest bugbears for anyone in tech are an incurious nature,
| and an inability to search documentation/Google.
|
| Your code threw an error? Did you read it? Did you search Google?
|
| A monitored metric dramatically changed with the last deploy?
| Have you investigated why that might be?
|
| I am more than happy to help troubleshoot a tricky problem if you
| tell me what you've already tried. If you truly don't know where
| to begin, I'm also happy to teach you. What I am not going to do
| is fix your problem for you, with you retaining nothing.
| szundi wrote:
| Be happy that your colleagues bring you in to these
| conversations.
| darkwater wrote:
| I'm writing this as a DevOps/SysEng/whatever you prefer: I think
| there are two big types of developer groups in the industry, and
| two ways to think about these issues. The first group advocates
| for an ultimate "NoOps" world where every team, composed only by
| SWE, is responsible for the whole code lifecycle, from inception
| to deployment to maintenance to decommission. This is
| aspirational in many, many companies and probably true in a bunch
| of really good companies. Anyway I still wonder if there aren't
| tensions there between product/business asking for developer
| power to bring new features/changes in and developers doing the
| Operations work beside writing business code. The second group is
| formed by the developers described in the article, which just
| focus on their code and need a "developer helpdesk" for
| everything else related to actually deploying and operating the
| software written. IME this is how the vast majority of companies
| work, especially the "normal" ones. Some developers step "up" and
| try to understand/do this extra work and they usually rise to the
| top because they are the "good" engineers.
| robertlagrant wrote:
| > every team, composed only by SWE, is responsible for the
| whole code lifecycle, from inception to deployment to
| maintenance to decommission
|
| I think this is the original thesis for DevOps.
| nonameiguess wrote:
| This isn't true at all, though. Many organizations,
| especially smaller ones, seem to have taken it this way.
| Everyone is responsible for everything. But that wasn't the
| DevOps thesis. The idea was for eliminating silos in the
| sense that dev and ops would talk to each other continuously
| throughout the software lifecycle, with developers
| understanding operational challenges and creating software
| that didn't just meet functional requirements, but was easier
| and faster to deploy, change, rollback, and troubleshoot. Ops
| would understand the pressures of development and create
| automation systems catered to the specifics of their software
| stack.
|
| It wasn't supposed to mean no more division of labor.
| Division of labor is a key innovation in human society that
| enables civilizations to exist. It was supposed to mean the
| teams in different categories of labor interact throughout
| and consider each other's needs, and not throw shit at each
| other over a wall and only ever interact through a ticketing
| system.
| twic wrote:
| I believe that it _was_ supposed to mean no more division
| of labour, and i believe that based on conversations with
| people who were early adopters of it ten years ago.
|
| This waffly "breaking down the silos" stuff is a later
| redefinition, i believe reacting to the fact that the
| original meaning was extremely unpalatable to existing
| organisations, with existing employees and hierarchies who
| would be severely disrupted by it.
| vajrabum wrote:
| Here's a link to an interview with Patrick Debois who is
| one of the two guys who came up with the concept of
| DevOps. https://blogs.oracle.com/javamagazine/how-dev-
| versus-ops-bec...
|
| From Patrick Debois in the interview "Later, I saw a talk
| by Jean-Paul Sergent about developer (Dev) teams and
| operations (Ops) teams working together."
|
| So no, breaking down the silos was baked in from the
| beginning.
| darkwater wrote:
| But I still challenge the fact that it can really work
| well for normal performers and not only top notch
| developers. Taking care of all the phases of a software
| lifecycle is not an easy task, it has a big enough
| cognitive load. I understand that this is supposed to be
| done in (small) teams, but within those teams you still
| need some degree of separation of labour, you cannot have
| everybody having wide and deep knowledge of everything
| software. Well, you can do it, if you are Google and the
| like and set the bar for hiring really, really high. But
| that doesn't apply to most organizations and engineers.
| Not all of us are wicked smart.
| robertlagrant wrote:
| You're missing the fact that a team can have different
| specialisms. It doesn't have to be each individual can do
| everything.
| m1keil wrote:
| The original thesis was better cooperation between developers
| and sysadmins (ops). It didn't focus on trying to make
| operations redundant or transfer every sysadmin into a SWE.
| thesuperbigfrog wrote:
| It is.
|
| The team that writes the code should also deploy the code and
| get paged in if there are problems in production.
|
| That creates a tight feedback loop that requires developers
| to learn and manage the whole stack, code defensively, and
| test enough to be confident to deploy to production.
|
| Didn't test your code enough? You will be paged in the middle
| of the night to fix it. It creates a strong incentive to make
| good decisions because you will be living in the mess you
| create.
| cestith wrote:
| In those cases, though, it's usually a team with some Dev
| people who know or can learn some Ops and some Ops people
| who know or can learn some Dev. It's not just laying off
| all the sysadmins, network admins, stack architects, and
| all, then letting the developers freefall until they find a
| way to right themselves.
| cbushko wrote:
| And this only works if your Ops team provides good tooling
| for deploying, logging, monitoring and alerting.
|
| I have heard of companies deciding to do 'devops' and it
| turns into a free for all of dev teams having to
| handle/build things end to end. Everyone loses in that
| scenario.
| darkwater wrote:
| Yes, absolutely, but it's not actually widely implemented.
| Orgs do "DevOps" but they are just automating/writing as
| code some things that previously were done manually by
| Ops/SysAdmins. Now we have DevOps roles doing that same work
| but with other tools (Terraform, Cloudformation etc), much
| more automation, less gruntwork and toil but STILL used as
| "developers IT" nonetheless.
| robertlagrant wrote:
| Not disagreeing :)
| lbhdc wrote:
| > every team, composed only by SWE, is responsible for the
| whole code lifecycle, from inception to deployment to
| maintenance to decommission
|
| This is how it is at my startup. All of the engineers are
| involved in managing the infrastructure for everything we
| build. I find it gives me much better insight into my app, and
| the feedback loop is much tighter since I am in control of
| everything.
| closeparen wrote:
| To put some color on the "NoOps" world: it is not that product
| engineers are directly touching cloud or metal. We have an
| infrastructure group. It ships a PaaS. It doesn't get involved
| with a specific tenant of that PaaS unless a product engineer
| has evidence that there's something wrong with it & escalates a
| page. Product engineers click the deploy and rollback buttons
| for themselves.
| darkwater wrote:
| Thanks for stating this, I think it's exactly what a DevOps
| oriented org should ultimately achieve. Or at least try to.
| syspec wrote:
| "I'm good at my field, people in other fields are terrible."
| tester34 wrote:
| Is this a $BIG_Company problem?
|
| I feel like devs at small companies are doing everything -
|
| coding, testing, supporting customers, deployments,
| troubleshooting and of course straight over ssh+winscp cuz
| vps/bare metal are cheaper
| makach wrote:
| well, they have to. Does not mean they do all that. They just
| skip the hard parts and focus on what is important in order to
| survive.
|
| large organisations usually have a much bigger responsibility
| and are held to higher standards, frequently audited and
| controlled to stay within rules and legal compliance
| greedo wrote:
| 100%. Over the last two years, a majority of my "sysadmin"
| work has been devoted to audit and compliance tasks. Mostly
| validating and working with auditors, but also making
| significant changes to work processes.
| michaelt wrote:
| It's a medium-sized company problem.
|
| If you've got 2 developers, they're both doing everything and
| on call 24/7 and all have read/write access to everything on
| demand.
|
| If you've got 200 developers, you're going to start wanting a
| team of shift workers keeping an eye on the systems, and maybe
| you won't want every developer to have read/write access to
| production data.
|
| If you've got 20,000 developers your working practices and
| infrastructure are almost completely cemented in place, and
| anyone who doesn't like them has already left because it's
| easier to change jobs than to get 20k people to change their
| behaviour.
| tomrod wrote:
| I agree with the author's stresses. What they miss in their
| recollection of the halcyon days of large teams of experts is
| that it's extraordinarily expensive and (in many cases!)
| potentially wasteful of business resources to have experts on
| staff to maintain stable equipment.
|
| Don't get me wrong, I appreciate well operating systems and the
| people who make that happen. There is going to be a beancounter
| wondering whether the very expensive, trained engineer can
| operate faster/cheaper.
| nijave wrote:
| The sad truth in many cases: increased errors are cheaper than
| the cost to prevent them
| sgt wrote:
| I laughed when he mentioned the Node developers. Come on guys,
| 90% of Node developers couldn't even code their way out of a
| paper bag. The quality of developers is shocking.
| joedoejr wrote:
| Haha nice one, I love being a pure "cloud" engineer just because
| I can send an idiot dev to idiot Azure support and enjoy watching
| them solve a trivial "I don't read docs" issue for months.
| mrintegrity wrote:
| Having worked in Operations in some form or another for the past
| ~20 years this articulates so well the feelings I have been
| increasingly having over the past few of those 20 years. Now I
| manage a small operations team and we experience pretty much all
| of the issues highlighted in the article.
|
| There needs to be a rethink of how infrastructure, development
| and deployment is handled.. maybe the solution is to slow things
| down and insert a little carefully thought out bureaucracy
| between the layers (can't believe I'm advocating for more
| bureaucracy!)
| jimmySixDOF wrote:
| You will probably never get that past the Change Management
| Board ;}
| datavirtue wrote:
| This has been my experience as well. I used to work between infra
| and development and I saw first hand a constant stream of
| clueless devs that don't read documentation starting shit with
| infra and networking because they don't know what's wrong and
| just assumed it was the monkey-brained infra people who had
| something screwed up. The infra people were equally disdainful of
| the "stupid devs." Honestly, no one worked together effectively
| but the devs would just pull in a new framework or language, roll
| it out because some blog posts said it was cool, and then gripe
| at infra about perceived problems.
|
| Now I'm in a dev ops team (as a dev) and we spend a very large
| swath of our time---troubleshooting infra issues. It's all AWS
| and our problem now.
| uvesten wrote:
| From the article: "Often they have not even bothered to do basic
| troubleshooting, things like read the documentation on what the
| error message is attempting to tell you."
|
| This has been the bane of my work happiness for a while now. I
| keep having to tell junior devs to actually _read_ the fine error
| message, just in case it actually _contains information about the
| error_, you know. Not that it seems to help much, it's like they
| can't get the concept into their heads.
|
| This is 100% a problem with younger, bootcamp-"educated" devs, in
| my experience. I know the common wisdom on social media is "no
| one reads text anymore", but if that includes aspiring
| developers, it might be tough to replace the current workforce
| when that day comes...
| AnIdiotOnTheNet wrote:
| I don't think it's just because of bootcamp education, I think
| people are growing up in a world where error messages are
| either never displayed, or displayed in the form of "The
| program has did a sad :( try again later."
|
| They're not used to reading error messages because they've been
| brought up seeing nothing but completely useless error
| messages.
| spaetzleesser wrote:
| I always preach to people that they have to make it easy for
| others to help them. Don't just say "it doesn't work" and
| expect them to analyze the issue and take care of things.
| Instead, provide information about what you did, send log files,
| screenshots and whatever other information you may have.
|
| I think the people most likely to fall into the "it doesn't
| work" category are people who don't have much experience
| troubleshooting difficult problems.
|
| In the end it's about compassion and understanding of each
| other. Unfortunately in a lot of companies the only direction
| people are getting is "get it done on time". It's rare that
| management asks people to have empathy for each other.
| bottled_poe wrote:
| Why are software devs responsible for first level support?
| Software devs are expensive, support staff are not. Seems like
| bad business to me.
| spinningslate wrote:
| Big fat "it depends" on that. It might be superficially
| correct (depending on the scope/skills of "support" staff).
| Even if there is a meaningful financial difference between
| the day rates, it doesn't necessarily follow when the process
| is viewed end to end:
|
| 1. If ops staff have limited expertise/authority, it's less
| likely they can resolve problems. They might acknowledge (so
| maintaining some aspect of client SLA), or have a limited set
| of pre-defined remedial actions (reset button). Anything
| beyond that, though, and it needs the dev team. So it's
| arguable whether the ops staff provide much value in the
| equation.
|
| 2. As a dev, there's nothing quite like the prospect of being
| paged at 2am on a Sunday to incentivize more robust code.
|
| End to end dev accountability isn't a panacea either - but
| the problem is more nuanced than just pay rates.
| kazen44 wrote:
| also, devs seem to really underestimate the pay support
| staff makes, especially those capable of troubleshooting
| deep, low-level and complex issues. this might be less
| visible on the Dev side, but troubleshooting infrastructure
| is not an easy to find skill, especially if dozens of
| moving parts are involved.
| LilBytes wrote:
| More power to them. The rest of us 'Ops' guys are slowly
| transitioning to DevOps Engineers and SREs and are gladly
| taking handfuls of cash because we both know how to read hex
| dumps and are smart enough to know your IP address isn't
| going to change because we replaced your ethernet cable.
| acdha wrote:
| One of the best arguments for shared responsibility is that
| it avoids "not my job" thinking. I've seen large
| organizations burn resources and downtime because
| developers and ops are in an adversarial relationship where
| nobody has a stronger sense of responsibility for an
| application working than they do for shifting the blame to
| their counterparts. If everyone is getting things escalated
| to them it tends to cut through that cycle.
| 908B64B197 wrote:
| > This is 100% a problem with younger, bootcamp-"educated"
| devs, in my experience.
|
| You'll get a lot of pushback here but it's definitely true.
|
| That doesn't mean it doesn't happen with CS grads as well, but
| it's quite rampant among bootcamp devs. I think the reason for
| that is that, since the bootcamps are so short, they "stay on
| rails" and mostly work on simple projects (that will give out
| something they can push to a github repo and use as a
| portfolio).
|
| It's the same with git. Every bootcamp will use git and claim
| to teach it to their grads, but then watch them do anything on
| a repo with multiple users. A lot of them just rote memorized
| commands to pull and push to main and that's it. Branching?
| Rebase? Using the commit history? Never heard of.
|
| For new hires from a serious Engineering or CS degree, they
| should have had at least a few classes dedicated to projects
| where they built something non-trivial. On top of theoretical
| classes teaching the fundamentals.
| arvindamirtaa wrote:
| > This is 100% a problem with younger, bootcamp-"educated"
| devs, in my experience.
|
| I resent this. Not because I'm a bootcamp-"educated" dev. I'm
| not. But it suggests somehow that devs with CS degrees are
| somehow better in this aspect. If anything, they're arguably
| worse (obligatory, not everyone disclaimer).
| nikanj wrote:
| "We'll just rewrite that microservice from scratch and the
| error probably goes away. Preferably with the newest framework
| du jour"
| bdavis__ wrote:
| 2021, State of the Practice.
|
| well except for using Rust, and a bunch of dependencies from
| the web, version determined when downloaded at compile time.
| Macha wrote:
| To be fair, Cargo defaults to using lockfiles, so you'd
| have to go out of your way to hit both your points at the
| same time.
| bdavis__ wrote:
| very true, some hyperbole to make a point.
|
| "lack of newness" is a characteristic many will expend
| untold hours to extinguish. to my perspective, the
| "rewrite it in Rust crowd" is the peak; all non-Rust code
| is soiled, and worthy of replacement.
|
| (it is very possible that the "rewrite in Rust" movement
| is just a guerrilla marketing project)
| selfhoster11 wrote:
| I don't really care for Rust, but all non-memory safe
| code could benefit from being replaced. This does, for
| some people, mean rewriting it in Rust because C and C++
| make it harder to achieve the goal of memory safety.
| politician wrote:
| Not to totally discount your point, but consider that
| rewriting is a form of study.
| birdyrooster wrote:
| It's like script kiddies showing you all of the innocuous stuff
| they found with Google, they are illiterate and so their
| imaginations run wild.
| dijit wrote:
| The script kiddies of yore, however, did have a desire to get
| stuff working.
|
| This led to a lot of them learning common failure modes of
| the software they ran.
|
| Ironically the people who were script kiddies in their teens
| have been some of the best troubleshooters I know.
| snovv_crash wrote:
| Selective memory, nobody cares about Igor who never got
| further than being a script kiddie.
| dkersten wrote:
| Force them to write heavily templated C++ code, they either
| learn how to read pages of comprehensible error message to
| diagnose one little typo, or they can't do their job and are
| forced to look for a new one.
|
| It's usually not hard to figure out what's wrong from the
| messages, but man do they look scary and hard to understand
| when they appear. Yes I've been writing C++17 lately using some
| very template heavy libraries.
| spaetzleesser wrote:
| "Force them to write heavily templated C++ code, they either
| learn how to read pages of comprehensible error message to
| diagnose one little typo, or they can't do their job and are
| forced to look for a new one."
|
| When I did C++ we sometimes made little competitions for the
| smallest change that can produce the craziest error messages.
| On the other hand I always found it extremely satisfying to
| make one little change that removed thousands of errors and
| warnings.
| dkersten wrote:
| I had a nice one today where it complained about some thing
| not being invokable deep in some std code somewhere. Lots
| of crazy template instantiation errors. It turned out I
| forgot to pass the variant parameter to std::visit.
| shoo wrote:
| I've avoided C++ for most of my career, but one idiotic
| mistake I do remember making was accidentally leaving an
| open curlybrace at the end of one of my source files. The
| C++ compiler ran and reported 1000s of compilation errors
| all through every single _other_ file -- in my code,
| throughout all the library code I included.
|
| Easily diagnosed if you're working incrementally, one small
| change at a time, and making checkpoints with version
| control: `git diff`, carefully review the diff of what you
| changed since the last checkpoint where things were more or
| less working. I must have not been disciplined enough to
| work like that at the time.
|
| Troubleshooting systems integration failures is also
| character building for getting better at diagnosis from
| errors. Sure, it's failing, but let's try to figure out the
| immediate layer of failure from the logs, error messages,
| symptoms: name resolution? tcp? tls? http proxy?
| authentication? authorisation? api spec misalignment? error
| in our application code or the system we're directly
| talking to? unexpected data? error in some other system
| that we depend upon transitively? each time you hit a new
| novel failure mode, or fail at one level deeper, you're
| making progress!
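
The layer-by-layer approach shoo describes (name resolution, then TCP, then TLS, and so on) can be sketched in Python. This is only an illustrative sketch, not anything from the thread: `first_failing_layer` and `network_checks` are hypothetical helper names, and only the two lowest layers are implemented here, using the standard library's `socket.getaddrinfo` and `socket.create_connection`.

```python
from __future__ import annotations

import socket
from typing import Callable, Optional

# A check is any zero-argument callable that raises an exception on failure.
Check = Callable[[], object]


def first_failing_layer(
    checks: list[tuple[str, Check]],
) -> Optional[tuple[str, Exception]]:
    """Run the checks in order and return (layer, error) for the first failure.

    Returns None when every layer passes. Checks after the first failure are
    not run, because their results mean little until that failure is fixed.
    """
    for layer, check in checks:
        try:
            check()
        except Exception as exc:  # deliberately broad: any failure is a finding
            return layer, exc
    return None


def network_checks(host: str, port: int, timeout: float = 3.0) -> list[tuple[str, Check]]:
    """Build checks for the two lowest layers: name resolution, then TCP."""

    def resolve() -> object:
        # Raises socket.gaierror if the name cannot be resolved.
        return socket.getaddrinfo(host, port)

    def connect() -> None:
        # Raises OSError if the TCP connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            pass

    return [("name resolution", resolve), ("tcp", connect)]
```

Calling `first_failing_layer(network_checks("example.com", 443))` would report the shallowest layer that fails. Checks for TLS, HTTP, authentication and the other layers in the list could be appended to the same list in order.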
| dsr_ wrote:
| It was about 27 years ago that the senior sysadmin at my
| first real job told me: "Most young people don't have any
| hacker spirit at all, they just give up the first time they see
| an error message. The world is not going to survive that."
|
| I was doubtful that this was a universal truth then, and I
| think it's the same now: there are a lot of people who do
| mediocre work, and they are and were supported by a smaller
| group of people who do really good work. And the world keeps
| turning.
|
| One of the joys of the Internet and of open source is the
| increased ease of sharing ideas and solutions.
| ByteWelder wrote:
| > they are and were supported by a smaller group of people
| who do really good work
|
| I think the only solution for this is when "support" includes
| education. In the simplest form, we can give support by
| helping that colleague to find the issue himself, rather than
| giving the solution directly. In a more advanced form, you're
| making structural changes to your company. Like in how you
| share knowledge with the team.
| datavirtue wrote:
| Reality check: they run off and nag someone else...very
| discreetly...until someone caves and fixes it for them. If
| they get enough pushback they will just quit.
| indigodaddy wrote:
| Salient point and makes a ton of sense. I think this is
| accurate, totally coincides with my entire career/work
| experience.
| blowski wrote:
| Some seniors remind me of street preachers shouting "Hell and
| damnation awaits you!", and they get much the same response
| from people. Not saying this is you, but I'd be interested to
| know what ways you've tried and what's worked or not.
| GlennS wrote:
| Keep at it: some crusty guy telling me to just read the sodding
| error 12 years ago opened my eyes.
|
| I think there's something about certain kinds of tools that are
| cryptic, unpredictable, and frustrating that can teach you to
| be helpless - to just Google and hope. It's fixable though.
| acdha wrote:
| > This is 100% a problem with younger, bootcamp-"educated"
| devs, in my experience.
|
| In my experience it's also common with older and college-
| educated ones; contractors trying to avoid extra hours; senior
| architects; and especially anyone who thinks ops is someone
| else's job. It's definitely not specific to age or training
| mode.
|
| There are a few contributing factors I see: tunnel-vision
| focused on the particular detail they think they're working on,
| causing them to ignore anything they "know" isn't related;
| shoddy tools like much of the Java ecosystem where poor culture
| around logging trains every user that it's normal to have huge
| amounts of log spew; etc. but the biggest problem I have seen
| is ego -- either unwillingness to believe that the product of
| their staggering intellect could be less than perfect or that
| the mundane task of getting their grand vision to actually work
| is for the little people.
|
| I'm thinking of a "senior architect" who was quite surprised to
| learn that networks are neither perfect nor instantaneous, and
| that his app might have some issues due to needing thousands of
| XHR calls to load the UI. It was so much easier to ignore the
| error messages and say the problem was Chrome. He had a CS
| degree - the problem was the wrong mindset and having been
| enabled to avoid good troubleshooting skills.
| BizarroLand wrote:
| It could be argued that there is a technological Maya that
| has to be overcome before you mature as a developer. Most
| people grow up spoon fed the "it just works" ideal, even old
| school computer people never really had to doubt that their
| floppy drive would read a disk or that their PC BIOS would
| accurately boot their computer.
|
| It's only after the technological illusion of Maya breaks
| that you realize floppies have read heads, hard drives have
| moving parts, CPUs have conductive traces and all of these
| are vulnerable to breakdown, entropy exists in the system and
| cannot be expelled, that the previous "ideal" state of your
| system was temporary, an illusion, that nothing always works
| the way it is supposed to and that your options boil down to
| "burn it to the ground and start over" or "leap into Hades
| both feet first to rescue the soul of what you love".
|
| Most people go the first route. Buy a new one. Replace what
| is broken with something else. That way the illusions are
| never broken. The technology didn't fail, only its current &
| easily replaceable avatar.
|
| As the Son of God once did, after its death it will rise
| again, immortally replaceable.
|
| However, it is only after you have faced that 2nd trial by
| fire and returned with your elixir that you as a changed
| being can peer through the veil. The meme about "CPUs being
| rocks we filled with lightning & tricked into thinking" rings
| differently to you now.
|
| You've touched the bones of the God and found that they
| crumble. There is no God here, only a beautiful shambling
| nightmare that has eaten the minds and souls of millions,
| built by mad scientists and engineers in a vain attempt to
| create the God whose physical absence they find themselves
| longing for the same way a neglected child longs for the
| embrace of their mother.
|
| https://psychology.wikia.org/wiki/Maya_(illusion)
| jamil7 wrote:
| > I keep having to tell junior devs to actually _read_ the fine
| error message
|
| I wonder if it's also to do with the environment in which they
| learn. When I was learning to program, like probably others
| here, I didn't have anyone around me who knew anything about
| computers so was generally on my own until my first job and had
| to dig through stack traces and read error messages and had to
| try and figure out what was wrong. Kind of a blessing and a
| curse as I imagine my rate would have been quicker and I
| wouldn't have hit so many brick walls but I learned to debug
| independently.
| duped wrote:
| I have the opposite experience, boomer devs that see red or
| yellow in their terminal and send me an email or jira ticket
| before trying to parse it. The younger generation at least can
| make judgement calls on warnings.
| giantg2 wrote:
| I do have one anecdote/counterpoint. At my company we do
| DevOps, but we have a central group that creates the templates
| we are supposed to use for various AWS resources, build plans,
| and deploy plans. It makes it extremely frustrating to
| troubleshoot issues specific to the templates or how you
| configured them because they usually aren't something you can
| google and the documentation for them is extremely basic.
| Sometimes you have to ask the people who created them since
| they have the deep understanding and experience.
| le-mark wrote:
| I share this pain, but believe it or not, it's actually much
| better than everyone doing their own thing all willy nilly.
| giantg2 wrote:
| I agree that it has to be done to have consistency. I'm
| just pointing out that when using IaC designed by another
| team where the internal workings are largely hidden, us
| primarily dev guys are going to need some help with issues
| that appear to be ops related. (And yeah, occasionally root
| cause will be that I'm just an idiot)
| nickjj wrote:
| I've been contracting at a company for a few years and things
| progressed from manually setting up servers (before I was
| involved) to using Ansible to provision multiple servers and
| now it's transitioning to a Kubernetes cluster and
| infrastructure as code from day 1 along with tomes of
| documentation to go with it.
|
| It really is worth it to go the extra mile and write
| comprehensive docs, even going as far as writing them in a
| conversational tone as if it's a blog post or a book. I'm
| really happy I found a company who treats documentation and
| workflows as first class resources.
|
| For a small team where only 1 person is working on this it
| helps eliminate the bus factor and it also makes it easier to
| have non-hardcore ops folks do code reviews on your IaC.
| Having them be able to get the gist of it with a little bit
| of background knowledge is so much better than nothing. All
| of this results in higher reliability of the services your
| company offers.
| callesgg wrote:
| I don't know, most of the time error messages on computers are
| garbage. People build up some type of fear for the error
| message format.
|
| I have an example:
|
| I built a logistics and invoicing tool a few years back with
| error messages that were human readable with clear proper
| messages that told the user exactly what they did wrong and it
| even proposed how they might fix the problem.
|
| I don't know how many times I had to go to the users'
| workstations, read the text out loud for them like they were a
| 5 year old and ask them what they thought it meant. They always
| knew what it meant, but I had to read it for them. It was
| embarrassing.
|
| And these were university educated accountants that were
| using the software.
|
| After a lifetime of garbage error messages like "error code
| 4513" people just zone out.
| BizarroLand wrote:
| For real. How difficult would it be to have the computer tell
| us what the error code means instead of a single sentence.
|
| I get codes back in the day when storage for a whole book was
| costly, but that isn't the case anymore. Just tell us the
| error, show us the pointers, and then tell us what typical
| fixes are instead of expecting us to go to the internet for a
| solution.
| blacksmith_tb wrote:
| My take is that cryptic error messages have always been a
| cross between 'protecting company secrets' and 'never
| really admit to a mistake'. Sure, MS or Apple could just
| throw up a dialog that said "Sorry, we trashed the file you
| were working on, here are the last 1024 chars" but people
| would actually be more angry then, instead of just "Error
| 1234 occurred" (or at least they'd have to go look it up to
| be mad).
|
| As developers, we also have to be used to a lot of
| completely unhelpful errors. Yeah, couldn't connect to the
| DB, sure... oh, but actually because my code ate all of
| memory, why didn't you say that in the first place?
| ziml77 wrote:
| I have never met these kinds of people and it boggles my mind
| that they exist. Actually reading error messages is the most
| basic thing I'd expect from a developer considering that you're
| going to encounter them regularly during the development
| process.
| iovrthoughtthis wrote:
| this feels like one of those things that needs to be tackled
| from both ends
|
| teach people to read error messages and simultaneously improve
| the readability (and utility) of error messages
|
| i don't know why we put up with such bad error messages
| anymore. i imagine it's a function of stockholm syndrome and
| the difficulty in getting messages changed
| zaphar wrote:
| I sort of blame exceptions here. They make it really easy to
| just let the error bubble up to the top. But often times the
| place where the error gets thrown doesn't have all the
| necessary context to have a really good error message. If you
| want a good readable error message you have to trap the
| exception at the appropriate place and then wrap it with the
| appropriate amount of context in the message. But the easy
| path is to pretend there is no error and let the very top
| layer surface anything that went wrong to the user.
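|
| A minimal sketch of that trap-and-wrap approach in Python. The
| ConfigError class, the load_config function and the file path are
| hypothetical names for illustration, not from any real project:

```python
from pathlib import Path


class ConfigError(Exception):
    """Hypothetical application-level error that carries readable context."""


def load_config(path: Path) -> str:
    """Read a config file, adding context if the low-level read fails."""
    try:
        return path.read_text(encoding="utf-8")
    except OSError as exc:
        # Trap the low-level error at the place where we still know what
        # we were trying to do, then re-raise with that context added.
        # "from exc" keeps the original exception attached as __cause__.
        raise ConfigError(
            f"Could not load configuration from {path}: {exc.strerror}"
        ) from exc


if __name__ == "__main__":
    try:
        load_config(Path("/nonexistent/app.conf"))
    except ConfigError as err:
        print(err)
```

| Because of "from exc", the top layer can show the readable
| message while the traceback still carries the low-level detail.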
| tompazourek wrote:
| I have experienced this same issue many times. But from my
| experience, I cannot simply pinpoint it to young developers or
| their education. I have seen this behavior with several older
| people, university educated, with many years of professional
| experience. So not just an age/generational thing, in my
| opinion.
|
| Maybe they have seen so many red herrings that they don't even
| trust that the error message could contain something useful and
| relevant? Or maybe they just learned to skim through
| everything, and don't actually read stuff.
| vladvasiliu wrote:
| I have the same experience.
|
| I also don't understand why, when they ask for help, they can
| never be bothered to say what they're trying to do, what
| error message they got, etc. It feels like they're doing me a
| favor when I try to help them fix something.
| politician wrote:
| Then there's the StackOverflow effect: where you ask a
| reasonable question X and get a bunch of upvoted
| condemnations along with directions to do Y instead.
| waylandsmithers wrote:
| Right. When I was learning to code a more seasoned dev told
| me "The code doesn't do what you want it to do, only what
| you _tell_ it to do."
|
| Really helped me gain the mindset that not only was it my
| mistake that resulted in code not running, but that it was
| fixable. Like a game of ping pong. You hit the ball,
| sometimes the compiler hits it back.
| cbushko wrote:
| I think it comes down to passion and curiosity. Those are two
| things that can happen at any age and education. It is also
| something that ebbs and flows based on energy levels.
| Noujin wrote:
| I think the same. It's so typical everyone here tries to
| pin characteristics to exact professional groups. The
| amount of anecdotal evidence here is too damn high.
| mrtksn wrote:
| I think people don't read error messages if fixing the issue is
| beyond their immediate capabilities.
|
| Computers are scary things that fail in counterintuitive ways.
| When the handle of your tea cup breaks, the issue is intuitive
| and most people will be able to understand why it is happening
| and how to work around it (handle it carefully from the top end
| and enjoy your tea?).
|
| But when it comes to computers, often you need deep
| understanding of its inner workings to make sense of your
| observations of problems. Why would Xcode say that it failed to
| compile my project because usefulExtensions.swift already
| exists? What is that supposed to mean? I see only one file with
| that name. That information gives an intuitive idea about the
| issue only if you know how the compiling process works.
|
| Why would I know why the package couldn't be found? Unless of
| course I know how that package manager works. Then I can check
| if the package manager is configured to look at the correct
| places.
|
| Most error messages are like that. They instantly make intuitive
| sense if you know how everything is glued together and make no
| sense and need study if it's outside of your domain of
| expertise. No one reads error messages unless they can
| recognise the pattern instantly and there's data (like the
| name of the variable) guiding you to the fix.
| zaphar wrote:
| First of all I agree with you. But I would like to note one
| aspect of this that has always been true. Many times the
| error message itself is terrible. They say that an error
| happened and then they give you absolutely no context for the
| error. My favorite example is when an application happily
| bubbles up the error from the operating system when a File
| read/write operation fails. The OS will tell what kind of
| operation failed. It won't tell you which file you were
| trying to read or write. It won't tell you what location you
| were trying to read or write. Basically none of the context
| you need to really understand what went wrong. You'll have to
| go read the code in order to mentally reconstruct what the
| context is. It's silly. If you are writing a file and don't
| at a minimum catch and then wrap the error with an error of
| your own that adds the necessary context then you are
| contributing to the problem here. And that's just one common
| example. There are many more.
| sgarland wrote:
| I'm an SRE, so my programs are generally Python scripts <
| 1K LOC; maybe this isn't scalable, but I write verbose log
| statements (if it's launched with --verbose, of course).
| It's not that much effort to change `except OSError as e:
| log.error(e)` to `except OSError as e: log.error(f"Error
| accessing {file} - {e}")`
|
| If I know typical causes of errors (forgot to connect to
| the VPN, etc.), I'll include them in the log message as
| well as things to check.
| zaphar wrote:
| That might at least in part be because you are an SRE and
| at some point you hit your limit of inscrutable error
| messages happening in production.
| spaetzleesser wrote:
| "I think people don't read error messages if fixing the issue
| is beyond their immediate capabilities."
|
| At least capture them so somebody who knows that area can
| make sense of them.
| Lutger wrote:
| This is a large and ongoing part of becoming a developer. It
| also happens again every time you try to learn a sufficiently
| new or alien technology. You know you start to make progress
| when the error messages begin to make sense.
|
| Often you need to know a lot of context before you're even
| able to determine what the error message is! One error
| message can lead to a cascade of other error messages, or
| it's something breaking down as a result of multiple layers
| of indirection, requiring the developer to carefully track the
| trail of what went wrong and led to another thing failing,
| which broke down the next thing and ultimately, decided to
| stop the program and mention only the very last thing falling
| apart to the user. There might be a directly sensible
| connection with the original error, but often it's quite
| unrelated. An experienced developer often immediately
| recognizes: this is not the actual error message, that other
| thing is! But for a junior it's all equally incomprehensible.
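|
| In Python, that trail is often recorded on the exception itself.
| A small sketch, assuming the failures were chained rather than
| swallowed (root_cause and demo are names made up for this
| example):

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow the exception chain back to the first recorded failure."""
    seen = set()
    current = exc
    while id(current) not in seen:  # guard against a cyclic chain
        seen.add(id(current))
        # __cause__ is set by "raise ... from ..."; __context__ is set
        # implicitly when an exception is raised while handling another.
        # This sketch ignores __suppress_context__ ("raise ... from None").
        previous = current.__cause__ or current.__context__
        if previous is None:
            break
        current = previous
    return current


def demo() -> None:
    try:
        try:
            try:
                int("not a number")  # the actual problem
            except ValueError as exc:
                raise KeyError("port") from exc
        except KeyError as exc:
            raise RuntimeError("service failed to start") from exc
    except RuntimeError as last:
        print("reported:", last)
        print("root cause:", repr(root_cause(last)))


if __name__ == "__main__":
    demo()
```

| The last error reported is the RuntimeError, but the thing that
| actually went wrong sits at the other end of the chain.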
|
| It is detective work with many false leads, and being very
| new at something it can be so overwhelming you don't know
| where to begin and immediately assume you will not succeed
| finding out 'whodunnit', asking your senior co-worker for
| help.
| nanis wrote:
| > Computers are scary things that fail in counterintuitive
| ways.
|
| Not being scared of the tool and believing one's inherent
| supremacy over it must be the most basic criterion for
| practicing this craft, but these days this fear is nursed, at
| times encouraged, at times even exalted (corollary of the
| failure fetish) especially by those who publicly place
| themselves as ambassadors.
|
| Any introduction to computers _must_ start with the statement
| that they are all heaps of plastic and sand and the only
| things they are able to do are because some mortal sat down
| and spent time figuring it out.
|
| People starting out now are at a disadvantage because their
| first encounters happen mostly through extremely polished
| looking apps and it is hard to see at the outset how one
| could go from weird incantations in a text editor to _that_.
| rjknight wrote:
| I think it's harder now than it used to be. When I started
| programming, any error message you saw would almost
| certainly originate from the beige box on the desk in front
| of you, which had only a single CPU core and often didn't
| really do multi-tasking. Over time, our computers,
| operating systems, applications and networks have become
| vastly more complex, with the effect that you can't easily
| build an intuition about which component is responsible for
| a failure and why.
|
| Personally, I _love_ debugging things. I have a very good
| "theory of mind" for dealing with computer failures, and
| figuring out why the computer isn't doing what I might
| naively expect it to do is a lot of fun. However, it's only
| fun because I've been able to stay on top of the curve as
| the systems I work with have become more complex. Starting
| from zero today sounds a lot more daunting.
| jimbokun wrote:
| It gets a lot less fun when there's a customer visible
| outage and you need to know why things are failing NOW
| because dollars are on the line.
| mtVessel wrote:
| True, the stacks are more complex today, but the
| resources are greater. Back in the days when the error
| was in the beige box, if the books/CDs/DVDs you had on
| your shelf didn't address the exact problem, you had to
| roll up your sleeves or you were SOL.
|
| Nowadays, research skills are more important, but I see a
| lot of devs who just don't have them. Can't find the
| answer on the first page of your first (poorly formed)
| search? Run get the senior dev. To me it reads like
| incuriosity and laziness, or lack of training.
|
| I don't mind doing some coaching, but if you're a dev,
| and you can't even be bothered to read the error message,
| what does that say about your effectiveness?
|
| /rant
| colonelpopcorn wrote:
| This is the correct comment.
| mrtksn wrote:
| > Any introduction to computers must start with the
| statement that they are all heaps of plastic
|
| This scales to everything IMHO, everything is simple once
| you understand it. Levels of abstraction are what make it
| scary and complex. I.e. electricity or fire is also not
| scary once you know how to handle it.
| cestith wrote:
| One should not have an unreasonable fear of fire, but
| "unreasonable" is a quite key word there. One should have
| a healthy respect for it and feel some fear if those
| around you don't.
| burnished wrote:
| My own perspective on this was that it felt like my brain
| would turn to goop when I got an error message, my eyes would
| cross and I would start frantically googling literally
| anything, skimming stack overflow, and getting nothing done.
| In order to progress I had to learn to slow down and start
| reading error messages and learning what they meant.
| Sometimes this meant I had to look up a bunch of words, one
| after another, to understand something incredibly dense.
|
| So yeah, I think I largely agree with your assessment, and
| would only go on to state that the path forward is slowing
| down to learn vocabulary and think critically. You really
| speed up after that.
| Scarblac wrote:
| > But when it comes to computers, often you need deep
| understanding of its inner workings to make sense of your
| observations of problems.
|
| They're supposed to have that knowledge, or at least not be
| afraid to dive in and get that knowledge.
|
| There's only one way to build an intuition of what kind of
| problem probably causes some error (most famously, if the
| error is completely incomprehensible, you missed a closing
| thingy on the previous line), and that's by doing the work a
| lot.
| politician wrote:
| Unfortunately, the way we've organized this industry, a
| junior developer stuck on an Agile(TM) hamster wheel has no
| time to dive in and figure it out.
| datavirtue wrote:
| Very very true. When something causes a 1 point story to
| take three days; bring on the hacks and compromises and
| ignore anything that doesn't need dealt with to get it
| out the door.
| mrtksn wrote:
| > They're supposed to have that knowledge
|
| I don't think so. We can do so many amazing things with the
| computers precisely because we don't have to know how
| things work. Computers are so many levels of abstractions
| over printed metal on melted sand.
|
| People who know what they are doing will understand the
| errors of their own creations and will learn the workings
| of the tools they use to some degrees and will be able to
| understand the failing modes of these tools with experience
| over time. No one starts with complete knowledge before
| they start building things.
|
| > or at least not be afraid to dive in and get that
| knowledge.
|
| Of course they should have the drive but people's first
| instinct would be to make the error go away so that they
| can do their actual work. People have limited time and
| energy, you can't expect a JS developer, for example, to
| study inner workings of a Linux box to understand all
| errors. It's cool when they do and gives them superpowers
| but it also makes them less productive as JS developers.
| Sometimes you simply need to implement that button to
| render on the server without studying the server.
| rudasn wrote:
| > I think people don't read error messages if fixing the
| issue is beyond their immediate capabilities.
|
| How would they obtain those abilities though if not while
| spending time on the issues brought up and learning how to
| learn?
|
| I think sometimes people are just bored and can't be bothered
| to find the cause and solution to their issues, and over a
| long period of time that mentality sticks and becomes second
| nature resulting in phrases like "this software sucks, I need
| to read the docs to use it".
| mrtksn wrote:
| Sure, if they study it then they learn it.
|
| The problem is, learning is taxing and many times you
| encounter these errors when you have more important things
| to do.
|
| When you want to develop your game and the IDE is
| complaining about something about locating some files, do
| you think that it is a good idea to learn how that IDE
| organises dependencies?
|
| Sometimes you suck it up and learn it and you know next
| time. However, your first instinct would be to look for
| ways to make the error go away so that you can immediately
| start working on the task that you are supposed to work on.
| That's why we have abstractions and when things work fine
| we don't know how things work.
|
| You shouldn't be expected to have complete knowledge
| of all computer systems, tools and frameworks before you
| can make a ball image bounce on the screen.
| datavirtue wrote:
| Taxing as hell. I have personal projects well underway
| that go unfinished because of tooling complexity or some
| other issue causing me to completely derail and spend
| days figuring out some type of issue that has
| absolutely nothing to do with what I'm trying to
| accomplish. Granted, since I have gotten away from Visual
| Studio it is much better, but it still happens. If it
| isn't the IDE or package upgrades it's AWS or Azure
| issues.
| Aeolun wrote:
| But the moment you _get_ these error messages they have to
| become your domain of study.
| cies wrote:
| To me the line between dev and ops is where we put it. And I like
| to make it very explicit.
|
| In shared hosting times, ops maintained Puppet definitions,
| created new "deploy environments" (using Puppet), gave/revoked
| server access for employees, and monitored servers. Devs maintained
| the source repos and deployed to the environments provided by
| ops.
|
| Now we live in virtual machine times (Docker). Ops does the cloud
| infra (Terraform), monitors the services, gives/revokes access to
| cloud services. Devs maintain the source repos and deploy to
| the cloud clusters provided by ops.
| Jenk wrote:
| This blog perfectly articulates the strife that inspired and
| drove the DevOps trend.
|
| I am always saddened when I hear "our organisation has a DevOps
| team" - immediately this demonstrates the fundamental lack of
| understanding of the very premise of what DevOps set out to solve:
| Bringing Development and Operations _together_.
|
| Even the very name "DevOps" was constructed to symbolise the
| combining of the two domains into one. But no. Now we just have a
| new cool title to throw on people who will be ringfenced just as
| they were before.
| dogleash wrote:
| I'm more and more convinced that "DevOps" (and "Agile" before
| it) are just buzzwords that can be leveraged to make whatever
| change the person implementing it wanted to do all along, with
| zero regard for what the buzzword actually means. If real
| devops was going to happen, it wouldn't take a brand to sell
| cross functional collaboration. It would have just been yet
| another one of the constant stream of incremental improvements
| we make by folding lessons learned in industry into our own
| orgs.
|
| "DevOps" today is just codified Shadow IT.
| mrweasel wrote:
| > Bringing Development and Operations together
|
| Yep, because developers might not know what ops needs in terms
| of traceability, logs and so on, to be able to run their code
| in production without having to wake them up at 2AM. Similarly
| Ops knows a lot about what can be done with existing
| infrastructure, or off the shelf components, which can save a
| huge amount of work, while providing a more stable system.
|
| I do mostly operations now, and I'm lucky enough to work with
| really talented developers, who care to listen to input, before
| writing 5000 lines of code. I also work with customers, who
| have their own developers, with their own weird ideas about the
| world.
|
| The biggest problem I see right now, except for the occasional
| cowboy pretending to be a professional developer, is developers
| picking technologies without understanding them. We work with
| customers who picked technologies because they're interesting,
| not because it's what they need. When performance is terrible
| it becomes an operations issue and being told "Kafka is not
| actually a database and should not be used as one" often isn't the
| answer they want. Or try telling a developer that the code he
| worked on for three months can be done by the existing load
| balancer in a few hours or that the ORM is actually writing
| terrible queries.
|
| DevOps team, as in: "We use the shared knowledge of both
| parties" is fantastic, but operations is frequently an
| afterthought and not involved in the design phase.
|
| If we're to take "DevOps" as developers doing operation, I'd
| prefer that we do the opposite and let operations do
| development. I think we'd get better results.
| criticas wrote:
| Another root cause in our environment is alluded to in the
| article. With the rise of test frameworks, devs seem to test to
| prove the API is correct, not to find problems.
|
| Another symptom of this is that when the QA/Staging function went
| away, load testing became perfunctory. Many of the performance
| problems we see should have been caught in QA. Devs are anxious
| to ship and get on to the next sprint, leaving app support and
| operations on the hook.
| nijave wrote:
| >Devs are anxious to ship and get on to the next sprint
|
| I think it goes a step farther back to product. PMs and
| analysts put constant pressure on developer teams to complete
| work quickly and that time pressure shows up on the next guy's
| plate, etc
|
| "Trickle down software engineering"
| forinti wrote:
| A lot of the animosity between teams comes from the fact that IT
| departments are being pushed harder and harder.
|
| Programmers have to push out an endless stream of features; DBAs
| have to deal with ever greater amounts of data; network people
| have to deal with an enormous amount of endpoints (and now the
| network extends itself beyond the firm, so security concerns have
| grown exponentially).
|
| The real challenge is to make your IT departments realise that
| they are not each other's obstacles.
| relax88 wrote:
| The problem is often political. I have been at many orgs where
| the management of the dev team over promises something to the
| executive, and then when ops finds out about it and realizes
| the project is going to be a giant dumpster fire whose failure
| they will likely be blamed for, it becomes really hard for
| people to foster a "we're on the same team" mindset.
|
| As with most organizational dysfunction, middle management
| fiefdoms are to blame.
|
| It always helps when the executive can see through this
| bullshit and ask the right questions, but often by the time
| this happens millions of dollars have been wasted.
| diNgUrAndI wrote:
| Learning new abstractions is not a good argument though. Did
| hardware engineers from the pre-OS era complain about Linux / OSes
| being hard to understand?
|
| In a sense, kubernetes is the new Linux / Bash of our time.
|
| If it's painful, maybe it's just the abstraction not done right,
| but not the fault of abstraction itself.
| hosh wrote:
| I have done both application development and ops. In my current
| job, I am doing both.
|
| There is a big difference between the mindset of what makes a
| good application developer and what makes a good ops person.
|
| Application developers, by and large, have a sort of "sandbox"
| within which features are developed. This sandbox results from
| working with abstractions, each with some kind of guarantee. For
| example, most application developers assume that what you write
| into memory will be what you get out. That is, hardware is
| abstracted. The idea that the memory chips themselves can have
| defects or can sometimes fail, even if one gets error-correcting
| memory chips, is a violation of guarantees. Memory that just
| works is taken for granted. Another example is assuming that the
| system clock is monotonic.
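|
| To make the clock example concrete, here is a small Python
| sketch (timed is a helper name invented for this example). The
| wall clock behind time.time() can be stepped backwards, for
| instance by NTP or an administrator, so durations are safer to
| measure with time.monotonic():

```python
import time


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    # time.monotonic() is documented never to go backwards, so the
    # difference between two readings cannot be negative, unlike the
    # difference between two time.time() readings across a clock step.
    start = time.monotonic()
    result = func(*args)
    elapsed = time.monotonic() - start
    return result, elapsed


if __name__ == "__main__":
    total, seconds = timed(sum, range(1_000_000))
    print(f"sum={total} took {seconds:.6f}s")
```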
|
| This extends to things like networks, storage, operating
| characteristics, and so forth. Very few application developers
| get into that nitty gritty, let alone all the plumbing and
| interactions among different systems.
|
| I've seen application developers get incredibly frustrated and
| angry when those underlying guarantees are violated in some ways.
| I've been like that when I put on my application developer "hat".
| The main reason is that the developer is holding as much of the
| state and logic as they can, and they do this by excluding things
| through abstractions. They want their tooling and platform to
| just work so they can focus on writing good software.
|
| The thing is that, for a good ops person, all those nitty gritty
| and plumbing _is_ the focus of their jobs. It takes a very
| different mindset to troubleshoot: you start looking at those
| "guarantees" and find out what they are actually doing.
|
| I once interviewed at a place which has this amazing way of
| figuring out if someone has the mindset and tenacity to be a good
| ops person. It was not writing algorithms on a white board. It
| was a deceptively simple task of installing a piece of software.
| And even though there is documentation for installing that
| software, there is no documentation for installing that
| software for every single environment and requirement and its
| interaction with other systems. When adding in a time crunch, and
| the scrutiny of an observer, that simulates a pretty typical day
| pretty well. You have to have enough emotional intelligence to
| keep working through it until it works. Documentation is always
| sparse and can't be guaranteed to be correct. Runbook? Good idea,
| but there is no way even a meticulously crafted runbook for one
| component is going to be able to describe how systems interact
| with each other. Someone, somewhere has to figure that out.
| (Well, they don't have to. We can just let the system fail).
|
| And sometimes, as an ops person, you have to open that "black
| box" and read code. Just like sometimes, an application developer
| needs to pop open the abstraction layer and pull out netcat or
| sysdig.
|
| In the end, I'm not lamenting that DevOps blurs who owns what.
| Maybe this is because I've mostly worked on small, early-stage
| startup teams. Complexity has to live somewhere. I like working
| on the teams where people talk to each other to figure things
| out.
| cbushko wrote:
| I am in Ops and I can understand this point of view. The author
| is probably overwhelmed with issues that are 'not his problem'
| and this is his rant on it.
|
| To be fair, Developers are getting slammed with their
| responsibilities too. At one time it used to be that they could
| just know one programming language really well, like java,
| compile their code and hand it off to QA.
|
| Now they have to know a dozen languages, frameworks, do their own
| testing, deploy the service, monitor it and troubleshoot
| everything in production in some 'cloud'.
|
| Or they are just being lazy and this guy is sick of it. That is
| when you do your best to train people up and get them to put in
| the leg work. Ask pointed questions about if they Googled the
| error and help them work through the problem. Then add some
| things to the docs to help others out in the future.
| bsedlm wrote:
| I've seen a larger trend of developers (modern app programmers)
| not knowing how to use (in a deep advanced way) their own
| computers.
|
| I guess these developers will end up writing code directly on
| github online editors...
| _wldu wrote:
| Years ago, everyone working in tech was an IT generalist. They
| did everything (DB design, systems, applications, algorithms,
| code, networking, etc.). Today, the field has matured and people
| are able to specialize.
|
| Sometimes, when old IT generalists work with new IT specialists,
| these sort of misunderstandings occur.
| WJW wrote:
| Years before everyone was apparently a generalist, you would
| have a separate DBA, a separate architect, sometimes even
| separate teams for implementing algorithms, etc. The Mythical
| Man-Month has a very nice section on splitting up the work over
| the various teams and that's a book published in 1975.
|
| I think the actual boundary lies more in big vs small
| companies: small companies do not have the resources to hire
| specialists for every little subproblem, while big companies
| typically have enough employees that specialisation becomes a
| possibility.
| makach wrote:
| it is called full-stack today?
| _wldu wrote:
| I'm not sure. I take full-stack to mean the front-end,
| middleware and back-end of a webapp. I would still call it
| being a "generalist" when applying the concept to computer
| technology in general. For example, the CTO of an org should
| be a technology generalist, not a full-stack web dev.
|
| Of course, this is just my personal opinion based on what I
| have experienced over the last 40 years.
| k__ wrote:
| It should be.
|
| But in the cruft of legacy systems it probably won't be for a
| long time, probably never.
|
| I'm a freelancer. I "use" the project managers of my clients, I
| don't hire them myself.
|
| Same goes for my applications. I use the cloud, managed services
| and such. The providers hire operations people, I don't.
| rsyring wrote:
| I think some of the article is framed wrong. It's written as
| competent ops guy/team vs. incompetent developers. But I don't
| think that's actually what's going on. I'm sure there are
| competent developers saying the exact same thing about their
| relatively incompetent ops people.
|
| We built a couple relatively simple applications for an
| enterprise client. It took their ops teams months to get both
| applications running in K8s, even though our deliverable was a
| fully functioning container. They were largely incompetent as far
| as we could tell.
|
| But, I don't think it's worth being unkind or judging them. Every
| time they asked us a question we made an effort to point them in
| the right direction. There were other times it was a problem we
| couldn't help with, we kindly let them know that.
|
| I think the reality is that the demand for competent IT and
| developers outpaced supply a long time ago and it's not getting
| better. Those of us who know and care about the difference should
| make competent co-workers and executives part of the job
| evaluation. Or, accept incompetence around you as a reality, help
| and avoid as wisdom dictates.
|
| But, complaining that it exists and framing it as competent ops
| vs incompetent developers is both untrue and unhelpful IMO.
|
| The latter part of the article that talks about the pace of
| features, complexity, and the lack of time is spot on though IMO.
| I think the article would have been better focusing here and
| avoiding the IT vs devs angle.
| lowercased wrote:
| I started to write something, but you encapsulated it much more
| concisely. I've been on all sides of this over the past... 25
| years, mostly dev, sometimes having to handle
| server/network/etc (before it was 'devops'), and worked on
| large and small teams.
|
| Competent and incompetent people exist in all areas. Some of
| those incompetent ones can get better with time and support,
| and some can't/don't.
| mankypro wrote:
| Cannot even count the number of times that my Ops teams had to
| write wrappers and scripts to ameliorate issues suffered by
| apps rushed to production by shoddy dev teams. Glad someone has
| written about it. Anyone in the business knows this has been
| going on forever.
|
| The problem is the Ops teams get ZERO credit for enabling the
| shoddy work done by devs, the devs meanwhile get patted on the
| back, and frankly continue to be romanticized.
|
| "move fast and break shit (and let Ops fix it silently)"
| g051051 wrote:
| 100% agreement, but from the developer side. DevOps has been
| nothing short of a disaster for software development, on par with
| Agile.
| cbushko wrote:
| I am sincerely curious why you feel this way about DevOps.
| g051051 wrote:
| Because it treats Dev and Ops as having identical, equivalent
| skill sets. Attempting to make this true invariably leads to
| disaster. In over 30 years of professional software
| development, I've never seen the problems it purports to
| solve.
| cbushko wrote:
| They are similar skill sets but the domains are different.
| I am not expected to understand every JavaScript framework
| that is thrown at developers and they are not expected to
| know the inner workings of our networking, kubernetes or
| service meshes. I will concede that every company wanting
| an Ops person to be a full fledged software developer is a
| little ridiculous.
|
| I have been at this for 25+ years and I remember the days
| of silos. I remember developers passing off code to QA and
| it coming back days later with bugs. I also remember a lot
| of 'not my problem' coming back from developers. Sometimes
| it wasn't their problem; often it was.
|
| Either way, the person with intimate knowledge of how the
| code works should be the first person that looks at the
| problem. In SaaS, that is production and should be the
| developers (within reason).
|
| As Ops, my goal is to make that as easy as possible for
| developers. That means automating everything I can so that
| the right tools for deploying, monitoring and alerting are
| in place. It means that I have to automate
| spinning up and destroying infrastructure as quick and easy
| as possible so that we can meet your needs and also keep
| costs down.
|
| I have also seen companies fail at becoming 'devops' in the
| most terrible way. They took developers and made them own
| everything from code to deployment to VMs. The developers
| had so many pieces to understand that the only guarantee
| was failure. That was a terrible startup to work at.
| g051051 wrote:
| > They are similar skill sets but the domains are
| different.
|
| Exactly. They're specializations, like heart surgeon vs.
| orthopedic surgeon.
|
| > I have been at this for 25+ years and I remember the
| days of silos.
|
| 32 years for me. Silos evolved out of the wild west of
| the 90's and early 2000's. Which evolved out of the
| strict controls of early computers run by a cult of
| Operators where the devs couldn't even access the machine
| directly. It's a cycle, where management tries to remove
| people, only to have to put them back later. I've seen it
| over and over.
|
| > I remember developers passing off code to QA and it
| coming back days later with bugs.
|
| It is literally QA's job to find bugs that developers
| missed.
|
| > In SaaS, that is production and should be the
| developers
|
| SaaS or not doesn't have anything to do with it.
|
| > As Ops, my goal is to make that as easy as possible
| for developers.
|
| As a developer, my goal is to deliver high quality code
| that meets the requirements for performance, stability,
| monitoring, security, and functionality.
|
| > They took developers and made them own everything from
| code to deployment to VMs. The developers had so many
| pieces to understand that the only guarantee was failure.
|
| I've never seen DevOps done any other way. Hence my
| original comment.
| UK-Al05 wrote:
| Large companies force devs to go through ops for so many things.
| There isn't much of a choice.
| Aeolun wrote:
| I disagree with pretty much everything in this article, apart
| from the fact that people often come up to me with a question of
| the "I tried nothing, and I'm all out of ideas!" kind.
|
| The author seems to want it both ways. They want the devs to fix
| their own problems, but at the same time give them zero control
| of the stack (we have to provide them with guide rails to prevent
| them from hurting themselves indeed).
| prepend wrote:
| I've found ESR's "how to ask questions the smart way" [0] to be
| really helpful in these situations on both the asking and
| answering.
|
| If I'm asking a question I explain what I'm trying to figure out,
| what I've tried, what I expect, what I've researched. Basically
| helping the answerer not waste as much time covering the same
| ground.
|
| If I'm answering questions and don't get this info, I ask it. And
| establish the expectation that this info helps me answer their
| question.
|
| About 70% of the time, the asker adds in more info. 25% of the
| time I don't hear back. 5% of the time I get a complaint that
| they are too busy or can't answer the questions.
|
| [0] http://www.catb.org/~esr/faqs/smart-questions.html
| pietromenna wrote:
| To the author of this article: Really great job on it, had great
| fun reading and also lots of truths in there. But this part:
|
| "Often they have not even bothered to do basic troubleshooting,
| things like read the documentation on what the error message is
| attempting to tell you."
|
| This happens, but this just means that your Development Team
| needs some coaching or to improve their quality.
|
| This says more about the quality of the development team you
| have been working with. You have to pass along this feedback and
| ensure that the development team works with the same
| professionalism as everybody else.
|
| DevOps would mean that "Dev & Ops" look into issues together
| (yes, the developer will be blocked as well, WORKING with you). If
| you find that it was the developer's fault, tell them: "Hey, this
| is on your side. You saw how we troubleshot together. Now each of
| us has new tricks to use in the future".
|
| If you don't do that, you are the shortest path to get THEIR
| problem solved. And it is too easy to go that path.
| brador wrote:
| On mobile this webpage has a thin hovering black bar at the top
| that fills to the right as you scroll further into the article.
| Very nice feature that I have not seen before.
| nonameiguess wrote:
| Not just mobile. It has the same progress bar in a full-size
| browser as well.
| dsr_ wrote:
| We used to have this browser-supplied thing which would tell
| you how far you were in the document, what percentage of the
| document you were currently looking at, and afforded you the
| ability to quickly change your position.
|
| It was called a scrollbar.
| nikau wrote:
| You and your dinosaur technology, next thing you will want
| clickable interactive text to be in a different font or
| colour to differentiate from regular text.
| Waterluvian wrote:
| Everyone needs to be a bit of everything to mitigate the cases
| where one team doesn't understand another team's domain and
| Applications begins blaming IT or Operations admits to not
| understanding the applications they facilitate.
| zenron wrote:
| I mostly agree with the overall tone of the article but I do have
| to point something out:
|
| > It is baffling on many levels to me. First, I am not an
| application developer and never have been. I enjoy writing code,
| mostly scripting in Python, as a way to reliably solve problems
| in my own field. I have very little context on what your
| application may even do, as I deal with many application demands
| every week. I'm not in your retros or part of your sprint
| planning. I likely don't even know what "working" means in the
| context of your app.
|
| The point about not being in retros or part of sprint planning...
| I take up arms against that. I've worked for companies that have
| gone from waterfall to hybrid agile because we cannot get buy in
| from Ops to actually... you know... come to our retros, sprint
| planning and scrums.
|
| Some things in this article are just pointing out the obvious...
| mediocre developers who push their problems and/or shortcomings
| onto other teams. However, on that quote the Author needs to look
| in the mirror. They exist only because the products offered by the
| Company need resources. They have a responsibility to be business
| partners in that. If they aren't the company needs to re-align
| some priorities and it could start with Ops. Ops doesn't get a
| pass in an agile organization. The whole point of agile is to
| destroy them ivory towers. And if they were in those planning
| sessions, the developer might have already gone over the type of
| destructive testing that would have emerged from that
| collaboration and their DevOps relationship would be even richer.
| dogman144 wrote:
| There is a reason SRE/DevOps Eng jobs are taking off in number
| and comp, and entities like GitHub are (slowly) figuring out how
| to automate dev work.
|
| Running code at scale turned into a very challenging comp sci
| problem, and uptime vs code slickness is getting prioritized by
| clients.
|
| The career support and innovation in that corner of the world
| (ops eng jobs) reflects it. Sort of gets after what software
| architects do, but the requirements to know that come way earlier
| in the career for Ops. Ops Engs with cloud knowledge, Python, and
| IaC tend to go far.
| gfiorav wrote:
| Add to the list: "Running code generated by ML which is not
| trusted"
|
| Similar in nature to "Running arbitrary containers" but without
| the human trust-to-do-no-evil policy in place.
| eljimmy wrote:
| This isn't just specific to operations, I experience this amongst
| other developer teams as well.
|
| I've had previous coworkers approach me about API "bugs" because
| they didn't bother to troubleshoot their app code and just
| immediately assumed it was a server-side issue.
|
| Then I spend 10 minutes debugging the issue only to point them
| to the error in their own code. I don't know if it's laziness or
| inability to troubleshoot, or both.
| exdsq wrote:
| If your QA team is a "thing" that gets features at the end of a
| sprint and churns out bugs or releases you're doing it wrong.
| They should be involved on a feature by feature basis working
| alongside the developer with QA time incorporated into every
| task. All unit/integration/system tests should be automated
| during the cycle so there is no "hand-off to QA". There should be
| less latency because you have a test expert speeding up
| implementing tests or being a force multiplier to developers by
| acting as an internal consultant who can advise on bits where
| needed.
|
| QA as a discipline has evolved but from the sounds of it, it's
| not been widespread enough.
| criticas wrote:
| This happens weekly.
|
| Developer: Host XYZ is very busy.
|
| Sysadmin: Yes, Yes it is. The top 10 processes are your Java App.
|
| Developer: Fix it.
|
| Sysadmin: ???? You can request a larger virtual machine, you can
| try these options to the JVM, or you can fix your code.
|
| Developer: Can you do it?
| mrweasel wrote:
| That's oddly familiar... I frequently get: The server is slow.
| Well, no it's not really doing anything, but your application
| is responding remarkably slowly.
|
| Or: Can I get a bigger server... Yes, but you have 32 cores and
| 256GB of RAM, and your application isn't that complex.
| PeterisP wrote:
| Perhaps all this description is showing is that in many
| organizations there simply is a genuine need for a "Developer IT"
| support function with appropriate skills and resources, and
| because there isn't one, it's being done haphazardly by teams who
| aren't a good fit for it, as the author describes. If there's _a
| few_ niche issues then that's solvable by e.g. dev training, but
| if the issues are _systematic_ as the article asserts, then that's
| an organizational problem that needs an organizational
| solution. If your company can't ensure that devs are capable
| and/or motivated to troubleshoot issues that work on their laptop
| but don't in a real deployment, then your company needs some
| "internal consultations" mechanism to connect them to someone who
| does have this capability and can explain and/or fix the issues
| for them.
|
| Responding to "Someone will always have to own that gap and
| nobody wants to, because there is no incentive to. Who wants to
| own the outage, the fuck up or the slow down?" with "Not me." is
| not sufficient, it's a very valid question for which any
| organization definitely needs an answer pointing at some specific
| people - if it's not going to be pure ops people, it's IMHO not
| going to be the feature-developing devs as well, that would
| likely need separate 'site reliability engineer' teams as some
| major companies do.
| stayfrosty420 wrote:
| I disagree, it seems laughable that devs are coming to him
| with those kinds of questions.
| tilolebo wrote:
| They shouldn't have to come to him that often, if they had
| a skilled senior SWE mentoring them.
|
| Seems like OP works for a shitty company.
| twic wrote:
| I agree that something needs to change at the organisation
| level in your case, but i think it's hiring and promotion. This
| "developer IT" stuff is part of a developer's job. Juniors
| won't join you knowing it all, but they can learn it from
| seniors on their team who do. If you are recruiting seniors who
| don't know this stuff, stop, and if you are promoting juniors
| to senior before they've learned this stuff, stop.
| [deleted]
| d--b wrote:
| I am sorry but as an application developer, I think this is all
| wrong. I'll thank my infra team today for not being assholes like
| this guy.
|
| 1. Application developers are your users. If we application
| developers took offense every time a user tells us that things
| are not working, we'd be pretty pissed off all the time.
| Educating and empathizing with your users is part of your job.
|
| 2. Talking about how it was better before: QA teams sure do
| buffer a lot of crap. They also cost a bunch and slow down time
| to release. Yes agile is causing problems. The bureaucracy and
| stiffness of organizations before agile was no nirvana either.
|
| 3. By your own affirmation you treat applications as black boxes
| that should be deployed using a runbook that should just work.
| This is ridiculous. Application's ownership is shared between
| everyone who works on it.
|
| 4. And yes, as developers, networking or physical drive space are
| things that we tend to abstract away. Maybe if the infrastructure
| people were involved in development discussion earlier, they'd be
| able to raise their hands and say: wait a minute, you're going to
| blow up our logs.
|
| This all feels like someone who used to not do anything suddenly
| being asked to take part in what's happening...
|
| EDIT: apologies for the strong language and sounding like an
| asshole myself, but I certainly feel irritated when someone takes
| the time to write a 5000-word article complaining about whiny
| developers who thought they could own ops but actually don't know
| anything and scream for help when they themselves are the cause
| of all evil.
| sgarland wrote:
| > And yes, as developers, networking or physical drive space
| are things that we tend to abstract away.
|
| Why is it Ops' job to guess at your application requirements?
| You have the best understanding of what setting LOG_LEVEL=DEBUG
| is going to do to disk requirements.
| jimbokun wrote:
| In theory, but pragmatically it's irresponsible to assume
| without running in staging or on a subset of production
| resources to monitor and see what actually happens.
| _jal wrote:
| I currently manage an infra ops team. I was a developer for
| about 10 years.
|
| I agree with point 1, nearly completely. A lot of developers
| could take a lot more responsibility for understanding the
| environments their applications operate in, but I get it.
|
| Point 3, at least in my shop, you're just wrong. I don't know
| anything about what you're writing. I probably don't even know
| what problem it is supposed to solve. You are mistaking the
| highway road crew for mechanics.
|
| Point 4, in my shop, we provide a lot of documentation and
| guidelines for this sort of thing. Developers are responsible
| for knowing if their stuff is going to fall outside of those,
| and come to us to work something out. Again with the road
| metaphor, if you drive a semi into a single car garage, you're
| the idiot, not the person who built the garage.
|
| On some of this, I'm taking a hard line. I do, in fact, end up
| doing a lot of troubleshooting with developers. But most of my
| team does not write code. If you want more senior ops folks who
| also have a coding background, come on over! There aren't that
| many of us who are any good, and I would love to hire more.
| jameshart wrote:
| > You are mistaking the highway road crew for mechanics
|
| The highway road crew know what a car is, though, right? They
| know that the road needs to be clear and flat and drained of
| water, and the markings need to be clear, so that cars can
| drive on it.
|
| When the devs come to you complaining about flat tires, you
| can't turn round and say 'this is a mechanic issue, I don't
| know how tires are meant to work. They go on the bottom,
| right?' - you're meant to help check for rusty nails or bits
| of metal in the road that are causing all these flats.
|
| 'Oh, I didn't realize that was something that could cause
| trouble for cars'
|
| Well then you're a pretty crappy highway maintenance guy.
| [deleted]
| cogman10 wrote:
| > Point 4, in my shop, we provide a lot of documentation and
| guidelines for this sort of thing. Developers are responsible
| for knowing if their stuff is going to fall outside of those,
| and come to us to work something out. Again with the road
| metaphor, if you drive a semi into a single car garage,
| you're the idiot, not the person who built the garage.
|
| With the road metaphor, one issue I've seen is ops will
| create a rope bridge and get mad when devs need to drive a
| car over it. "You shouldn't do that! You idiot! Just walk
| over the bridge like we expect!"
|
| Example: We have about 500 different applications in our
| company and the ops team maintains a single rabbit cluster
| for all apps (and everyone is supposed to use that one
| cluster). If an app gets too chatty on that cluster "Oh you
| idiot, why are you so chatty! You just sunk the
| organization!" Which, in turn, discourages the usage of
| rabbit (maybe that's the intention?)
|
| > But most of my team does not write code.
|
| I actually prefer this ( :D ), our ops team was a bunch of
| converted devs that decided the best way to do things was
| making a giant ops framework for all devs to follow. That
| ended up costing WAY more money than if they'd just used
| tools that were available. They fetishized trying to make
| everything "just one line!" which ended up breaking anytime
| you had a slightly different need (trying to take control
| right up to managing how version bumps happen).
|
| Overly trying to force a single method of implementation has
| a lot of negative consequences. I prefer instead to have
| guidebooks and examples with the freedom to be an idiot and
| walk off the beaten path when needed.
| kcb wrote:
| It pains me. Just add this magic line to your pipeline and
| everything will "Just Work (tm)"
| kazen44 wrote:
| > With the road metaphor, one issue I've seen is ops will
| create a rope bridge and get mad when devs need to drive a
| car over it. "You shouldn't do that! You idiot! Just walk
| over the bridge like we expect!"
|
| Well, the main problem with the "bridge mismatch" is
| usually that resources required for an environment are not
| free. It's usually the opposite: most infrastructure is
| rather expensive, and running multiple systems side by side
| because multiple developers require slightly different
| versions of the same thing tends to explode cost.
| kcb wrote:
| > Point 3, at least in my shop, you're just wrong. I don't
| know anything about what you're writing. I probably don't
| even know what problem it is supposed to solve. You are
| mistaking the highway road crew for mechanics.
|
| How? Honest Question. If you know nothing of the application
| how are you able to offer any input into the infrastructure
| it runs on.
| Plasmoid wrote:
| Because an ops team will have between dozens and hundreds
| of apps to support. You do a survey of needs and build out
| something that gets to the most common use cases.
|
| You try to respond to what people need and add things when
| there is enough demand. But I can't know what your business
| goals are, what your uptime metrics are, or who your users
| are.
|
| At some point, your app becomes a black box that takes in
| requests, accesses DB/storage, and emits logs/metrics. I
| just don't have the brain space to be intimately familiar
| with each service.
| Aeolun wrote:
| > Point 3, at least in my shop, you're just wrong. I don't
| know anything about what you're writing. I probably don't
| even know what problem it is supposed to solve. You are
| mistaking the highway road crew for mechanics.
|
| I don't follow this. Developers are responsible for learning
| what kind of environment their application runs in, but ops
| is not responsible for having some clue about what they're
| running? That cuts both ways, and it'll help everyone out.
|
| > Developers are responsible for knowing if their stuff is
| going to fall outside of those, and come to us to work
| something out.
|
| I find this attitude fairly common amongst ops people. They
| just build something that is totally inappropriate for actual
| usage, and then dump the responsibility for figuring that out
| on the developers.
| philbo wrote:
| > Developers are responsible for learning what kind of
| environment their application runs in, but ops is not
| responsible for having some clue about what they're
| running?
|
| I don't think it's as cut-and-dried as your question frames
| it, but I do think there are fundamental differences
| between the two positions that justify some of the tension
| there.
|
| The problem is the difference between domain knowledge and
| general systems knowledge. The former varies wildly from
| org to org, team to team or even within individual teams.
| The latter is more consistent across wider applications and
| over longer timeframes.
|
| Developers usually need a lot of domain knowledge to do
| their job, which can leave less space for systems stuff.
| But the systems stuff they do learn tends to be more widely
| applicable.
|
| Ops folk often service many teams where the domain
| knowledge differs between them. The best of them might be
| able to internalise all of those differences but it's a big
| ask. And there's rarely any crossover.
|
| This difference is also why developers tend to have a
| slower ramp-up time than ops engineers do on joining a new
| team. It's just the nature of the work.
|
| I say all this as someone from the developer side of the
| fence. I'm fortunate to have some years in the bank now
| that the systems stuff comes more easily. The domain stuff
| remains really hard.
| BurritoAlPastor wrote:
| Developers have more responsibility than ops for knowing
| their apps, for the simple reason that each developer owns
| a small number of apps, but ops owns the infrastructure for
| all the apps.
| kcb wrote:
| I don't follow. Why compare a developer to the entire ops
| organization?
| tadpole172 wrote:
| Because the ops org doesn't concentrate on just the one
| application. They have broad knowledge of the entire
| stack and therefore don't have as deep of an
| understanding on any single piece.
| kcb wrote:
| The dev org also doesn't concentrate on just one
| application. I've not seen this situation where every Ops
| personnel is assigned to the entire stack. Each Ops
| employee or team in a larger organization is generally
| responsible for a subset of the environments.
| jameshart wrote:
| Why has your organization built a one-size-fits-all ops
| organization if it doesn't have a one-size-fits-all dev
| organization? Sounds like a failure of ops organization
| to recognize that the needs of the email hosting guys are
| different from the website team or the billing team.
| Maybe you should build a set of smaller, more focused ops
| teams focused on meeting the needs of those different
| groups?
| kazen44 wrote:
| Smaller, more focused ops teams already exist, but are
| not bound by application boundaries but by system
| boundaries (mostly storage, compute and networking).
| The reason is that each of these is a completely
| different environment on its own.
| cogman10 wrote:
| I completely agree. Far too many devs are clueless about
| how their apps perform or interact with the ecosystem.
| That tunnel vision has a LOT of negative consequences on
| infrastructure.
| [deleted]
| protomyth wrote:
| _QA teams sure do buffer a lot of crap. They also cost a bunch
| and slow down time to release._
|
| If your QA team is slowing down releases then that is the
| developer's fault, not the QA team's. Frankly, this move fast,
| don't do proper QA is irresponsible and a danger to users.
| marcosdumay wrote:
| They add latency, there's no way around it. Even if there are
| no software problems and their verification is instantaneous,
| QA by itself adds an extra hand-off to a team with an
| independent task queue.
| exdsq wrote:
| If your QA team is a "thing" that gets features at the end
| of a sprint and churns out bugs or releases you're doing it
| wrong. They should be involved on a feature by feature
| basis working alongside the developer with QA time
| incorporated into every task. All unit/integration/system
| tests should be automated during the cycle so there is no
| "hand-off to QA". There should be _less_ latency because
| you have a test expert speeding up implementing tests or
| being a force multiplier to developers by acting as an
| internal consultant who can advise on bits where needed.
| icedchai wrote:
| Seriously, I haven't worked at a company with a QA team in
| almost 10 years. Do these actually exist anymore? It would
| certainly be nice to have.
| Aeolun wrote:
| They do! They're really good at their job but _definitely_
| slow down releases.
|
| Then again, the entire point is to release after all the
| bugs are fixed, not to get all the bugs into production as
| quickly as possible :)
| protomyth wrote:
| I guess it depends how you count a release. I think these
| fast moving teams spend more time in production debugging
| than the QA team adds. Shipping it should not be the
| final determination of release time.
|
| I wish more companies valued QA teams, then maybe I
| wouldn't get so many notices of security breaches and
| need to keep checks on my credit.
| mateo411 wrote:
| Security breaches are the responsibility of the InfoSec
| team. The QA team usually won't have the skillset to find
| security issues.
| icedchai wrote:
| Or maybe you still would. Are most QA folks actively
| looking for security issues?
| Aeolun wrote:
| Not really. QA is functional. We have a product security
| team doing pentests on new and updated applications.
| protomyth wrote:
| Some of the bonehead stuff will be caught by QA, but
| there are folks on some QA teams that get security.
| Sadly, developers talk down about QA so much that the
| people we need on QA teams are not going to go there.
| _AzMoo wrote:
| We have a fantastic QA team, and they test everything that
| goes to prod. Definitely slows things down (by about 1/3)
| but our user experience is significantly improved because
| of it. IMO a good QA/test team is critical to delivering an
| excellent user experience.
| jodrellblank wrote:
| Pet hate: they're not operations' logs, they're developer logs.
| Developers write the code to create log messages on the
| principle "more is better". Logs are another example of the
| systemic hoarding problem with people and computers.
|
| They're a ratchet pattern, adding more is easy but once they
| exist it's very difficult to find someone with the authority to
| authorise removing them and the willingness to stick their neck
| out and declare that they aren't required and the willingness
| to spend time on low-importance maintenance. As a consequence
| logs build up until something gives and they become high
| importance urgent failure. The middle bit where they "aren't
| important" but they still waste storage space and networking
| bandwidth and processing power (and money) and when there is
| something to debug they waste people's time because the
| important details are needle-in-haystack among tons of low-
| value filler, all gets ignored.
|
| At the limit, it isn't sustainable to print the complete
| internal state of a system at every clock cycle. It "should" be
| possible to do a lot better troubleshooting_power-to-log_weight
| ratio than "print every state change which feels important at
| the time in whatever semi-English message format is
| convenient", shouldn't it?
| jasonlotito wrote:
| I am sorry but as someone who has been on both sides of this, I
| think this is all wrong. And I thank god both my app developers
| and operations people aren't assholes like you.
|
| Hey, that's a pretty shitty way to start off a comment, don't
| you think? With a personal attack?
|
| 1. Yes. But it's not operations' problem if you are whining that
| your PS5 game isn't running on the XBox. There is personal
| responsibility in this, too, and it's not operations' job to
| hold your hand and explain how to do your job. If you aren't
| reaching out to operations to make requests, they aren't going
| to know what to do. Your entire comment shows that you think
| they are subservient to you, rather than you actually being an
| honest user. Tell them what you want, and work with them to get
| it.
|
| 2. QA teams do not slow down time to better quality releases.
| They do slow down time to half-baked or buggy releases.
| Regardless, the number of app developers to operations people
| is generally a very bad imbalance. I promise you, the good ones
| are working with the people that reach out to them.
|
| 3. Maybe if you invited the operations people earlier, they'd
| have some ownership in the product. But usually they release it
| without operations even knowing, and suddenly there is
| something in production that is half-working. They had no hand
| in it. They literally did not work on the project, so they
| can't know.
|
| 4. You can't abstract away things if you don't know how they
| work or account for them. Again, inviting operations people to
| earlier discussion is incredibly easy. You know what projects
| you are working on, they tend to not because there are far
| fewer of them than there are application developers. So, it's
| on you to reach out to them to get input. Yes, they have to
| make themselves available, but you have to invite. And guess
| what? When you do that, you get a wealth of information and
| it makes the product better.
|
| Your comment feels like someone who is used to expecting
| perfection from others while accepting their own mediocrity.
|
| Wow... ending a comment with an insult is rather shitty, too.
| Why did you decide to go the route of writing a comment that
| starts off shitty and ends up that way?
|
| Personally, I did it to hold up a mirror to you.
| cf499 wrote:
| "Maybe if the infrastructure people were involved in
| development discussion earlier, they'd be able to raise their
| hands and say: wait a minute, you're going to blow up our
| logs."
|
| "Maybe if you invited the operations people earlier, they'd
| have some ownership in the product."
|
| Awww... You like each other but none of you dare to make the
| first move :D
| happymellon wrote:
| Ops/Infra teams don't usually start software projects and are
| not aware that there is a project for them to offer their
| help with.
|
| My experience has been that they can be very accommodating
| and supportive if you do talk to them.
| Aeolun wrote:
| Yeah, all those devs are just sitting there at their
| desks clacking away on their novels.
|
| There is _always_ a project for them to offer help with,
| since the business will not suffer devs to be idle.
| ozim wrote:
| The whole thread reads like a bunch of guys shouting at each
| other "but but ... I know better!".
|
| IMO this is the main topic of the thread and of the article.
|
| There are groups of people who, instead of spending time to
| figure out how to work together and understand what the other
| side has to say, just throw shit over the fence.
|
| Maybe some could start by reading points at least a couple of
| times and trying to understand, instead of trying to write
| personal experiences as fast as they can in reply to another
| comment that hurts their ego.
| izacus wrote:
| > And I thank god both my app developers and operations
| people aren't assholes like you.
|
| Uhh... can we chill with the personal insults a bit?
| d--b wrote:
| Cause the whole article reads like "developers are whiny
| assholes who don't know shit about computers". And yes, it
| starts with an attack and ends with an attack too.
|
| 1. It's not operations' problem for sure, but I certainly
| don't bash people for not knowing things I am the expert in.
|
| 2. Fine
|
| 3. The OP's saying he doesn't want to know!
|
| 4. Well, writing applications is sitting atop a stack of
| technologies more and more abstract. A developer not knowing
| what happens in an IP packet is the same as an infrastructure
| guy not knowing what happens in an NP junction.
| civilized wrote:
| > Cause the whole article reads like "developers are whiny
| assholes who don't know shit about computers". And yes, it
| starts with an attack and ends with an attack too.
|
| There is no attack in the text. There is a complaint that
| issues presented to operations often lack the basic level
| of detail and due diligence that they should have. You are
| free to disagree with the author's expected level of due
| diligence on issues; I think you'd be wrong to, but you
| can. However, it isn't an attack.
|
| You perceive a non-attack as an attack, and respond with an
| explicit attack and name-calling. That actually makes _you_
| the aggressor.
|
| Hmm, who is the asshole here?
| burnished wrote:
| It read more like "these developers are asking poorly
| formed, difficult to answer questions", and frankly
| reminded me of a LOT of r/CodingHelp problems I've seen
| lately. Aside from that the author seems to repeatedly have
| empathy and admiration for developers but thinks that there
| is a systematic dysfunction. There is definitely a little
| "old man shouts at clouds" too, but at least to me this
| article read as a legitimate discussion of some pain
| points, certainly not a hit piece.
| Aeolun wrote:
| Hmm, it sounds like the opposite to me. I find it really
| hard to read because of the constant 'devs are stupid'
| comments.
|
| There is a legitimate point buried there, but I just kept
| seeing red reading it.
| sgarland wrote:
| > A developer not knowing what happens in an IP packet
|
| I don't care if devs understand IP packets, TCP congestion
| control algorithms, or anything similarly low-level. If
| they do, that's awesome, but it's not expected. I do expect
| them to have a basic understanding of expected latencies
| for intra-DC vs. internet, why running Flask in production
| isn't a good idea, and if they're really sharp, an inkling
| of how Kubernetes networking works.
| arwineap wrote:
| I think I understand your sentiment, but what's wrong
| with flask??
| kazen44 wrote:
| i assume the poster means running flask in production
| without something like nginx in front to serve as the
| webserver.
|
| the flask built-in webserver is not production grade
| software in my opinion.
| Cyphus wrote:
| It is also the opinion of the people who wrote the built-in
| webserver. If you try to run it in production mode, it'll
| emit this warning on startup:
|
| > WARNING: This is a development server. Do not use it in
| > a production deployment. Use a production WSGI server
| > instead.
|
| I don't expect junior devs to have a sense for what is
| production-grade and what is not, but if they try to ship
| software that explicitly warns against being used in
| production, you've got a real liability on your hands.
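|
| A rough sketch of how a pipeline could catch this automatically,
| assuming the startup log is available as a list of lines. The
| names find_dev_server_warning and gate_deploy are made up for
| illustration; the only thing taken from Flask is the warning
| text quoted above.

```python
# Substring taken from the warning quoted above; matching on a
# substring is an assumption, the full wording may vary by version.
DEV_SERVER_MARKER = "This is a development server"


def find_dev_server_warning(log_lines):
    """Return the first log line containing the warning, or None."""
    for line in log_lines:
        if DEV_SERVER_MARKER in line:
            return line.strip()
    return None


def gate_deploy(log_lines):
    """Return (ok, reason) for a hypothetical pre-deploy check."""
    hit = find_dev_server_warning(log_lines)
    if hit is not None:
        return False, f"refusing to deploy: startup log says {hit!r}"
    return True, "no development-server warning found"
```

| This only catches the case where the warning actually reaches
| the logs, so it is a backstop, not a replacement for review.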
| mdekkers wrote:
| > wait a minute, you're going to blow up our logs.
| You really need to have that pointed out to you?
| igetspam wrote:
| I believe your assessment of the agreement is flawed.
| Application developers are not our users. You're our tenants.
| We provide highly available housing for your projects. We keep
| the lights on, we keep walls standing and we make sure the roof
| doesn't leak. We also provide APIs for you to interact with.
| When those things fail, we are responsible. When your code
| doesn't run in the test environment where everyone else's does,
| that's not our job. I'll help you but at my convenience because
| I have other things to do. If your app fails in the middle of
| the night, that's your responsibility. If it's an infra
| problem, then it's on me. We don't ask you to tune the network
| or balance the cluster or ask you why the daemon sets are
| failing, right? If this was a shared responsibility, you'd be
| helping with the core too but I can almost guarantee that's not
| happening. (Some of my eng peers do but the vast majority think
| of it as a black box.)
| draw_down wrote:
| Jeez. I think it's really despicable to read the behavior this
| person is describing and decide they're the asshole.
|
| It isn't surprising though, this is par for the course in tech
| workplaces it seems. The problem isn't that I shit all over
| your doorstep, the problem is you pointing it out instead of
| just cleaning it up silently.
| tucosan wrote:
| Wow. Starting your argument with an ad hominem attack qualifies
| you as one of those people I will never ever want to work with.
| waylandsmithers wrote:
| On point 3:
|
| > I likely don't even know what "working" means in the context
| > of your app.
|
| I think both sides can do more to reach into the domain of the
| other. I get it- we don't want to deal with blinking lights and
| they don't want to deal with a missing semicolon breaking
| everything.
|
| Honestly I think "that's not my problem" is one of the worst
| attitudes you can have as part of an organization with common
| goals.
| time0ut wrote:
| It sounds like someone who is frustrated because the process or
| culture in their organization has led to point 3. I tend to
| involve ops before I write a single line of code and definitely
| before deploying to a stage environment. Over the course of a
| project, they help me write the runbook, create dashboards, and
| alerts. After all, we are all on the hook when things go
| sideways at 3AM. I want them to know as much as possible about
| how things work.
| generalk wrote:
| This is the way.
|
| My previous company had a HUGE problem with Devs cowboying
| off and doing whatever and dumping it on the Ops team at the
| last minute.
|
| One of the biggest (but for damn sure not the last) issues
| was a dev who designed and built an entire new product around
| a MongoDB database, which wasn't something we had in
| production, and something he didn't mention during the months
| of development and demos to stakeholders. Week before the
| launch date he hits up our Ops folks to get production set
| up.
|
| Ops was calm and collected about the whole thing. "We don't
| have MongoDB in production. Are you volunteering to learn how
| to correctly install it, write monitors for alerting, be
| paged with issues, figure out backups and how to ensure our
| data stays safe, secure, and available? You're not? Then get
| the [redacted] out and rewrite your app. Yes it will affect
| the ship date, and yes it's your fault."
|
| I'd love to say we used that opportunity to shore up our
| processes involving kicking off new applications and
| including Ops folks in from day one, but that took years
| more.
| time0ut wrote:
| Something similar happened at my company like 5 years ago.
|
| A developer was tasked with adding a major new feature to
| one of our older monoliths. He added MongoDB as a
| dependency. The application already had a well managed
| Oracle database. Nothing about the feature required
| MongoDB.
|
| When it came time to go to production, the DBA and ops
| teams responded similarly to how you did. I wish I could
| say sanity prevailed, but the business mumbled something
| about contractually obligated release dates and forced it
| through to production. Pretty sure it is still there
| rotting away.
|
| I've worked mostly on the app side of things and this sort
| of thing just makes me shake my head.
| random_kris wrote:
| well at the end of the day you managed to ship it? Did it
| cause any big problems down the line? It seems the
| biggest problem is that it is rotting away somewhere,
| which to me means that it is working without need to do
| much care on it.
|
| If they listened to your DBA/ops guys no value would be
| getting shipped ;)
| time0ut wrote:
| I don't know of any big problems other than the
| unnecessary cost. I agree meeting the needs of the
| company is king, but it was just a lot of unnecessary
| complexity because a dev wanted to put MongoDB on their
| resume. Could have been avoided by talking to the rest of
| the team early on. Of course, they would not have liked
| the answer of just creating a new table in boring old
| Oracle.
| Aeolun wrote:
| To be fair, when forced to choose between Oracle and
| MongoDB I'd also have a serious dilemma.
| oblio wrote:
| > Ops was calm and collected about the whole thing. "We
| don't have MongoDB in production. Are you volunteering to
| learn how to correctly install it, write monitors for
| alerting, be paged with issues, figure out backups and how
| to ensure our data stays safe, secure, and available?
| You're not? Then get the [redacted] out and rewrite your
| app. Yes it will affect the ship date, and yes it's your
| fault."
|
| Love the shoot-down!
| davidgerard wrote:
| Ops here. The threat of 3am phone calls does wonders, in my
| experience.
|
| If it turns out it was product owner pressure, the product
| owner gets a call too. Possibly first.
| Aeolun wrote:
| So, you could have delayed the app by the same amount but
| now have a mongo environment for production as well?
|
| Seems a bit of a waste to rewrite the app instead.
|
| Not that I would recommend Mongo anywhere, production or
| dev, but it would apply for any other technology for which
| this happened.
| generalk wrote:
| > So, you could have delayed the app by the same amount
| > but now have a mongo environment for production as
| > well?
|
| No, we couldn't have. Not just because we didn't want
| MongoDB, which at the time was notorious for data loss,
| but because our ops team didn't have the capacity at that
| point in their schedule or team size to handle it. Maybe
| had we discussed it at the beginning of the project, plans
| could have been made or altered, but we didn't and so
| they couldn't.
|
| > Seems a bit of a waste to rewrite the app instead.
|
| The responsible dev took the time necessary to rewrite
| the data layer to better reflect the needs of the
| application.
|
| Is what I wish had happened. Instead the developer jammed
| the huge JSON blobs into a column on an MSSQL table and
| changed a few lines. lolsob.
| jimbokun wrote:
| > Instead the developer jammed the huge JSON blobs into a
| column on an MSSQL table and changed a few lines.
|
| Sounds like the quickest way to deliver value to the
| customer. As described, it was far too late in the process
| to worry about deploying with a clean, extensible
| architecture.
|
| A reasonable amount of technical debt in order to ship in
| the timeframe available.
| kazen44 wrote:
| except that shipping something with semi-broken
| infrastructure leads to losses down the line.
|
| What if your mongodb database drops its data and now you
| have production impact? Are those losses calculated while
| making these decisions during development?
| Aeolun wrote:
| > because our ops team didn't have the capacity at that
| point in their schedule or team size to handle it
|
| Lol, I get your point, but that was also true for the dev
| organisation. Hence what you ended up with.
|
| I doubt the needs of the application included a rewrite
| in MSSQL.
| czep wrote:
| > Application developers are your users.
|
| No. Equating internal teams with paying customers is the very
| attitude that is causing these problems. Encouraging teams to
| think about their "internal customers" leads those customers to
| become entitled. We work together in the same company, our
| relationship is not the same as with actual external paying
| customers. I can't tell a paying customer that they're being
| unreasonable or lazy or unrealistic. We absolutely should be
| able to have that conversation with other internal teams when
| appropriate.
|
| The post is describing the situation that has evolved as a
| result of QA being phased out. Telling Ops to suck up that
| extra work because "Dev are your users" is exactly why the post
| was written.
| gravypod wrote:
| At most large companies things are organized in such a way
| that internal teams are your "paying users". Some internal
| teams at some companies even say "If you want X feature and Y
| support you need to request $$$ funding and N people for our
| team".
| LambdaComplex wrote:
| ...Isn't that how Sears went bankrupt?
| gravypod wrote:
| I don't know much about Sears. I've mostly worked as a
| Software Engineer and know other Software Engineers.
| 0n34n7 wrote:
| Agreed. Good application code often contains edge case
| handling, build time checks, unit tests and defensive flows
| that handle the unexpected so that users don't wake you up at
| night. Why can Ops not do the same? Why can Dockerfiles /
| Orchestrators / CI / playbooks not also implement sanity checks
| on deployments?
|
| "Ooops... deployment failed. While deploying your artifact we
| found the following:
|
| - Nothing is listening on the nominated port
|
| - Your deployment is utilizing 100% CPU while idling
|
| - We detected an abnormal volume of write operations to the
| mount
|
| Please fix these issues and re-trigger the pipeline at your
| earliest convenience.
|
| Regards, Ops."
| jensensbutton wrote:
| > Why can Ops not do the same? Why can Dockerfiles /
| Orchestrators / CI / playbooks not also implement sanity
| checks on deployments?
|
| All of those things were written by developers.
| clipradiowallet wrote:
| > - Nothing is listening on the nominated port
|
| Now that just shouldn't happen... i.e., we (ops) aren't going to
| deploy something that doesn't come with healthcheck(s). The
| healthcheck never passing (port isn't listening) is going to
| stop the deployment from ever completing. Ops' job is to push
| back on developers if they try to hand us something like this
| to build a pipeline for. In my company, to hand Ops the name
| of a repo and say "build a pipeline"...there are a lot of
| requirements, and the biggest one is a list of SLAs. That
| list of SLAs is how we build monitoring for your application,
| and one of those should _always_ be a list of port(s) and
| protocol(s) that are exposed; we build monitors against
| those.
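|
| For what it's worth, the "is the port listening" monitor can be
| as small as the sketch below. It assumes a plain TCP connect is
| an acceptable healthcheck; port_is_listening and failed_checks
| are illustrative names, not anything from a real pipeline.

```python
import socket


def port_is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def failed_checks(host, ports, timeout=1.0):
    """Return the subset of declared ports that nothing is listening on."""
    return [p for p in ports if not port_is_listening(host, p, timeout)]
```

| A deployment would fail if failed_checks(host, declared_ports)
| comes back non-empty. A real check would likely speak the
| protocol too (HTTP status, etc.), not just open a socket.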
| seniorThrowaway wrote:
| "Oh those are normal errors" - Every developer I've ever
| worked with
| greedo wrote:
| I think if your take away is that the author is an asshole, you
| might want to reflect on specifically why you feel that way. In
| my experience as a sysadmin, in a large company that's been
| trying to become a user of "cool" IT in the last decade, the
| article is spot on.
|
| I think for point 1, he's trying to say that application
| developers aren't doing their role as both dev and QA. I've
| witnessed the same issue where a DBA had trouble installing
| Maxscale on two identical servers. He was convinced that there
| must be something different between the two servers despite
| them being created from the same template, and only differing
| in IP/hostname. He had done no research, opened no tickets with
| the vendor, but instead wasted 30 minutes of my time arguing
| that it's not his fault. And this is common with many of the
| developers I've worked with in the last decade.
|
| For #3, I don't own the application you develop. We provide you
| with a platform that YOUR application runs on, based on
| requirements you provide. If you don't do an adequate job of
| providing accurate requirements, that's on you, not my team.
|
| And #4, developers don't abstract all those things away, they
| often fundamentally don't understand how they work at all, so
| they ignore them. This ignorance has damning consequences when
| they make blind assumptions about how things work.
| greedo wrote:
| I used "mine/yours" to denote where the responsibility lies.
| In a small org you can have the entire IT team troubleshoot
| an issue. In a large org, that's unfeasible.
|
| I'm willing to help troubleshoot and provide guidance based
| on my experience, assuming the application developer has
| performed their due diligence. I have no insight into what
| their application is expected to do, or its failure modes. I
| have no input into the coding methods, the test harnesses,
| the deployment process. But when that shit breaks because the
| dev doesn't understand the difference between `rm -rf ./*`
| and `rm -rf /` that's his problem.
|
| Now of course this is an org problem, not a team problem. As
| in parenting, setting boundaries and responsibilities is the
| key to success. Too many leaders in IT simply think that
| "DevOps" will be cheaper and faster and leave it at that.
| EastSmith wrote:
| Talking in terms of mine and yours means we are not on the
| same side. And this is the problem.
|
| If there is a problem with the deploy let's meet, fix the
| issue and most importantly learn from the problem, and
| document the incident for future reference.
|
| And then move on without finger-pointing.
| CodeMage wrote:
| Just because I have my responsibilities and you have yours,
| it doesn't mean we're not on the same side.
|
| I've come to dread cute management phrases like "everyone
| should pull on the rope". I agree with the sentiment, but
| software development is not as simple as pulling on a rope.
| There are lots of moving parts and lots of things to
| specialize in. And I say this as a generalist dev, not as
| an ops engineer.
|
| I agree with TFA completely. I was interviewing for a job
| recently, and one of the questions I would ask when the
| interviewer signaled it was time for me to ask questions
| was "how do you handle QA?" On some occasions, this got me
| weird looks, because "QA" seems to be an antiquated
| concept.
|
| In a similar vein, my stint at Amazon taught me that one of
| the questions to ask my interviewers is to tell me about
| their on-call rotation. Is there any? How often are you on
| call and for how long? Who gets paged first?
|
| Yeah, we're all on the same side, but there needs to be
| some structure and order. Otherwise, you end up with
| something like this:
|
| _"Twenty-seven people were got out of bed in quick
| succession and they got another fifty-three out of bed,
| because if there is one thing a man wants to know when he's
| woken up in a panic at 4:00 A.M., it's that he's not
| alone."_
|
| -- from "Good Omens", by Sir Terry Pratchett and Neil
| Gaiman
| emmelaich wrote:
| I totally agree with TFA; except it was ever thus. (And agile
| has helped reduce the problem if anything)
|
| As an ops person I've had to explain the devs' own architecture
| to them; they didn't know how it sent mail -- nothing to do
| with SMTP; they just hadn't shared the knowledge among
| themselves of the db/java app interaction.
|
| I once had a developer tell me ridiculous things like "my java
| app can't write to java.tmpdir". They couldn't even tell me
| what file they were trying to write. I had to dive into apache
| docs and send it to them. It turned out to be a bug in an apache
| project code, nothing to do with tmpdir writeability.
|
| The lack of basic responsibility and ownership was appalling.
| generalk wrote:
| I find this response surprising, as I fully agreed with TFA.
|
| I've had an Ops team that had a similar attitude, and they did
| a _lot_ to help me become a good developer. Part of that was
| requiring that I come to them with identified problems. "Hey
| I'm getting this error, can you take a look at a stack trace in
| a language you've never used and tell me what's wrong?" would
| have gotten me booed/laughed out of the office, and for good
| reason.
|
| It's not at all unreasonable to expect the developer to come
| around instead with "hey my application can't write to this NFS
| mount like I expected. It's running as $user, the permissions
| look right but I'm still getting permission denied. Any
| thoughts?" (A real situation I ran into, turns out SELinux had
| further permissions I was unaware of, and my Ops lead Chip was
| happy to show me what was what.)
|
| Yeah, we're all on the same team, and that cuts both ways --
| Ops should ensure Dev has what it needs, and Dev should make
| some actual effort to understand the landscape their production
| applications run in. Which seemed to me to be the entire point
| of TFA.
| emeraldd wrote:
| > Part of that was requiring that I come to them with
| identified problems. "Hey I'm getting this error, can you
| take a look at a stack trace in a language you've never used
| and tell me what's wrong?" would have gotten me booed/laughed
| out of the office, and for good reason.
|
| This a thousand times over ... If you can train your users to
| do this any customer relationship will be better off!
| xorcist wrote:
| I've always tried to encourage the following format for all
| professional questions of that sort:
|
| "a) I do exactly this, b) expected this outcome, c) but got
| this instead"
|
| Short and to the point, it's remarkable how much easier it
| makes things for everyone. I think I got it off usenet at
| some time.
| igetspam wrote:
| You, sir or madame, are a good job. I like working with
| people like you. I want to help but some things just don't
| fall into my wheelhouse but when they do, we're on it. This
| is how teamwork should be defined.
| indigodaddy wrote:
| Thank you for being one of the minority that do this!
| civilized wrote:
| If the author is an asshole, you certainly also are one by the
| same standard.
|
| Developers are not just "users", they're fellow software
| professionals who can reasonably be expected to work harder on
| troubleshooting than reporting "it works on my machine but not
| in the test environment :(" without even reading the error
| message or including it in the report.
|
| As a general rule, when you have most of the control or
| knowledge of a technical process and you want someone else to
| help you with it, you need to give that other person as much
| transparency and info as possible. Because they don't control
| the process and will have to slowly, laboriously ask you
| questions, or ask you to do things, rather than just probing
| the system themselves.
|
| They're taking time out of their day to work in a relatively
| inefficient and frustrating mode just to help you out, so jeez,
| have some respect and try to make their jobs a little easier.
|
| If you don't and prefer to wear this entitled attitude, fine,
| but you're just as much an asshole as he is.
| seniorThrowaway wrote:
| my favorite response ever to "it works fine on my laptop/dev
| machine" is "let's connect the prod load balancer to your
| workstation and get you a pager, problem solved!"
| NotSammyHagar wrote:
| At many companies there is no one to help developers.
| bob1029 wrote:
| What I am seeing is a need for more vertical integration. Teams
| need to be made to own the entire product stack. If you do this,
| they will be incentivized to make it simple and stable.
|
| No one should ever get to play "not my job" while simultaneously
| throwing complexity grenades over to another team.
| dragonwriter wrote:
| This is why teams should be cross-functional and product-
| organized, divided, if further is necessary, by product
| _component_ not function, instead of function-organized.
|
| Function-organized teams encourage knowledge siloes and
| it's-some-other-team's-problem-ism.
| mpitt wrote:
| That sounds great in theory, but what happens if your dev/ops
| ratio is something like 15/1? How do you put an ops person in
| every team? I think it's the right answer but it seems
| impossible to put in practice.
| lbhdc wrote:
| An alternative is that all or some of the devs share the load
| of ops work.
| manuelabeledo wrote:
| This reads like a guy trying to take complete ownership, while
| renouncing any accountability.
|
| I have been in this industry for 15+ years, and as a developer, I
| have a surprising amount of experience dealing with customers. Of
| course, when a customer complains about some feature not working,
| I would not just take their word for it. Customers mess up too.
|
| What I _would not do_ is brush their complaints off. "This is a
| systemic issue". "They are causing problems". "They don't know
| better". "They don't have the correct incentives". Try telling
| that to a customer, or to your boss.
|
| The obvious disconnect from his own team _is_ the problem.
| jen20 wrote:
| This entire article is written with such profound
| misunderstanding of DevOps - perhaps one induced by vendor
| marketing - that it's effectively meaningless.
|
| Yes, developers should understand the operational environment a
| system runs in, and should be capable of advanced
| troubleshooting. But the rest of the post is simply tired screed
| about how "the old days" were better, despite the fact that they
| manifestly were not.
| 123pie123 wrote:
| >Application teams attempting to assign ownership of a bug to a
| networking team because they didn't account for timeouts.
|
| I had to chuckle - everyone (not just developers) seems to blame
| the network first! (including blame the firewall rules)
| aNoob7000 wrote:
| First blame the network then the database but never the code.
| :)
| tssva wrote:
| This just means the network gets blamed twice because
| inevitably the DBAs will also blame the network once the
| issue gets to them.
| bennyp101 wrote:
| I mean, it /is/ always DNS :)
| tyingq wrote:
| I see a fair amount of DNS problems that trace back to "app
| resolves a DNS name at startup, and never does the lookup
| again".
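|
| A minimal sketch of the alternative, assuming a fixed refresh
| interval is good enough (the standard library's getaddrinfo does
| not expose the record's real TTL). RefreshingResolver and its
| parameters are hypothetical names for illustration.

```python
import socket
import time


class RefreshingResolver:
    """Re-resolve a hostname once a fixed interval has elapsed."""

    def __init__(self, hostname, ttl_seconds=30.0, resolve=None,
                 clock=time.monotonic):
        self.hostname = hostname
        self.ttl_seconds = ttl_seconds
        # resolve and clock are injectable so the class can be tested
        # without touching the network.
        self._resolve = resolve or self._system_resolve
        self._clock = clock
        self._addresses = []
        self._expires_at = None

    @staticmethod
    def _system_resolve(hostname):
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
        return sorted({info[4][0] for info in infos})

    def addresses(self):
        """Return cached addresses, looking them up again when stale."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._addresses = self._resolve(self.hostname)
            self._expires_at = now + self.ttl_seconds
        return self._addresses
```

| The app calls addresses() each time it opens a connection
| instead of holding on to whatever it got at startup.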
| gfiorav wrote:
| Or if you work in a big company, it's always the proxy
| johngalt wrote:
| At one point running wireshark and reviewing network traces
| with developers was a full time job. Guess what percentage of
| time it was actually a network problem?
| Foobar8568 wrote:
| I had to escalate to basically a CIO of a fortune 500 company
| for someone to take a look at the network performance of our
| system, all teams were blaming applications despite the
| evidence. It ended up being a bug in a VMware driver that was
| impacting their whole infrastructure.
| tyingq wrote:
| I would add reasonable retry logic also. I've seen quite a lot
| of outages that would not have been noticed if there were
| decent retry logic with backoff, etc.
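|
| Roughly what I mean, as a sketch: exponential backoff with
| jitter, capped at a maximum delay. The function name, defaults
| and the choice to retry only on OSError are assumptions for the
| example, not a recommendation for any particular client.

```python
import random
import time


def retry_with_backoff(operation, attempts=5, base_delay=0.1, max_delay=5.0,
                       retry_on=(OSError,), sleep=time.sleep,
                       jitter=random.random):
    """Call operation(), retrying on the given exceptions.

    The delay before retry n is jitter() * min(max_delay, base_delay * 2**n),
    so waits grow exponentially but are randomized to avoid every client
    retrying at the same moment.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * jitter())
```

| Only retry operations that are safe to repeat; blindly retrying
| a non-idempotent write trades one outage for another.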
| reacharavindh wrote:
| Oh boy, it feels like someone is ringing bells in my head because
| of aligned thoughts.
|
| Let me share an experience. In 2010, I worked on a project for a
| large business in the US (Fortune 100). The process was set so
| rigidly that it worked well, but I was among the group of people
| who were mad at it saying "why is this so rigid? Trust us and let
| us do things faster!!". Context : There were change management
| rules in place. The software was to be released only on a regular
| cadence of about 6 months, only after thorough integration tests,
| and approval from the change mgmt board. Should anything go wrong
| in "move to prod" there will be representation from dev, QA, Ops,
| change mgmt, and Mgmt orgs to immediately decide on actions until
| the release to prod is successful. There will be thorough
| documentation of what to do (run books) on what changes occurred,
| what their impact could be and how to rollback if something
| unexpected occurs. It was always a party after a successful
| release :-)
|
| Trust me there were a lot of bugs, but they were mostly found and
| fixed during the laborious QA and integration tests by people
| whose job it was.
|
| Fast forward to now, I am a "Cloud Engineer" in a small team that
| does everything from app development to building CI pipelines to
| running services on AWS to being on-call to keep them running.
|
| I must say, I wish for the old days back. Sure, it was slow and
| laborious, but it resulted in better outcomes and manageability.
| IMHO, it also resulted in better reliability of software due to
| the diligence done by several layers.
|
| It is easy to say do the same just faster in your small team.
| But, in practicality it just doesn't happen. I work on setting up
| Observability one week, then onto designing infra for a new
| service, then onto some development and so on. I feel like my
| scope would have been limited, and I would have had an easier
| time becoming an expert at something than becoming so broad
| skilled like I am today.
|
| Sometimes, old, slow, and mature is not so bad. Not everyone
| needs to follow the FAANG SV companies to be successful.
| ByteWelder wrote:
| > I must say, I wish for the old days back. Sure, it was slow
| and laborious, but it resulted in better outcomes and
| manageability. IMHO, it also resulted in better reliability of
| software due to the diligence done by several layers.
|
| Those were also the days where it took many years to go from
| Java 6 to Java 8. Or perhaps to try out Kotlin.
|
| They were the days where legacy code was the norm, and we kept
| supporting it because nobody dared to change anything for the
| better. In practice, that's just not something you can maintain
| in a competitive market, because your competitors _will_ use
| new technologies and faster/better development processes.
|
| "it just works" might be good enough for maintaining your
| application, but will it be good enough to find people willing
| to work in that code base or that environment?
|
| I work for a large business where both the old and new
| practices are in place (mostly the new ones, though). Focusing
| on "going fast" is definitely not a good idea, but I believe
| there's a sweet spot in between.
| datavirtue wrote:
| All code is legacy code. As soon as you start changing the
| existing code...it is legacy.
|
| I'm on a project now that has not released to prod. It has a
| lot of new legacy code.
| reacharavindh wrote:
| Sure, mature processes encourage tech stagnation, and
| discourage even beneficial changes as collateral damage. But,
| as you say, there is a line somewhere at which a project should
| move on from "Go fast, ship often, change much and get
| feature-rich" to "focus on correctness, stable releases,
| actually maintain our existing features". Perhaps it is
| really a cycle of both and missing one for the other leads to
| problems.
| wayoutthere wrote:
| Hear, hear. If the bulk of your "products" are for internal
| consumers, you likely aren't paying enough to attract talent
| who know how to operate in the FAANG model.
|
| I like to distinguish between "product developers" (i.e.
| building products for consumers with guaranteed scale, so do it
| right the first time) and "project developers" (get it done
| ASAP and cut the corners you need to do so).
|
| In the "project developer" world, 50-75% of your requirements
| gathering happens before a line of code is written. There is
| usually a "right way" to implement a process of which
| technology is only one component and figuring that out as you
| go will actually slow down the project due to the maker /
| manager schedule conflict. True "agile" in this environment
| just leads to scope creep as there usually aren't dedicated
| product owners to say no to every little request.
|
| I've stopped pushing agile as hard because the corporates
| simply can't afford the kind of engineers to make it work
| correctly, and they don't have the roles required to gather and
| feed requirements to a dev team in an agile format. Sprints are
| a good way to time-box feature development, but most business
| projects work better with a more waterfall approach. Your
| customers and project plan operate under waterfall so there's
| less downside to begin with.
| reacharavindh wrote:
| Great comment about "Project developers" and "Product
| developers". It is almost an entirely different art to get
| the requirements right by iterating on a project, and
| bringing a solution to life versus engineering a
| scalable, maintainable product that evolves after a good
| start. I never had to think of such a distinction.
|
| The waterfall model has its downsides in extracting the
| requirements properly, whereas the Agile approach (the
| little I have seen of it) seems to lose the layered stability
| of a waterfall-based approach.
| bdavis__ wrote:
| excellent comments. instead of straight waterfall, i would
| suggest a time boxed requirements phase, followed by
| incremental development with a reasonable cadence (dictated
| by the product; web might be 2 weeks, more serious domains
| might be 90 days). you need iteration, but having a solid
| grounding on what you are going to build eliminates churn.
| elfrinjo wrote:
| I sent a link to that post to a senior developer with similar
| habits. He answered: "didn't read, don't understand english."
| rmetzler wrote:
| Just today I saw another "works on my machine" issue. The dev
| didn't complain for 3 weeks that his latest code isn't deployed.
| QA found out today (on a Friday) about it and the dev has his day
| off. The issues were not hard to fix, but it's not the DevOps
| job.
|
| Especially when the dev wanted to migrate from Java 8 to Java 11
| and didn't even attempt to look up our documentation on how to
| change JVM parameters.
| giantg2 wrote:
| "Operations is not Developer IT"
|
| It seems more and more places want it to be. DevOps is all the
| rage.
| ineedasername wrote:
| I'm not trying to be glib, but it honestly sounds like a lot of the
| people this person worked with needed a strong lesson in LMGTFY.
| dgb23 wrote:
| > Nobody gets promoted for maintenance or passing a security
| audit.
|
| This is a huge problem. Working on reliability and security is
| hard, shipping broken features is easy.
| nijave wrote:
| Not only that, fixing those issues generally adds work that
| sucks up time that could be devoted to shipping new features.
|
| In that regard, those roles are slowing things down and costing
| money.
| lucasyvas wrote:
| The problem often lies with the entire Organization and not the
| Development team. I've had roles where Development was empowered
| to code the product and deploy the code, which necessitates
| certain access. At that point, troubleshooting is trivial and we
| can solve our own problems. It's amazing.
|
| Throw in some red tape where I can't have access to logs myself?
| Then I don't care to fix it at all - chasing another team, that
| has diverging priorities, is a complete waste of my time.
|
| If your Developers are tossing shit over a wall, I'd bet top
| dollar you work in organization B. In which case they are
| behaving accordingly. Don't empower me to identify and fix
| issues? Then I won't (and I won't lose sleep over it either).
| plebianRube wrote:
| >developers are not incentivized or even encouraged to gain
| broader knowledge of how their systems work
|
| This is the crux of the problem. Coding in isolation. Replies of
| 'It's java, it should work anywhere' etc.
|
| The other gear-grinding common theme is not even doing basic
| troubleshooting. To the point of not even googling the error
| message or the symptoms, and being 'blocked' because they are
| waiting on a ticket they opened with the 'other' team.
| reportgunner wrote:
| _Works on my machine_
| bennyp101 wrote:
| We're a small company so we sometimes do many things, but it's
| taught me a lot of networking fault finding.
|
| There's some very clever ppl that know all about how
| networks/vm stuff work, and I've learnt enough from them that I
| can fix most of my own infra related things - or at least give
| them a run down of what I've done first to save them some time.
|
| It got me back into hardware and networky stuff, so now I've
| got a MikroTik at home, some proxmox machines, Tailscale
| network etc - more fun than just spin up a box on DO and be
| done with it.
|
| A lot of ppl just aren't interested though, they just want to
| code (and maybe learn a new language) but because a lot of
| stuff is now PaaS and it's super easy, there is no need to
| learn it (in their eyes)
| debarshri wrote:
| I think the incentive for a developer is to be relevant. If you don't
| do it, someone else will. And that becomes the new norm. Like
| how DevOps has become the new norm.
| arminiusreturns wrote:
| I've dealt with most of the issues in TFA and in comments here.
| Without engaging too much in the technicalities, I would offer
| that most of these issues actually stem from leadership, or lack
| thereof, and most often, at the middle management level, but
| sometimes middle management issues are just covering up upper
| management issues. Generally, I see these kinds of issues more
| often in non-technical management presiding over technical teams,
| because they have learned all the correct propitiations to upper
| management and all the buzzword bingo for their teams, but lack a
| real understanding, and more importantly, lack the ability to
| form a coherent and actionable _vision_ to correct these kinds of
| issues. (pet peeve issue with middle management is when they push
| others out of the interview process, and suddenly you have hires
| that don't belong _at all_.)
|
| As for me, I'm currently watching a good devops team go down the
| drain because of a bad manager, so I'm seriously considering
| trying to move to management so I can help my employer do better.
___________________________________________________________________
(page generated 2021-09-03 23:02 UTC)