[HN Gopher] Silos are fine, as long as there is an API between them
       ___________________________________________________________________
        
       Silos are fine, as long as there is an API between them
        
       Author : platformeng
       Score  : 120 points
       Date   : 2024-01-19 13:32 UTC (9 hours ago)
        
 (HTM) web link (fernandovillalba.substack.com)
 (TXT) w3m dump (fernandovillalba.substack.com)
        
       | internetter wrote:
       | I thought this was going to be about siloed products. Like AWS is
       | a silo, because once you use AWS you are effectively locked in,
       | whereas WinterCG and Unix standards make implementors non-siloed.
       | 
       | Oh well, this take would be a lot more spicy if so
        
         | bazil376 wrote:
         | I prefer my takes lukewarm
        
       | alejoar wrote:
       | New management in the last company I worked for went on a crusade
       | against silos.
       | 
       | Their strategy was to randomly shuffle people across teams.
       | Literally random.
       | 
        | You could tell how productivity bottomed out and soon after a
        | lot of senior engineers left the company.
        | 
        | This is a multimillion-dollar American media company.
        
         | cdavid wrote:
          | silos is one of those keywords that is org-health dependent.
         | 
         | When things go well, people highlight team autonomy. For the
         | exact same setup, when things go bad, people talk about silos.
        
         | bigbillheck wrote:
         | Mandating that is a terrible idea but I think giving senior
         | people (and go-getting junior ones) in team A an opportunity to
         | do a 6-12 month rotation in team B, and so on, is a great idea.
        
       | braza wrote:
        | I resonate with the author, given my context working at
        | scaleups and big companies, where coordination and
        | communication take a very heavy toll on the actual work.
        | 
        | I do not believe in this model of "big collaboration" with
        | folks swarming around a problem, each with partial context,
        | trying to fix something that structurally has huge
        | communication, coordination and technical coupling.
        | 
        | The best work experiences I had were in teams where we
        | established the APIs, the S3 buckets where the processed batch
        | files would stay, and maybe an e-mail list where someone would
        | reply if something changed.
        
         | blastro wrote:
         | First two sentences describe my current situation. How to
         | improve it?
        
           | braza wrote:
            | I do not have a straight answer, but one thing that I have
            | seen work is the concept of "good fences", where folks still
            | act as a team but have clear touch points around problems,
            | and when things go south everyone knows what to do.
        
             | blastro wrote:
             | Thanks for your input!
        
           | platformeng wrote:
            | Not saying these will solve your problems, but they may help
            | give you a little perspective at least:
           | 
           | https://www.opslevel.com/resources/optimizing-engineering-
           | co...
           | 
           | https://fernandovillalba.substack.com/p/improving-
           | engineerin...
        
             | blastro wrote:
             | Thank you for this
        
       | candiddevmike wrote:
       | This is one of those utopian engineer fallacies that is on the
       | same level as reliable networking and fsync. People don't work
       | like computers and don't respond like API endpoints. There will
        | always be backchannels, backburners, pigeonholes, and all the
       | other political games humans play happening even with an "API
       | between them".
       | 
       | The problem is always management, and the solution is never "more
       | management", IMO.
        
         | dacryn wrote:
         | an API does not mean there are no hidden dependencies, and that
         | is often the biggest failure.
         | 
          | If an API call is not stateless, or needs to be chained with
          | other calls, you're in for a world of pain in the long run.
        
           | nostrebored wrote:
           | But this is equivalent to saying "a monolith can never work
           | because it's highly coupled". In both cases you need to
           | follow best practices to make things work. API design and
           | alignment with consumers of the API is table stakes.
        
         | DanielHB wrote:
          | The whole point of the article is that silos are not
          | intrinsically bad, partly because silos reduce the
          | communication (and therefore the managers) required.
         | 
         | Two teams agreeing on an API between themselves instead of one
         | mega team fulfilling the needs of several client teams
        
           | throw04323 wrote:
           | > Two teams agreeing on an API between themselves
           | 
           | I think that depends on the type of service that the team
           | provides. If you have a central team that many other teams
           | interact with, they risk becoming a bottleneck. They may not
           | be interested in maintaining custom APIs for each team
           | interaction and you will need to agree on a contract that all
           | can live with.
           | 
            | Another risk is that the team providing the service also has
            | its own backlog, including work they want to do themselves
            | and requests from other teams. This can cause unwanted
            | dependencies and delays, where managers fight to be
            | prioritized at the expense of others.
        
             | packetlost wrote:
             | All I'm saying is it appears to have worked at mega-scale
             | for Amazon.
        
               | acjohnson55 wrote:
               | Most of us don't have mega-scale problems, though. A
               | tremendous amount of waste has been created by applying
               | FAANG tech and processes to completely different
               | contexts.
        
               | packetlost wrote:
               | Sure, but scaling of an organization is _not_ the same as
               | scaling for traffic in a technical sense. There are so
               | many companies that employ comparable numbers of
               | engineers that are not big tech companies.
        
               | throw04323 wrote:
               | I'm not saying it can't work, but that there are risks
               | involved.
               | 
               | I have worked for several companies ranging from local
               | startups to global enterprise (not FAANG). Each company
               | tried the silo approach when they migrated to micro
               | services and it caused significant delays and
               | dependencies. They would have been better off if they
               | focused more on larger domain services with fewer
               | external dependencies.
               | 
               | I am open to the idea that Amazon has been able to avoid
               | these problems, but it's clearly not a silver bullet.
               | 
               | In general I have to say I'm sceptical about comparisons
               | with FAANG, because they live in a completely separate
               | part of the technology sector. They have income similar
               | to small countries and can live with inefficiencies that
               | can break a startup.
        
               | randomdata wrote:
               | _> Each company tried the silo approach when they
               | migrated to micro services_
               | 
                | Doesn't that go without saying? That's literally what
                | microservices are: the siloing of services, just as
                | services are provided in the macro economy, but within
                | the micro economy of a single organization. Without
                | silos, your service is monolithic.
        
               | DanielHB wrote:
               | The problem I see are big companies with several products
               | trying to break down silos between the products to share
               | some infrastructure (be it code, libs, actual cloud
               | infra, support teams, design systems etc) when there is
               | very little overlap between the different products.
               | 
               | All in some grand hope of reducing costs by sharing
                | things. It almost always ends with overly generic
                | solutions that are harder to use, take more people to
                | support, don't fit well in most cases, and that
                | everyone involved hates (causing employee attrition).
               | 
               | This is different from having cohesive architecture
               | within a single product.
        
               | marcosdumay wrote:
               | Yes, it scales up very well.
               | 
               | People upthread are trying to say it scales down badly.
        
               | pixl97 wrote:
               | > appears to have worked
               | 
               | Mostly because someone at a higher level said "Your APIs
               | are not a silo, and if you act like they are you will be
               | terminated".
               | 
                | The communication cost will always be there; the
                | question is how it is implemented. In the case of an
                | API, it tends to reduce communication costs when
                | someone is forcing all teams at gunpoint to write
                | clear, concise, and well-documented APIs and doesn't
                | allow them to change said APIs without clear, concise,
                | and well-documented rules.
               | 
               | I've worked with teams that communicated via API and
               | started randomly changing shit without proper
               | documentation, and without management being held to the
               | fire over their actions, it's just a new type of silo.
        
           | dkarl wrote:
           | Agreeing means the providing team understanding and meeting
           | the needs of the consuming team. Teams that can work together
           | to accomplish this wouldn't be called "silos."
           | 
            | However, when two teams work together to create an API, a
            | process, or some other self-service mechanism, sometimes the
            | API is so good that the teams no longer need to talk to each
            | other. The practices and relationships that enabled
           | communication fade away. Walls go up, silos form, but nobody
           | notices anything wrong, because it seems like efficiency is
           | getting better and better. Over time, though, people start to
           | notice that projects that encounter a need to change the API
           | always fail. The API has become legacy, baggage, a problem.
           | 
           | There may still be somebody on the providing team who
           | remembers that they used to get together in the same room
           | with developers from the consuming team to come up with
           | solutions together, and they'll naively suggest that as a
           | solution, but there are now too many assumptions baked in at
           | the management level for that to be allowed to happen. A
           | change in the working relationship between teams means the
           | managers will fight over what this means for different
           | managers' prestige. Somebody's cheese is going to get moved.
           | Managers gear up to go to war over things like that, so upper
           | management dictates a solution that minimizes inter-manager
           | violence, a solution that carefully circumscribes the kinds
           | and amounts of contact that members of the two teams are
           | allowed to have. Voila: silos with windows, and engineers
           | sitting in their respective windows forlornly waving at each
           | other like lovers separated by their parents.
        
       | nine_zeros wrote:
       | I have seen managers create task funnels for requests coming from
       | other teams. They assume this has fixed all coordination issues.
       | 
        | Later I see engineers having to reach out to other teams week
        | after week for coordination. Things are not in place, people
        | don't respond, and coordination "APIs" don't work as documented.
        | I have seen individuals blamed for dependency failures that they
        | had no part in.
       | 
       | I rarely see the obvious solution: Why don't managers spend day-
       | in-day-out coordinating work across teams/orgs/corporate
       | barriers?
       | 
       | Instead of doing coordination, managers seem to be spending more
       | time doing HR stuff like vacations, performance calibrations,
       | documenting ways to blame engineers. They create such APIs and
       | think their job is done. Why? How? What a severe deadweight in
       | the company.
        
       | jkoberg wrote:
        | There is a difference between silos and domains. It's great for
       | domains to be separated, and independently well-defined. The
       | silos I have seen cause problems are between organizational
       | functions, like "Product" and engineering or "The Business" and
       | implementation teams.
        
       | acjohnson55 wrote:
       | My overall reaction is that this is a great piece for
       | products/teams that have reached significant scale, once the job
       | to be done is too big and complex for one team to own, end-to-
       | end, or there are truly reusable concerns that can be separated
       | from the core product (e.g. auth, observability).
       | 
       | Because interfacing via API is expensive. Writing APIs for others
       | to use productively isn't easy and change management also adds a
       | lot of overhead. And if we're talking about network APIs, there's
       | a ton of distributed systems complexity to account for.
       | 
       | > The problem with the DevOps movement is that it ended up taking
       | "shifting left" to the extreme. In this sense, development teams
       | weren't so much empowered to deliver software faster; rather,
       | they were over-encumbered with infrastructure tasks that were
       | outside of their expertise.
       | 
       | This. In truth, I think this is a major misinterpretation of
       | DevOps, which is meant to empower devs without loading them down
       | with incidental complexity. But I experienced exactly this
       | misinterpretation at the first place I worked that had embraced
       | DevOps culture.
        
         | braza wrote:
         | > Because interfacing via API is expensive. Writing APIs for
         | others to use productively isn't easy and change management
         | also adds a lot of overhead.
         | 
          | I agree in principle, but there are a lot of "unseen
          | coordination/communication" costs that are easily taken for
          | granted.
          | 
          | When I was working in telecom interfacing with carriers
          | (e.g. T-Mobile, Verizon, etc.), one thing that I noticed was
          | how simple it was to work with those folks: OK, this is the
          | standard XML, those are the endpoints, that's the list of
          | error codes, the rate limit is X requests per second, a bunch
          | of files will be on this FTP at 5AM daily, and if you face
          | more than 100ms latency from our side just call this number.
          | 
          | Working with "product" companies without silos, most of the
          | time it's design by committee: folks who won't keep the
          | service running wanting to have a say in our payload, wanting
          | us to change our overly reliable RabbitMQ to use their Kafka.
        
           | acjohnson55 wrote:
            | > When I was working in telecom interfacing with carriers
            | (e.g. T-Mobile, Verizon, etc.), one thing that I noticed
            | was how simple it was to work with those folks: OK, this is
            | the standard XML, those are the endpoints, that's the list
            | of error codes, the rate limit is X requests per second, a
            | bunch of files will be on this FTP at 5AM daily, and if you
            | face more than 100ms latency from our side just call this
            | number.
           | 
           | To me, that probably reflects the maturity of the services
           | the carriers provide. And presumably that there's an explicit
           | customer-producer relationship? These things justify the
           | complexity of maintaining a well curated and operated API.
           | 
            | > Working with "product" companies without silos, most of
            | the time it's design by committee: folks who won't keep the
            | service running wanting to have a say in our payload,
            | wanting us to change our overly reliable RabbitMQ to use
            | their Kafka.
           | 
           | If I understand what you're saying, you've experienced
           | platform people telling you what tech to use, without having
           | real skin in the game for operating your services? If so,
           | that sounds very irritating. To me, a truly silo-less
           | approach would not have that.
           | 
           | To the extent that there are platform teams with a say in
           | architecture, I think they should develop requirements around
           | the external characteristics of the deliverable (performance,
           | cost, observability, contract with other teams, etc) and
           | largely leave the implementation concerns to the people
           | developing and running the service.
        
           | The_Colonel wrote:
            | This is an interesting example, because telco has an actual
            | API standards committee (TM Forum). Telcos have decades of
            | experience and an extremely well-defined (and to a large
            | degree shared / interchangeable) domain model. It's an
            | ideal scenario for APIs.
            | 
            | Meanwhile your product companies each develop a different
            | product, and there's little standardization. API designers
            | have only a vague idea of what the API will be used for and
            | how. Fast evolution is important.
        
           | sgarland wrote:
           | > folks that won't keep the service running wanting to have a
           | say in our payload wanting us to change
           | 
           | This is my single biggest gripe with DevOps. If you're not
           | going to be fixing it, why on earth do you get a say as to
           | how I build it? It's nearly _always_ a one-way street, too -
           | when's the last time Ops successfully ordered Dev to change
           | some specific part of their code (modulo things like, "hey,
           | you really need to add a rate-limiter")?
        
           | rqtwteye wrote:
           | "Working with "product" companies without silos most of the
           | time it's design by committee,"
           | 
           | With siloes you may end up with design by ego trip.
        
         | acjohnson55 wrote:
         | As a coda to this, I think that we are entering a new world
         | where services truly can have minimal operational overhead.
         | 
         | At my last company, we were all in on serverless, with AWS
         | tools like SQS coupling things together. It worked extremely
         | well for keeping the architecture simple to operate and
         | approachable for new people.
         | 
         | But even better, I think we want logical services that
         | interface with each other as though they were in a single
         | process. We want the ability to write code as though it is
         | monolithic (e.g. lexical scope and native-language APIs), while
         | availing ourselves of the advantages of independently deployed
         | services. I believe projects like Temporal point the way. I
         | haven't had the opportunity to use it, but philosophically, I
         | think it's the right direction.
        
           | throw04323 wrote:
           | I don't have experience with it myself, but Polylith
           | architecture looks interesting. There you compose services
           | based on shared components. You can start developing it as a
           | monolith, and then extract components to separate services by
           | just changing the interface between components.
           | 
           | https://polylith.gitbook.io/polylith/
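            | 
            | Not Polylith's actual tooling, but the underlying idea of
            | components behind a stable interface (so callers can't tell
            | whether the component runs in-process or as a separate
            | service) looks roughly like this in Python:
            | 
            |     from typing import Protocol
            | 
            |     class UserStore(Protocol):
            |         # The component interface.
            |         def get_email(self, user_id: str) -> str: ...
            | 
            |     class LocalUserStore:
            |         # In-process implementation (monolith).
            |         def __init__(self, table):
            |             self.table = table
            |         def get_email(self, user_id: str) -> str:
            |             return self.table[user_id]
            | 
            |     class RemoteUserStore:
            |         # Same interface, backed by a separate service.
            |         def __init__(self, base_url: str):
            |             self.base_url = base_url
            |         def get_email(self, user_id: str) -> str:
            |             import json, urllib.request
            |             url = f"{self.base_url}/users/{user_id}"
            |             with urllib.request.urlopen(url) as resp:
            |                 return json.load(resp)["email"]
            | 
            |     def send_welcome(store: UserStore, uid: str) -> str:
            |         return "welcome mail to " + store.get_email(uid)
            | 
            |     store = LocalUserStore({"u1": "a@x.io"})
            |     print(send_welcome(store, "u1"))  # start as a monolith
            |     # ...later, swap in RemoteUserStore without touching
            |     # send_welcome.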
        
             | acjohnson55 wrote:
             | If I'm understanding Polylith, it seems like a way to do
             | monoglot monorepo in a way that avoids explicitly using
             | libraries.
             | 
             | That looks pretty interesting and similar to what Artsy did
             | with Ezel for front-end development.
             | 
             | https://github.com/artsy/ezel
        
           | jakjak123 wrote:
           | I have been working mostly with Kafka for five years, and I
           | found SQS to be a little weird to work with. How do you keep
           | an overview of all services that consume a queue? What if you
           | want multiple services to read the same data in the same
           | order? Granted I am very new to AWS.
        
             | 8note wrote:
              | I think you're really looking to use Kinesis rather than
              | SQS.
             | 
             | SNS+SQS make a pub/sub setup, but every queue subscribed to
             | the event is fully separate, and you wouldn't expect to try
             | to couple the different listening queues together.
             | 
              | You could make all the queues into SQS FIFO queues and
              | put them all on the same message group, but I think
              | network time could still break the ordering on different
              | subscribers?
             | 
             | The simple way to use SQS is to have a queue for each
             | message type you plan to consume, and treat it as a bucket
             | of work to do, where you can control how fast you pull the
             | work out
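              | 
              | Roughly, the SNS+SQS fan-out looks like this (a minimal
              | boto3 sketch; the topic and queue names are made up, and
              | a real setup also needs a queue policy that lets the
              | topic deliver to each queue):
              | 
              |     import boto3
              | 
              |     sns = boto3.client("sns")
              |     sqs = boto3.client("sqs")
              | 
              |     # One topic; every subscribed queue gets its own copy
              |     # of each message.
              |     topic = sns.create_topic(Name="order-events")["TopicArn"]
              | 
              |     for name in ("billing-orders", "shipping-orders"):
              |         url = sqs.create_queue(QueueName=name)["QueueUrl"]
              |         arn = sqs.get_queue_attributes(
              |             QueueUrl=url, AttributeNames=["QueueArn"]
              |         )["Attributes"]["QueueArn"]
              |         sns.subscribe(TopicArn=topic, Protocol="sqs",
              |                       Endpoint=arn)
              | 
              |     # Publish once; each consumer drains its own queue
              |     # at its own pace.
              |     sns.publish(TopicArn=topic, Message='{"order_id": 42}')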
        
           | osigurdson wrote:
           | >> as though they were in a single process
           | 
            | The problem is, you cannot abstract away a difference of three
            | orders of magnitude in performance and latency. This problem
           | increases with larger teams as people don't understand what
           | is happening. Abstractions shorten the learning curve but
           | also greatly slow the formation of an accurate mental model
           | of how a system works.
        
             | nostrebored wrote:
             | The mental model of using SQS and serverless is extremely
             | simple. You put a message in, get a message out, and have
             | your code run.
             | 
              | This is something I've been able to teach teams how to do
              | in a few hours. This is a huge part of the value prop of
              | these services.
             | 
             | Performance and latency have a trade off, sure. But you get
             | teams who have complete ownership over their service. You
             | get services that can scale to millions of requests per
             | minute transparently. Some of the highest scale workloads I
             | know use microservices behind SQS.
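              | 
              | A sketch of that mental model with an SQS-triggered
              | Lambda (the process function is a stand-in for the
              | team's actual logic):
              | 
              |     import json
              | 
              |     def handler(event, context):
              |         # Lambda receives a batch of SQS messages; each
              |         # record carries one message body.
              |         for record in event["Records"]:
              |             process(json.loads(record["body"]))
              | 
              |     def process(msg):
              |         print("processing", msg)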
        
         | toss1 wrote:
         | Yes APIs are expensive and require significant overhead.
         | 
          | But clean, maintained interfaces between separate domains are
          | far _LESS_ expensive than multi-domain/team mashups.
         | 
          | Maintaining separate teams _AND_ well-maintained interfaces
          | surfaces and makes _explicit_ the myriad inter-silo
          | interactions.
         | 
         | Instead of a quick backchannel conversation and a quick hack,
         | it is an explicit conversation and an API adjustment.
         | 
         | Yes, it's more expensive up front, but far cheaper and more
         | powerful in the long run.
         | 
          | So for high-speed prototyping, not so suitable: break the
          | silos, do the backchannel hack & move on (this version will
         | likely be binned or massively refactored anyway). But when the
         | domain is more stable, modular with solid interfaces is the way
         | to go.
        
         | jakjak123 wrote:
          | I'm not saying it's a silver bullet, because sharing data via
          | the database makes change management way worse.
        
         | 6gvONxR4sf7o wrote:
         | > My overall reaction is that this is a great piece for
         | products/teams that have reached significant scale, once the
         | job to be done is too big and complex for one team to own, end-
         | to-end, or there are truly reusable concerns that can be
         | separated from the core product (e.g. auth, observability).
         | 
         | I'd say it's critical in the small scale too. When you get down
         | to a single person size, this is just what well factored code
         | is: you write down little reasonably independent pieces that
         | can be learned about and thought about without having to think
         | about all the other things. After all, we've all had that
         | experience when you go back and revisit your code after
         | vacation or something, and it might as well have been written
         | by someone else.
         | 
         | I was on a tiny team not long ago where my teammates kept
         | writing tightly coupled systems, then rewriting everything from
         | scratch every time. It was hell. Our product moved slowly,
         | broke constantly, and we couldn't build off of our prior work,
         | and could barely even build off of each others' work, so
         | velocity stayed constant (read: slow).
         | 
         | (as a tangent, the communication patterns of remote work seem
         | to make this more important)
         | 
         | Siloing teams, siloing concerns, and writing modular code are
         | all kind of the same thing, just at different scales.
        
         | steveBK123 wrote:
         | > Because interfacing via API is expensive. Writing APIs for
         | others to use productively isn't easy and change management
         | also adds a lot of overhead.
         | 
          | An underappreciated point. Something that is affordable for
          | giant, monopoly-profit-driven FAANGs, but harder in smaller
          | orgs.
          | 
          | Even within my team, the "level of difficulty" of writing an
          | API depends on its user base. If I am the primary user, it's
          | easy. There's one really sharp kid on my team; if he is going
          | to use it, then 2x the difficulty. If I need it to handle the
          | median dev on my team, then 2x again. By the time you get to
          | the below-median dev, add another 2x. So even within my team,
          | the intended user base can change the difficulty from 1x to
          | 8x.
          | 
          | Some of the pain points: input validation that drives useful,
          | informative errors; flexibility on inputs, like sensible
          | defaults to reduce the number of inputs users must pass;
          | performance/scaling; and edge cases.
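          | 
          | A toy sketch of the first two in Python (field names are
          | made up), the sort of thing that grows with every extra
          | consumer:
          | 
          |     from dataclasses import dataclass
          | 
          |     @dataclass
          |     class ReportRequest:
          |         account_id: str
          |         days: int = 30      # sensible default
          |         fmt: str = "csv"    # most callers pass one field
          | 
          |     def validate(req: ReportRequest) -> ReportRequest:
          |         # Errors name the field, the constraint, and the
          |         # offending value.
          |         if not req.account_id:
          |             raise ValueError("account_id is required")
          |         if not 1 <= req.days <= 365:
          |             raise ValueError(
          |                 f"days must be 1-365, got {req.days}")
          |         if req.fmt not in ("csv", "json"):
          |             raise ValueError(
          |                 f"fmt must be csv or json, got {req.fmt!r}")
          |         return req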
         | 
         | >> In this sense, development teams weren't so much empowered
         | to deliver software faster; rather, they were over-encumbered
         | with infrastructure tasks that were outside of their expertise.
         | 
         | And agreed on this point. Amusingly I have had HN thread
         | arguments just this week with DevOps advocates telling me that
         | akshully their job isn't to empower devs, but some sort of
         | "tail that wags that dog" interpretation around devops
         | organizational / standardization / cost / etc.
        
       | agentultra wrote:
       | I have heard similar perspectives on this. That folks are moving
       | away from devops and towards platform engineering. The idea being
       | that a platform team reduces the friction to deploy code by
       | building self-serve APIs, libraries, and infrastructure to be
       | used by dev teams.
       | 
        | Even in teams practicing devops well, I have always known at
        | least one or two people who are great developers but they don't
       | want to know anything about operating systems, sockets, file
       | descriptors, service level objectives and all that. I always
       | found working with them to be challenging.
       | 
       | I'm very much in the, "you wrote it, you run it," camp.
       | 
        | While platform engineering sounds great, I've also worked with
        | teams trying this, and it has its own trade-offs: as demands on
        | the platform team grow, it can take longer to wait for your
        | change requests to be deployed, and depending on how ownership
        | at the company works... it can be frustrating: you could have
        | fixed it yourself and shipped sooner, but now you have to live
        | within the constraints set for you by the platform team. You
        | also end up
       | with a development culture that has a hard time understanding
       | service performance objectives.
       | 
       | This can be a great thing for some companies for sure. But I
       | haven't seen a cure-all for siloing teams. Conway's Law and all.
        
         | 8organicbits wrote:
         | > as demands on the platform team grow it can take longer to
         | wait for your change requests to be deployed
         | 
         | That sounds like an ops or deployment team, not a platform
         | team. A key feature of a platform should be that the developers
         | choose when a deployment happens.
        
           | Dioxide2119 wrote:
           | > A key feature of a platform should be that the developers
           | choose when a deployment happens.
           | 
            | Agreed. When I was on a platform team, we wrote tools to
            | take a process that used to be done by a deployment team
            | (changing a DNS record was a helpdesk ticket) and move it
            | into a self-serve system (PR your desired DNS changes in;
            | upon merge, the system deploys them). That kept audit happy,
            | because 'dev' wasn't touching 'prod' in the unfettered way
            | SOC2 people stay up at night worrying about (even though
            | Enron happened because of bad management, not Office Space,
            | but anyways), while still giving devs effective control of
            | when and where they wanted to make production changes,
            | whether relatively ad hoc or as part of a CI/CD pipeline.
           | 
            | Humans could approve the self-service PRs, or, if a list of
            | in-code rules had been fulfilled, the PR would be
            | auto-approved (and potentially even merged, but everyone but
            | us was too afraid to set that part up).
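            | 
            | Not our actual rules, but to give a flavor of what "in-code
            | rules" for auto-approving a DNS change PR can look like
            | (field names invented):
            | 
            |     ALLOWED_TYPES = {"A", "AAAA", "CNAME", "TXT"}
            | 
            |     def auto_approvable(change: dict, zone: str) -> bool:
            |         # Approve only changes that stay inside the team's
            |         # delegated zone, use an ordinary record type, and
            |         # have a sane TTL.
            |         return (
            |             change["name"].endswith("." + zone)
            |             and change["type"] in ALLOWED_TYPES
            |             and 60 <= change["ttl"] <= 86400
            |         )
            | 
            |     print(auto_approvable(
            |         {"name": "api.payments.example.com",
            |          "type": "CNAME", "ttl": 300},
            |         zone="payments.example.com",
            |     ))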
        
             | gtirloni wrote:
             | I'd say that's a CI/CD team.
             | 
             | Platform means different things to different people but it
             | should offer a standardized way of doing things that's well
             | supported while allowing for customization by developers
             | when needed (with less support when you step out of the
              | golden path). It's a combination of software, processes
              | and management buy-in.
             | 
             | The way I see most "DevOps" teams working is they're just
             | writing scripts that do whatever was requested without much
             | thought about how sustainable that is, how company-wide
             | policies can be enforced, or retrofitting improvements to
             | other codebases... It's all very quick-and-dirt solution,
             | one after the one until they end up in software engineering
             | madness with developers complaining that things take too
             | long or break easily while devops engineers complain
             | developers don't know what they are doing. It's not a
             | productive situation to be in.
             | 
             | I think platform engineering is just about having a
             | systematic approach that gives developers more peace of
             | mind so they can focus on actually coding features while
              | giving the rest of the organization a few more control
              | points over how things are maintained. It's the 80/20 rule
             | applied to devops, I guess. Just enough centralization.
             | 
             | I'm also very excited about platform engineering and I
             | think it's a natural progression because, frankly, what
             | people call DevOps these days is just a nightmare. God
             | forbid the org has "distributed DevOps" a.k.a. do whatever
             | you want in your team and when it's time to make a global
             | change we will work with >20 different ways of doing
             | something. That will be quick.
        
               | agentultra wrote:
               | > what people call DevOps these days is just a nightmare.
               | 
               | I agree! It's taken on a, "you'll know it when you see
               | it," kind of definition and it's hard to pinpoint what
               | "DevOps" is and whether your organization is practising
               | it.
               | 
               | And so I've often seen it become a veneer for the
               | original "silos" it was meant to break down: those
               | handful of developers who still want nothing to do with
               | managing their code and services in production get to
               | throw code over the wall and someone else gets to hold
               | the pager and keep it running, make it fast, etc.
               | 
               | In other words, the company hires "devops" which becomes
               | a new title for "system administrator," and everything
               | stays the same.
               | 
               | Platform engineering, devops, it's all evolving... but
               | some things never seem to change.
        
         | yCombLinks wrote:
          | The problem with "You wrote it, you run it" is that they are
         | completely orthogonal skillsets, and you lose the benefits of
         | specialization. It introduces a high amount of context
         | switching, and loses consistency in how services operate.
        
           | im3w1l wrote:
           | Sounds like the horizontal vs vertical integration debate all
           | over again.
        
           | dilyevsky wrote:
           | > It introduces a high amount of context switching
           | 
           | As opposed to having that context now shared between two
           | people on separate teams?
        
         | sgarland wrote:
         | > I have always known at least one or two people who are great
         | developers but they don't want to know anything about operating
         | systems, sockets, file descriptors, service level objectives
         | and all that
         | 
         | IMO, those are not great developers. If you know nothing about
         | how your code is running at a low level, you will reach a point
         | at some scale where you've caused performance issues, and don't
         | have the fundamental knowledge necessary to fix it. Skilled or
         | good, sure, but "great" implies mastery.
         | 
         | > I'm very much in the, "you wrote it, you run it," camp.
         | 
         | As a sibling comment of mine pointed out, these are generally
         | orthogonal skillsets. The main problem I've seen with it is
          | that, due to the relative ease of XaaS, dev teams with little
          | to no knowledge of infra can indeed stand up an entire stack,
         | and it will work quite well at first.
         | 
         | Databases, for example, are remarkably fast at small scale,
         | even when your schema is horrible and your queries are sub-
         | optimal. But if you don't know the fundamentals (my first
         | point), you won't know that it's abnormal for a modern RDBMS to
         | take hundreds of milliseconds for a well-written query
         | (assuming it's not a cold read). You see that your latency is
         | "good enough," so you move on with the next task.
         | 
         | Then, when the service gets more popular, you hit the limits of
         | the DB, so you vertically scale (if it isn't already
         | "Serverless" and auto-scaling - barf). It isn't until the bill
         | is astronomical that someone will bother to ask why it is you
         | need a $10K/month DB.
        
       | PaulHoule wrote:
       | ... and some system for coining unique ids across all the silos,
       | whether that is UUIDs or something like RDF-style namespaces.
        
         | jon_richards wrote:
          | I was really hesitant to use UUIDs as primary keys because
          | it's basically worst-case clustering performance. Ended up
          | just using 32-bit Unix time followed by 12 random bytes. Not
          | sure why that isn't an official version.
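          | 
          | For reference, that scheme is roughly (the proposed UUIDv7 is
          | a similar idea, with a 48-bit millisecond timestamp instead):
          | 
          |     import os, struct, time, uuid
          | 
          |     def time_prefixed_id() -> uuid.UUID:
          |         # 4 bytes of big-endian Unix seconds + 12 random
          |         # bytes: roughly time-ordered, so B-tree inserts
          |         # stay near the right-hand edge of the index.
          |         ts = struct.pack(">I", int(time.time()))
          |         return uuid.UUID(bytes=ts + os.urandom(12))
          | 
          |     print(time_prefixed_id())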
        
           | PaulHoule wrote:
           | If you insist on using a B-Tree index and random UUIDs that's
           | certainly the case. Many of the formulas used to generate
           | UUIDs in the past had terrible privacy implications: Office
           | 95 would fill documents with UUIDs generated based on MAC
           | addresses and timestamps so Office documents could be tracked
           | to particular machines until Microsoft changed this with
           | little fanfare.
        
       | pphysch wrote:
       | Nothing wrong with having clear boundaries and different owners
       | across high-quality data sources, but IME "silo" usually comes up
       | in the context of egregious data duplication and ambiguous
       | sources of truth.
        
       | 0xbadcafebee wrote:
       | An API is actually not good enough. It's the minimum you could
        | possibly have to allow communication. So the silos can "work
        | together" through that API, but problems then occur due to lack
        | of understanding of how each silo actually works under the hood.
       | 
       | Imagine a bunch of microservices built by different teams. They
       | just send each other their OpenAPI specs and some URIs. So they
       | start calling each other's APIs. Everything seems to work fine.
       | 
       | But wait. Are there limits on this API? How many calls, or how
       | much data can I send? What's the SLA on these transactions? What
       | happens to data, how it's stored, processed, backed up? If I send
       | X data to service Y, do I know service Z is going to get the same
       | data? If one of these services goes down, is everything going to
       | go down, and is that team staffed for 24/7 support? Do they even
        | know what to do when things are down? When everything goes down,
       | how do these silos know which thing was the cause and who to
       | alert to fix it? Does it require multiple silos to fix?
       | 
       | All of that and much, much more, is deeper knowledge related to
       | the entire system, which is the inter-relation of all these silos
       | from top to bottom and sideways. The API doesn't solve these
       | problems or answer the questions. The API only tells you how to
       | do one thing.
       | 
       | The premise of silos is the idea that you don't need to know
       | anything else about the rest of the world but some tiny bit of
       | information. Well, reality says different.
       | 
       | DevOps is not about "merging teams", but _communication_ and
       | _collaboration_ between teams. They should understand each other
       | well, or at least make it much easier to discover the right
       | information in order to improve outcomes. You absolutely have to
       | have specialized teams where people have domain knowledge. But
       | you also need to provide the tools and practices that enable very
       | different teams to work together to build things right and solve
       | problems quickly.
       | 
       | Say you're building cars. A dealership's mechanics notice a belt
       | keeps rubbing on a cable or hose. That needs to be quickly
       | notified to the assembly people to see if it's an assembly
       | problem, and if not, it needs to be sent to the mechanical
       | engineers to address a potential design flaw. That all needs to
       | be done quickly, as soon as the problem is noticed, because cars
       | are shipping every day. The longer it takes for that whole loop
       | to complete, the more bad cars are shipped. By focusing on
       | improving the loop between all these different groups, you
       | improve business outcomes. But "an API" isn't going to do that.
       | 
       | That's why the idea of a "DevOps Engineer" is wrong. This isn't
       | an engineering problem. This is a business process problem. The
       | communication between teams is not an API, it is really
       | organizational structure and practice. Engineers noticed the
       | problem, and wanted to fix it, but they failed to use the
       | language of management. So instead people slapped "Engineer" on
       | the concept and everyone got confused.
        
       | tanseydavid wrote:
       | With two silos, wouldn't you need to have _at least two APIs_?
        
         | josh-sematic wrote:
          | No; one silo could provide an API that the other consumes, but
          | the producer consumes no APIs from the consumer.
        
       | whynotmaybe wrote:
       | Unless you work at a place where ops don't write code because
       | "it's the dev's job to write code", so everything is built and
       | done manually; and devs don't have access to any infrastructure
       | and don't bother about it, because "it's the ops' job".
       | 
       | And management of both sides agree with this vision.
       | 
       | "works on my machine" and "the issue must be with the code" are
       | the most used excuses by both sides when something fails.
       | 
       | The ops "api" works by email, but the response delay is usually
       | expressed in weeks.
       | 
        | Most memorable quote from that place: "Yes, a 4-month delay for
        | an answer about your new server might seem long"
        
         | slaymaker1907 wrote:
         | "We can have that new release deployed in about 6-8 weeks."
        
           | jahsome wrote:
           | I find myself these days on one of these ops teams. The lead
           | time is the same for deploying either a code or
           | infrastructure change, and probably closer to 3-4 months
           | here.
           | 
           | It's not an operational capacity or competency issue for this
           | org, it's the result of hours worth of "sync" or "review"
           | meetings with no discernible agenda, negotiating maintenance
           | windows, facilitating approvals from a dozen or more parties
           | who don't even comprehend what they're approving, and weeks
           | of manual acceptance testing.
           | 
           | On the other extreme, in past roles at different orgs, I've
           | been on teams doing multiple deployments to production every
           | day, both on the dev and ops sides.
           | 
           | I find it exhausting and soul crushing being completely
           | untrusted because of the mistakes made by people who left the
           | org years before I started.
        
       | everdrive wrote:
       | I have no comment on this article's specific claims, and I have
       | every reason to believe the author is both intelligent and
       | insightful.
       | 
       | That said, so many times in my professional life I run into the
       | same problem: people think some idea is a truism which broadly
       | applies to all or most situations. The problems faced are seldom
       | deeply understood, and so that truism is misapplied by people
       | attempting to follow the latest best practices.
       | 
       | This seems like a pernicious meta-problem related to
       | profitability and work resources. ie, people cannot deeply
       | understand all problems, and so they are content to make larger
       | errors in a number of places so long as "more work" is getting
       | done. I don't think this problem will ever disappear.
        
       | esafak wrote:
       | This is true for some definition of a silo, but to me the term
       | inherently implies difficulty in interfacing. If you can easily
       | pull information out or push it in, it's not a silo in the sense
       | people complain about.
       | 
        | So to me this is not so much a correction of the definition as
        | a technical resolution. There are several problems, though. The
        | owner of the silo may benefit from it; this is called "lock-in".
        | If the company has any say, the silo owner should be properly
        | incentivized to ensure "good citizenship".
        
       | throwup238 wrote:
       | This is how it starts. First ChatGPT starts suggesting all these
       | bloggers "Silos are fine as long as they have APIs" articles and
       | "silo this, silo that". Soon enough the overton window has
       | shifted and we're talking about military silos and silos having
       | APIs in the same sentence.
       | 
       | Before you know it, GPTskyNet5 is firing nukes off using poorly
        | secured and thought-out webhooks set up by some random defense
       | contractor at missile silos because some DoE scrum master needed
       | a promotion.
       | 
       | This is the beginning of the end!
        
       | osigurdson wrote:
       | I'd suggest that silos are necessary as human communication
       | bandwidth is limited. Everyone talking to everyone, all the time
       | doesn't scale. The key is to create silos along natural "fault
       | lines" such that less communication is required.
        
       | solatic wrote:
       | APIs don't live in a vacuum. Sorry, but you can't just ship a
       | service that exposes an API and expect people to use it. It MUST
       | be documented. And no, your auto-generated Swagger/OpenAPI docs
       | don't cut the mustard.
       | 
       | If, as an executive, you expect the teams you set up to actually
       | be independent, then treat them like Product teams in their own
       | right. Set expectations to write Product-quality documentation,
       | at least as good as what you ship to customers. Hire internal
       | Product managers for those teams, who will learn how internal
       | customers use those APIs and what else they need to solve their
       | problems. Hire internal Marketing to ensure everybody else knows
       | that the APIs exist. Sound ridiculous? What, do you expect your
       | Engineering teams to have these skillsets already? If you don't
       | expect your company's product to succeed without Product and
       | Marketing then why, pray tell, would you expect your internal
       | products to be any different?
        
       | hayst4ck wrote:
       | Silos are not a problem. Leadership quality is a problem that
       | silos can greatly exacerbate.
       | 
       | Silos exacerbate two problems. The first problem is that there is
        | an area between two silos which can be poorly owned or lack
       | stewards. The second problem is that the first manager above two
       | different silos is the de facto resolver of disputes and resource
       | application for and around those silos. This area often has no
       | advocate, because to advocate for resource application to that
       | area is to volunteer yourself. This area becomes a blind spot to
        | leadership through systematic neglect. In many cases this
        | leadership can be a checked-out ex-Google CTO who has never run
        | an org under resource constraints and who isn't hungry because
        | they are already well off. Checked-out rest-and-vest early
        | employees who, through the Peter principle, end up in CTO
        | positions can also be extremely dangerous to organizations.
       | 
       | If you have poor leadership, you end up with two silos that don't
       | want extra responsibility, because more responsibility without
       | more resources is a losing prospect. Under bad leadership,
       | anything that is not feature production is not rewarded. The end
       | result is that each silo becomes aligned against other silos,
       | rather than aligned to a business goal.
       | 
        | Defining an API seems aimed primarily at completely removing
        | the area between two silos, thus alleviating the problem of
        | opaque ownership. I agree. Explicit ownership for all
       | artifacts helps an organization run much more effectively.
       | 
        | The real problem is a culture where individual employees do not
        | feel responsible for business outcomes, and where leadership
        | does not proportionately recognize those who do take
        | responsibility for them.
       | 
       | Here is Admiral Rickover's take on culture:
       | 
       | > Professionalism occurs when individuals act in the best
       | interest of those being served according to objective values and
       | ethical norms, even when an action is perceived to not be in the
       | best interest of the individual or their organization. That is,
       | there are times when professionals must sacrifice their own
       | interest (or that of their organization) to meet the objective
       | values and ethical norms of the profession. Professionals, in
       | this sense, are serving something greater than the bureaucratic
       | organization that employs them.
       | 
       | > If Admiral Rickover had a mantra to shape a professional
       | culture, it would have been, "I am personally responsible."
       | 
       | When leadership does not practice personal responsibility or
       | engages in blame, that ripples through the entire organization.
       | Silos that don't function well are symptoms of leadership failing
       | to take responsibility and cultural failure, not necessarily
       | structural failure.
        
       ___________________________________________________________________
       (page generated 2024-01-19 23:00 UTC)