[HN Gopher] Silos are fine, as long as there is an API between them
___________________________________________________________________
Silos are fine, as long as there is an API between them
Author : platformeng
Score : 120 points
Date : 2024-01-19 13:32 UTC (9 hours ago)
(HTM) web link (fernandovillalba.substack.com)
(TXT) w3m dump (fernandovillalba.substack.com)
| internetter wrote:
| I thought this was going to be about siloed products. Like AWS is
| a silo, because once you use AWS you are effectively locked in,
| whereas WinterCG and Unix standards make implementors non-siloed.
|
| Oh well, this take would be a lot more spicy if so
| bazil376 wrote:
| I prefer my takes lukewarm
| alejoar wrote:
| New management in the last company I worked for went on a crusade
| against silos.
|
| Their strategy was to randomly shuffle people across teams.
| Literally random.
|
| You could tell how productivity bottomed out, and soon after a
| lot of senior engineers left the company.
|
| This is a multimillion-dollar American media company.
| cdavid wrote:
| "Silos" is one of those keywords that is org-health dependent.
|
| When things go well, people highlight team autonomy. For the
| exact same setup, when things go bad, people talk about silos.
| bigbillheck wrote:
| Mandating that is a terrible idea but I think giving senior
| people (and go-getting junior ones) in team A an opportunity to
| do a 6-12 month rotation in team B, and so on, is a great idea.
| braza wrote:
| I can resonate with the author, given the context of working at
| scaleups and big companies, where coordination and communication
| take a very heavy toll on the actual work.
|
| I do not believe in this model of "big collaboration" with folks
| swarming around a problem, each one with partial context, trying
| to fix something that structurally has huge communication,
| coordination and technical coupling.
|
| The best work experiences I had were in teams where we
| established the APIs, the S3 buckets where the processed batch
| files would stay, and maybe an e-mail list where someone would
| reply if something changed.
| blastro wrote:
| First two sentences describe my current situation. How to
| improve it?
| braza wrote:
| I do not have a straight answer, but one thing that I saw
| working is the concept of "good fences": folks still act as a
| team but have clear touch points around problems, and when
| things go south everyone knows what to do.
| blastro wrote:
| Thanks for your input!
| platformeng wrote:
| Not saying these will solve your problems, but they may help
| give you a little perspective at least:
|
| https://www.opslevel.com/resources/optimizing-engineering-
| co...
|
| https://fernandovillalba.substack.com/p/improving-
| engineerin...
| blastro wrote:
| Thank you for this
| candiddevmike wrote:
| This is one of those utopian engineer fallacies that is on the
| same level as reliable networking and fsync. People don't work
| like computers and don't respond like API endpoints. There will
| always be backchannels, backburners, pigeonholes, and all the
| other political games humans play happening even with an "API
| between them".
|
| The problem is always management, and the solution is never "more
| management", IMO.
| dacryn wrote:
| An API does not mean there are no hidden dependencies, and that
| is often the biggest failure.
|
| If an API call is not stateless, or needs to be chained with
| other calls, you're in for a world of pain in the long run.
| nostrebored wrote:
| But this is equivalent to saying "a monolith can never work
| because it's highly coupled". In both cases you need to
| follow best practices to make things work. API design and
| alignment with consumers of the API is table stakes.
| DanielHB wrote:
| The whole point of the article is that silos are not
| intrinsically bad, partially because silos reduce the
| communication (and therefore the managers) required.
|
| Two teams agreeing on an API between themselves instead of one
| mega team fulfilling the needs of several client teams.
| throw04323 wrote:
| > Two teams agreeing on an API between themselves
|
| I think that depends on the type of service that the team
| provides. If you have a central team that many other teams
| interact with, they risk becoming a bottleneck. They may not
| be interested in maintaining custom APIs for each team
| interaction and you will need to agree on a contract that all
| can live with.
|
| Another risk is that the team providing the service also has
| their own backlog, including work they want to do themselves
| and requests from other teams. This can cause unwanted
| dependencies and delays where managers fight to be
| prioritized at the expense of others.
| packetlost wrote:
| All I'm saying is it appears to have worked at mega-scale
| for Amazon.
| acjohnson55 wrote:
| Most of us don't have mega-scale problems, though. A
| tremendous amount of waste has been created by applying
| FAANG tech and processes to completely different
| contexts.
| packetlost wrote:
| Sure, but scaling of an organization is _not_ the same as
| scaling for traffic in a technical sense. There are so
| many companies that employ comparable numbers of
| engineers that are not big tech companies.
| throw04323 wrote:
| I'm not saying it can't work, but that there are risks
| involved.
|
| I have worked for several companies ranging from local
| startups to global enterprise (not FAANG). Each company
| tried the silo approach when they migrated to microservices,
| and it caused significant delays and
| dependencies. They would have been better off if they
| focused more on larger domain services with fewer
| external dependencies.
|
| I am open to the idea that Amazon has been able to avoid
| these problems, but it's clearly not a silver bullet.
|
| In general I have to say I'm sceptical about comparisons
| with FAANG, because they live in a completely separate
| part of the technology sector. They have income similar
| to small countries and can live with inefficiencies that
| can break a startup.
| randomdata wrote:
| _> Each company tried the silo approach when they
| migrated to microservices_
|
| Doesn't that go without saying? That's literally what
| microservices are: the siloing of services, just as
| services are provided in the macro economy, but within the
| micro economy of a single organization. Without silos,
| your service is monolithic.
| DanielHB wrote:
| The problem I see is big companies with several products
| trying to break down silos between the products to share
| some infrastructure (be it code, libs, actual cloud
| infra, support teams, design systems, etc.) when there is
| very little overlap between the different products.
|
| All in some grand hope of reducing costs by sharing
| things. It almost always ends with overly generic
| solutions that are harder to use, take more people to
| support, can't be fitted well in most cases, and that
| everyone involved hates (causing employee attrition).
|
| This is different from having cohesive architecture
| within a single product.
| marcosdumay wrote:
| Yes, it scales up very well.
|
| People upthread are trying to say it scales down badly.
| pixl97 wrote:
| > appears to have worked
|
| Mostly because someone at a higher level said "Your APIs
| are not a silo, and if you act like they are you will be
| terminated".
|
| The communication cost will always be there; the question
| is how it is implemented. An API tends to reduce
| communication costs when someone is forcing all teams at
| gunpoint to write clear, concise, and well documented APIs
| and doesn't allow them to change said APIs without clear,
| concise, and well documented rules.
|
| I've worked with teams that communicated via API and
| started randomly changing shit without proper
| documentation; without management being held to the
| fire over their actions, it's just a new type of silo.
| dkarl wrote:
| Agreeing means the providing team understanding and meeting
| the needs of the consuming team. Teams that can work together
| to accomplish this wouldn't be called "silos."
|
| However, when two teams work together to create an API, or a
| process or some other self-service mechanism, sometimes the
| API is so good that the teams no longer need to talk to each
| other. The practices and relationships that enabled
| communication fade away. Walls go up, silos form, but nobody
| notices anything wrong, because it seems like efficiency is
| getting better and better. Over time, though, people start to
| notice that projects that encounter a need to change the API
| always fail. The API has become legacy, baggage, a problem.
|
| There may still be somebody on the providing team who
| remembers that they used to get together in the same room
| with developers from the consuming team to come up with
| solutions together, and they'll naively suggest that as a
| solution, but there are now too many assumptions baked in at
| the management level for that to be allowed to happen. A
| change in the working relationship between teams means the
| managers will fight over what this means for different
| managers' prestige. Somebody's cheese is going to get moved.
| Managers gear up to go to war over things like that, so upper
| management dictates a solution that minimizes inter-manager
| violence, a solution that carefully circumscribes the kinds
| and amounts of contact that members of the two teams are
| allowed to have. Voila: silos with windows, and engineers
| sitting in their respective windows forlornly waving at each
| other like lovers separated by their parents.
| nine_zeros wrote:
| I have seen managers create task funnels for requests coming from
| other teams. They assume this has fixed all coordination issues.
|
| Later I see engineers having to reach out week after week to
| other teams for coordination. Things are not in place. People
| don't respond, coordination "APIs" don't work as documented. I
| have seen individuals blamed for dependency failures that they
| had no part in.
|
| I rarely see the obvious solution: Why don't managers spend day-
| in-day-out coordinating work across teams/orgs/corporate
| barriers?
|
| Instead of doing coordination, managers seem to be spending more
| time doing HR stuff like vacations, performance calibrations,
| documenting ways to blame engineers. They create such APIs and
| think their job is done. Why? How? What a severe deadweight in
| the company.
| jkoberg wrote:
| There is a difference between silos and domains. It's great for
| domains to be separated, and independently well-defined. The
| silos I have seen cause problems are between organizational
| functions, like "Product" and engineering or "The Business" and
| implementation teams.
| acjohnson55 wrote:
| My overall reaction is that this is a great piece for
| products/teams that have reached significant scale, once the job
| to be done is too big and complex for one team to own, end-to-
| end, or there are truly reusable concerns that can be separated
| from the core product (e.g. auth, observability).
|
| Because interfacing via API is expensive. Writing APIs for others
| to use productively isn't easy and change management also adds a
| lot of overhead. And if we're talking about network APIs, there's
| a ton of distributed systems complexity to account for.
|
| > The problem with the DevOps movement is that it ended up taking
| "shifting left" to the extreme. In this sense, development teams
| weren't so much empowered to deliver software faster; rather,
| they were over-encumbered with infrastructure tasks that were
| outside of their expertise.
|
| This. In truth, I think this is a major misinterpretation of
| DevOps, which is meant to empower devs without loading them down
| with incidental complexity. But I experienced exactly this
| misinterpretation at the first place I worked that had embraced
| DevOps culture.
| braza wrote:
| > Because interfacing via API is expensive. Writing APIs for
| others to use productively isn't easy and change management
| also adds a lot of overhead.
|
| I agree in principle, but there are a lot of "unseen
| coordination/communication" costs that are easily taken for
| granted.
|
| When I was working in telecom doing interfacing with carriers
| (e.g. T-Mobile, Verizon, etc.), one thing that I noticed was
| how simple it was to work with those folks: OK, this is the
| standard XML, those are the endpoints, that's the list of error
| codes, the rate limit is X requests per second, a bunch of
| files will be on this FTP at 5AM daily, and if you face more
| than 100ms latency from our side just call this number.
|
| Working with "product" companies without silos, most of the
| time it's design by committee: folks that won't keep the
| service running wanting to have a say in our payload, wanting
| us to change our overly reliable RabbitMQ to use their Kafka.
| acjohnson55 wrote:
| > When I was working in telecom doing interfacing with
| carriers (e.g. T-Mobile, Verizon, etc.), one thing that I
| noticed was how simple it was to work with those folks: OK,
| this is the standard XML, those are the endpoints, that's the
| list of error codes, the rate limit is X requests per second,
| a bunch of files will be on this FTP at 5AM daily, and if you
| face more than 100ms latency from our side just call this
| number.
|
| To me, that probably reflects the maturity of the services
| the carriers provide. And presumably that there's an explicit
| customer-producer relationship? These things justify the
| complexity of maintaining a well curated and operated API.
|
| > Working with "product" companies without silos, most of the
| time it's design by committee: folks that won't keep the
| service running wanting to have a say in our payload, wanting
| us to change our overly reliable RabbitMQ to use their Kafka.
|
| If I understand what you're saying, you've experienced
| platform people telling you what tech to use, without having
| real skin in the game for operating your services? If so,
| that sounds very irritating. To me, a truly silo-less
| approach would not have that.
|
| To the extent that there are platform teams with a say in
| architecture, I think they should develop requirements around
| the external characteristics of the deliverable (performance,
| cost, observability, contract with other teams, etc) and
| largely leave the implementation concerns to the people
| developing and running the service.
| The_Colonel wrote:
| This is an interesting example, because telco has an actual
| API standards committee (TM Forum). Telcos have decades of
| experience and an extremely well defined (and to a large
| degree shared / interchangeable) domain model. It's an ideal
| scenario for APIs.
|
| Meanwhile, product companies each develop a different
| product, and there's little standardization. API designers
| have only a vague idea of what the API will be used for and
| how. Fast evolution is important.
| sgarland wrote:
| > folks that won't keep the service running wanting to have a
| say in our payload wanting us to change
|
| This is my single biggest gripe with DevOps. If you're not
| going to be fixing it, why on earth do you get a say as to
| how I build it? It's nearly _always_ a one-way street, too -
| when's the last time Ops successfully ordered Dev to change
| some specific part of their code (modulo things like, "hey,
| you really need to add a rate-limiter")?
| rqtwteye wrote:
| "Working with "product" companies without silos most of the
| time it's design by committee,"
|
| With silos you may end up with design by ego trip.
| acjohnson55 wrote:
| As a coda to this, I think that we are entering a new world
| where services truly can have minimal operational overhead.
|
| At my last company, we were all in on serverless, with AWS
| tools like SQS coupling things together. It worked extremely
| well for keeping the architecture simple to operate and
| approachable for new people.
|
| But even better, I think we want logical services that
| interface with each other as though they were in a single
| process. We want the ability to write code as though it is
| monolithic (e.g. lexical scope and native-language APIs), while
| availing ourselves of the advantages of independently deployed
| services. I believe projects like Temporal point the way. I
| haven't had the opportunity to use it, but philosophically, I
| think it's the right direction.
| throw04323 wrote:
| I don't have experience with it myself, but Polylith
| architecture looks interesting. There you compose services
| based on shared components. You can start developing it as a
| monolith, and then extract components to separate services by
| just changing the interface between components.
|
| https://polylith.gitbook.io/polylith/
| acjohnson55 wrote:
| If I'm understanding Polylith, it seems like a way to do
| a monoglot monorepo in a way that avoids explicitly using
| libraries.
|
| That looks pretty interesting and similar to what Artsy did
| with Ezel for front-end development.
|
| https://github.com/artsy/ezel
| jakjak123 wrote:
| I have been working mostly with Kafka for five years, and I
| found SQS to be a little weird to work with. How do you keep
| an overview of all services that consume a queue? What if you
| want multiple services to read the same data in the same
| order? Granted I am very new to AWS.
| 8note wrote:
| I think you're really looking to use kinesis rather than
| SQS.
|
| SNS+SQS make a pub/sub setup, but every queue subscribed to
| the event is fully separate, and you wouldn't expect to try
| to couple the different listening queues together.
|
| You could make all the queues into SQS fifo queues, and put
| them all on the same sorting group, but I think network
| time could still break the ordering on different
| subscribers?
|
| The simple way to use SQS is to have a queue for each
| message type you plan to consume, and treat it as a bucket
| of work to do, where you can control how fast you pull the
| work out.
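|
| (For what it's worth, a rough boto3 sketch of that bucket-of-work
| pattern - the queue URL, message group, and handler are made up,
| and it assumes a FIFO queue with one message group when ordering
| matters:)
|
|     import boto3
|
|     sqs = boto3.client("sqs")
|     # Hypothetical FIFO queue owned by the producing team.
|     QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo"
|
|     def publish(body, dedup_id):
|         # A single MessageGroupId means consumers see messages in order.
|         sqs.send_message(
|             QueueUrl=QUEUE_URL,
|             MessageBody=body,
|             MessageGroupId="orders",
|             MessageDeduplicationId=dedup_id,
|         )
|
|     def drain(handle_message):
|         # Long-poll the "bucket of work"; delete each message only
|         # after it has been handled successfully.
|         while True:
|             resp = sqs.receive_message(
|                 QueueUrl=QUEUE_URL,
|                 MaxNumberOfMessages=10,
|                 WaitTimeSeconds=20,
|             )
|             for msg in resp.get("Messages", []):
|                 handle_message(msg["Body"])
|                 sqs.delete_message(
|                     QueueUrl=QUEUE_URL,
|                     ReceiptHandle=msg["ReceiptHandle"],
|                 )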
| osigurdson wrote:
| >> as though they were in a single process
|
| The problem is, you cannot abstract away a 3-order-of-magnitude
| difference in performance and latency. This problem
| increases with larger teams as people don't understand what
| is happening. Abstractions shorten the learning curve but
| also greatly slow the formation of an accurate mental model
| of how a system works.
| nostrebored wrote:
| The mental model of using SQS and serverless is extremely
| simple. You put a message in, get a message out, and have
| your code run.
|
| This is something I've been able to teach teams how to do
| in a few hours. This is a huge part of the value prop
| of these services.
|
| Performance and latency have a trade off, sure. But you get
| teams who have complete ownership over their service. You
| get services that can scale to millions of requests per
| minute transparently. Some of the highest scale workloads I
| know use microservices behind SQS.
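|
| (The whole mental model fits in a few lines - a minimal sketch of
| a Lambda handler for an SQS event source, with the actual
| processing left as a placeholder:)
|
|     def process(body):
|         ...  # your business logic goes here (placeholder)
|
|     def handler(event, context):
|         # Lambda invokes this with a batch of SQS messages; failed
|         # batches are retried and can land in a dead-letter queue
|         # if one is configured.
|         for record in event["Records"]:
|             process(record["body"])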
| toss1 wrote:
| Yes, APIs are expensive and require significant overhead.
|
| But clean, maintained interfaces between separate domains are
| far _LESS_ expensive than multi-domain/team mashups.
|
| Maintaining separate teams _AND_ well-maintained interfaces
| surfaces and makes _explicit_ the myriad inter-silo
| interactions.
|
| Instead of a quick backchannel conversation and a quick hack,
| it is an explicit conversation and an API adjustment.
|
| Yes, it's more expensive up front, but far cheaper and more
| powerful in the long run.
|
| So for high-speed prototyping it's not so suitable - break the
| silos, do the backchannel hack & move on (this version will
| likely be binned or massively refactored anyway). But when the
| domain is more stable, modular with solid interfaces is the way
| to go.
| jakjak123 wrote:
| I'm not saying it's a silver bullet, because sharing data via
| a database makes change management way worse.
| 6gvONxR4sf7o wrote:
| > My overall reaction is that this is a great piece for
| products/teams that have reached significant scale, once the
| job to be done is too big and complex for one team to own, end-
| to-end, or there are truly reusable concerns that can be
| separated from the core product (e.g. auth, observability).
|
| I'd say it's critical in the small scale too. When you get down
| to a single person size, this is just what well factored code
| is: you write down little reasonably independent pieces that
| can be learned about and thought about without having to think
| about all the other things. After all, we've all had that
| experience when you go back and revisit your code after
| vacation or something, and it might as well have been written
| by someone else.
|
| I was on a tiny team not long ago where my teammates kept
| writing tightly coupled systems, then rewriting everything from
| scratch every time. It was hell. Our product moved slowly,
| broke constantly, and we couldn't build off of our prior work,
| and could barely even build off of each others' work, so
| velocity stayed constant (read: slow).
|
| (as a tangent, the communication patterns of remote work seem
| to make this more important)
|
| Siloing teams, siloing concerns, and writing modular code are
| all kind of the same thing, just at different scales.
| steveBK123 wrote:
| > Because interfacing via API is expensive. Writing APIs for
| others to use productively isn't easy and change management
| also adds a lot of overhead.
|
| An underappreciated point. Something that is affordable for
| giant monopoly profit driven FAANGs, but harder in smaller
| orgs.
|
| Even within my team, the "level of difficulty" of writing an
| API depends on its user base. If I am the primary user, it's
| easy. There's one really sharp kid on my team; if he is going
| to use it, then 2x the difficulty. If I need it to handle the
| median dev on my team, then 2x again. By the time you get to
| the below-median dev, add another 2x. So even within my team,
| the intended user base can change the difficulty from 1x->8x.
|
| What are some of the pain points? Input validation that drives
| useful, informative errors; flexibility on inputs, like having
| sensible defaults to reduce the number of inputs users must
| pass; performance/scaling; and edge cases.
|
| >> In this sense, development teams weren't so much empowered
| to deliver software faster; rather, they were over-encumbered
| with infrastructure tasks that were outside of their expertise.
|
| And agreed on this point. Amusingly I have had HN thread
| arguments just this week with DevOps advocates telling me that
| akshully their job isn't to empower devs, but some sort of
| "tail that wags the dog" interpretation around devops
| organization / standardization / cost / etc.
| agentultra wrote:
| I have heard similar perspectives on this. That folks are moving
| away from devops and towards platform engineering. The idea being
| that a platform team reduces the friction to deploy code by
| building self-serve APIs, libraries, and infrastructure to be
| used by dev teams.
|
| Even in teams practicing devops well, I have always known at
| least one or two people who are great developers but don't
| want to know anything about operating systems, sockets, file
| descriptors, service level objectives and all that. I always
| found working with them to be challenging.
|
| I'm very much in the, "you wrote it, you run it," camp.
|
| While platform engineering sounds great I've also worked with
| teams trying this, and it has its own trade-offs: as demands on
| the platform team grow, it can take longer to wait for your
| change requests to be deployed, and depending on how ownership
| at the company works... it can be frustrating: you could have
| fixed it yourself and shipped sooner, but now you have to live
| within the constraints set by the platform team. You also end up
| with a development culture that has a hard time understanding
| service performance objectives.
|
| This can be a great thing for some companies for sure. But I
| haven't seen a cure-all for siloing teams. Conway's Law and all.
| 8organicbits wrote:
| > as demands on the platform team grow it can take longer to
| wait for your change requests to be deployed
|
| That sounds like an ops or deployment team, not a platform
| team. A key feature of a platform should be that the developers
| choose when a deployment happens.
| Dioxide2119 wrote:
| > A key feature of a platform should be that the developers
| choose when a deployment happens.
|
| Agreed. When I was on a platform team we wrote tools to take
| a process that used to be done by a deployment team (change a
| DNS record was a helpdesk ticket) and move it into a self-
| serve system (PR your desired DNS changes in, upon merge, the
| system deploys the changes), which kept audit happy because
| 'dev' wasn't touching 'prod' in the unfettered way SOC2
| people stay up at night worrying about (even though Enron
| happened because of bad management, not Office Space, but
| anyways), while still giving Devs effective control of when
| and where they wanted to make production changes, whether
| relatively ad-hoc or as part of a CI/CD pipeline.
|
| Humans could approve the self-service PRs, or, if a list of
| in-code rules had been fulfilled, the PR would be auto-
| approved (and potentially even merged, but everyone but us was
| too afraid to set that part up).
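|
| (A toy sketch of what those in-code rules might look like - the
| record fields, team-to-zone mapping, and thresholds are all
| hypothetical:)
|
|     ALLOWED_TYPES = {"A", "AAAA", "CNAME", "TXT"}
|     # Hypothetical delegation of zones to teams.
|     TEAM_ZONES = {"payments": "payments.internal.example.com"}
|
|     def auto_approvable(change, team):
|         # True if a proposed DNS record change can merge without
|         # human review.
|         zone = TEAM_ZONES.get(team)
|         if zone is None:
|             return False
|         return (
|             change["type"] in ALLOWED_TYPES
|             # Only records inside the team's delegated zone.
|             and change["name"].endswith("." + zone)
|             # Keep TTLs within sane bounds.
|             and 60 <= change.get("ttl", 300) <= 86400
|         )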
| gtirloni wrote:
| I'd say that's a CI/CD team.
|
| Platform means different things to different people but it
| should offer a standardized way of doing things that's well
| supported while allowing for customization by developers
| when needed (with less support when you step out of the
| golden path). It's a combination of software, processes and
| management buy-in.
|
| The way I see most "DevOps" teams working is they're just
| writing scripts that do whatever was requested without much
| thought about how sustainable that is, how company-wide
| policies can be enforced, or retrofitting improvements to
| other codebases... It's all very quick-and-dirty, one solution
| after another, until they end up in software engineering
| madness, with developers complaining that things take too
| long or break easily while devops engineers complain that
| developers don't know what they are doing. It's not a
| productive situation to be in.
|
| I think platform engineering is just about having a
| systematic approach that gives developers more peace of
| mind so they can focus on actually coding features while
| giving the rest of the organization a few more control
| points over how things are maintained. It's the 80/20 rule
| applied to devops, I guess. Just enough centralization.
|
| I'm also very excited about platform engineering and I
| think it's a natural progression because, frankly, what
| people call DevOps these days is just a nightmare. God
| forbid the org has "distributed DevOps" a.k.a. do whatever
| you want in your team and when it's time to make a global
| change we will work with >20 different ways of doing
| something. That will be quick.
| agentultra wrote:
| > what people call DevOps these days is just a nightmare.
|
| I agree! It's taken on a, "you'll know it when you see
| it," kind of definition and it's hard to pinpoint what
| "DevOps" is and whether your organization is practising
| it.
|
| And so I've often seen it become a veneer for the
| original "silos" it was meant to break down: those
| handful of developers who still want nothing to do with
| managing their code and services in production get to
| throw code over the wall and someone else gets to hold
| the pager and keep it running, make it fast, etc.
|
| In other words, the company hires "devops" which becomes
| a new title for "system administrator," and everything
| stays the same.
|
| Platform engineering, devops, it's all evolving... but
| some things never seem to change.
| yCombLinks wrote:
| The problem with "You wrote it, you run it" is that they are
| completely orthogonal skillsets, and you lose the benefits of
| specialization. It introduces a high amount of context
| switching, and loses consistency in how services operate.
| im3w1l wrote:
| Sounds like the horizontal vs vertical integration debate all
| over again.
| dilyevsky wrote:
| > It introduces a high amount of context switching
|
| As opposed to having that context now shared between two
| people on separate teams?
| sgarland wrote:
| > I have always known at least one or two people who are great
| developers but they don't want to know anything about operating
| systems, sockets, file descriptors, service level objectives
| and all that
|
| IMO, those are not great developers. If you know nothing about
| how your code is running at a low level, you will reach a point
| at some scale where you've caused performance issues, and don't
| have the fundamental knowledge necessary to fix it. Skilled or
| good, sure, but "great" implies mastery.
|
| > I'm very much in the, "you wrote it, you run it," camp.
|
| As a sibling comment of mine pointed out, these are generally
| orthogonal skillsets. The main problem I've seen with it is
| that due to the relative ease of XaaS, dev teams with little to
| no knowledge of infra can indeed stand up an entire stack,
| and it will work quite well at first.
|
| Databases, for example, are remarkably fast at small scale,
| even when your schema is horrible and your queries are sub-
| optimal. But if you don't know the fundamentals (my first
| point), you won't know that it's abnormal for a modern RDBMS to
| take hundreds of milliseconds for a well-written query
| (assuming it's not a cold read). You see that your latency is
| "good enough," so you move on with the next task.
|
| Then, when the service gets more popular, you hit the limits of
| the DB, so you vertically scale (if it isn't already
| "Serverless" and auto-scaling - barf). It isn't until the bill
| is astronomical that someone will bother to ask why it is you
| need a $10K/month DB.
| PaulHoule wrote:
| ... and some system for coining unique ids across all the silos,
| whether that is UUIDs or something like RDF-style namespaces.
| jon_richards wrote:
| I was really hesitant to use UUIDs as primary keys because it's
| basically worst-case clustering performance. Ended up just
| using 32-bit Unix time followed by 12 random bytes. Not sure
| why that isn't an official version.
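|
| (A minimal sketch of that construction - 4 bytes of big-endian
| Unix time followed by 12 random bytes, so new keys sort roughly
| by insertion time:)
|
|     import os
|     import struct
|     import time
|
|     def time_prefixed_id():
|         # The 32-bit timestamp prefix keeps keys roughly ordered by
|         # creation time (good for B-tree clustering); the 12 random
|         # bytes make collisions within the same second negligible.
|         return struct.pack(">I", int(time.time())) + os.urandom(12)
|
|     print(time_prefixed_id().hex())  # 8 hex chars of time + 24 random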
| PaulHoule wrote:
| If you insist on using a B-Tree index and random UUIDs that's
| certainly the case. Many of the formulas used to generate
| UUIDs in the past had terrible privacy implications: Office
| 95 would fill documents with UUIDs generated based on MAC
| addresses and timestamps so Office documents could be tracked
| to particular machines until Microsoft changed this with
| little fanfare.
| pphysch wrote:
| Nothing wrong with having clear boundaries and different owners
| across high-quality data sources, but IME "silo" usually comes up
| in the context of egregious data duplication and ambiguous
| sources of truth.
| 0xbadcafebee wrote:
| An API is actually not good enough. It's the minimum you could
| possibly have to allow communication. So the silos can "work
| together" through that API, but problems then occur due to lack of
| understanding of how each silo actually works under the hood.
|
| Imagine a bunch of microservices built by different teams. They
| just send each other their OpenAPI specs and some URIs. So they
| start calling each other's APIs. Everything seems to work fine.
|
| But wait. Are there limits on this API? How many calls, or how
| much data can I send? What's the SLA on these transactions? What
| happens to data, how it's stored, processed, backed up? If I send
| X data to service Y, do I know service Z is going to get the same
| data? If one of these services goes down, is everything going to
| go down, and is that team staffed for 24/7 support? Do they even
| know what to do when things are down? When everything goes down,
| how do these silos know which thing was the cause and who to
| alert to fix it? Does it require multiple silos to fix?
|
| All of that and much, much more, is deeper knowledge related to
| the entire system, which is the inter-relation of all these silos
| from top to bottom and sideways. The API doesn't solve these
| problems or answer the questions. The API only tells you how to
| do one thing.
|
| The premise of silos is the idea that you don't need to know
| anything else about the rest of the world but some tiny bit of
| information. Well, reality says different.
|
| DevOps is not about "merging teams", but _communication_ and
| _collaboration_ between teams. They should understand each other
| well, or at least make it much easier to discover the right
| information in order to improve outcomes. You absolutely have to
| have specialized teams where people have domain knowledge. But
| you also need to provide the tools and practices that enable very
| different teams to work together to build things right and solve
| problems quickly.
|
| Say you're building cars. A dealership's mechanics notice a belt
| keeps rubbing on a cable or hose. The assembly people need to
| be notified quickly to see if it's an assembly
| problem, and if not, it needs to be sent to the mechanical
| engineers to address a potential design flaw. That all needs to
| be done quickly, as soon as the problem is noticed, because cars
| are shipping every day. The longer it takes for that whole loop
| to complete, the more bad cars are shipped. By focusing on
| improving the loop between all these different groups, you
| improve business outcomes. But "an API" isn't going to do that.
|
| That's why the idea of a "DevOps Engineer" is wrong. This isn't
| an engineering problem. This is a business process problem. The
| communication between teams is not an API, it is really
| organizational structure and practice. Engineers noticed the
| problem, and wanted to fix it, but they failed to use the
| language of management. So instead people slapped "Engineer" on
| the concept and everyone got confused.
| tanseydavid wrote:
| With two silos, wouldn't you need to have _at least two APIs_?
| josh-sematic wrote:
| No; one silo could provide an API that the other consumes, but
| the producer consumes no APIs from the consumer.
| whynotmaybe wrote:
| Unless you work at a place where ops don't write code because
| "it's the dev's job to write code", so everything is built and
| done manually; and devs don't have access to any infrastructure
| and don't bother about it, because "it's the ops' job".
|
| And management of both sides agree with this vision.
|
| "works on my machine" and "the issue must be with the code" are
| the most used excuses by both sides when something fails.
|
| The ops "api" works by email, but the response delay is usually
| expressed in weeks.
|
| Most memorable quote from that place: "Yes, a 4-month delay for
| an answer about your new server might seem long."
| slaymaker1907 wrote:
| "We can have that new release deployed in about 6-8 weeks."
| jahsome wrote:
| I find myself these days on one of these ops teams. The lead
| time is the same for deploying either a code or
| infrastructure change, and probably closer to 3-4 months
| here.
|
| It's not an operational capacity or competency issue for this
| org, it's the result of hours' worth of "sync" or "review"
| meetings with no discernible agenda, negotiating maintenance
| windows, facilitating approvals from a dozen or more parties
| who don't even comprehend what they're approving, and weeks
| of manual acceptance testing.
|
| On the other extreme, in past roles at different orgs, I've
| been on teams doing multiple deployments to production every
| day, both on the dev and ops sides.
|
| I find it exhausting and soul crushing being completely
| untrusted because of the mistakes made by people who left the
| org years before I started.
| everdrive wrote:
| I have no comment on this article's specific claims, and I have
| every reason to believe the author is both intelligent and
| insightful.
|
| That said, so many times in my professional life I run into the
| same problem: people think some idea is a truism which broadly
| applies to all or most situations. The problems faced are seldom
| deeply understood, and so that truism is misapplied by people
| attempting to follow the latest best practices.
|
| This seems like a pernicious meta-problem related to
| profitability and work resources. ie, people cannot deeply
| understand all problems, and so they are content to make larger
| errors in a number of places so long as "more work" is getting
| done. I don't think this problem will ever disappear.
| esafak wrote:
| This is true for some definition of a silo, but to me the term
| inherently implies difficulty in interfacing. If you can easily
| pull information out or push it in, it's not a silo in the sense
| people complain about.
|
| So to me this is not so much a correction of the definition as a
| technical resolution. There are several problems, though. The
| owner of the silo may benefit from it. This is called "lock in".
| If the company has any say, the silo owner should be properly
| incentivized to ensure "good citizenship".
| throwup238 wrote:
| This is how it starts. First ChatGPT starts suggesting all these
| bloggers "Silos are fine as long as they have APIs" articles and
| "silo this, silo that". Soon enough the overton window has
| shifted and we're talking about military silos and silos having
| APIs in the same sentence.
|
| Before you know it, GPTskyNet5 is firing nukes off using poorly
| secured and thought-out webhooks set up by some random defense
| contractor at missile silos because some DoE scrum master needed
| a promotion.
|
| This is the beginning of the end!
| osigurdson wrote:
| I'd suggest that silos are necessary as human communication
| bandwidth is limited. Everyone talking to everyone, all the time
| doesn't scale. The key is to create silos along natural "fault
| lines" such that less communication is required.
| solatic wrote:
| APIs don't live in a vacuum. Sorry, but you can't just ship a
| service that exposes an API and expect people to use it. It MUST
| be documented. And no, your auto-generated Swagger/OpenAPI docs
| don't cut the mustard.
|
| If, as an executive, you expect the teams you set up to actually
| be independent, then treat them like Product teams in their own
| right. Set expectations to write Product-quality documentation,
| at least as good as what you ship to customers. Hire internal
| Product managers for those teams, who will learn how internal
| customers use those APIs and what else they need to solve their
| problems. Hire internal Marketing to ensure everybody else knows
| that the APIs exist. Sound ridiculous? What, do you expect your
| Engineering teams to have these skillsets already? If you don't
| expect your company's product to succeed without Product and
| Marketing then why, pray tell, would you expect your internal
| products to be any different?
| hayst4ck wrote:
| Silos are not a problem. Leadership quality is a problem that
| silos can greatly exacerbate.
|
| Silos exacerbate two problems. The first problem is that there is
| an area between two silos which can be poorly owned or lack
| stewards. The second problem is that the first manager above two
| different silos is the de facto resolver of disputes and resource
| application for and around those silos. This area often has no
| advocate, because to advocate for resource application to that
| area is to volunteer yourself. This area becomes a blind spot to
| leadership through systematic neglect. In many cases this
| leadership can be a checked-out ex-Google CTO who has never run
| an org under resource constraints and who isn't hungry because
| they are already well off. Checked-out rest-and-vest early employees
| who, through the Peter Principle, end up in CTO positions can
| also be extremely dangerous to organizations.
|
| If you have poor leadership, you end up with two silos that don't
| want extra responsibility, because more responsibility without
| more resources is a losing prospect. Under bad leadership,
| anything that is not feature production is not rewarded. The end
| result is that each silo becomes aligned against other silos,
| rather than aligned to a business goal.
|
| Defining an API seems aimed primarily at completely
| removing the area between two silos, thus alleviating the problem
| of opaque ownership. I agree. Explicit ownership for all
| artifacts helps an organization run much more effectively.
|
| The real problem is a culture in which individual employees do
| not feel responsible for business outcomes, and in which
| leadership does not proportionately recognize the
| responsibility that is taken for those outcomes.
|
| Here is Admiral Rickover's take on culture:
|
| > Professionalism occurs when individuals act in the best
| interest of those being served according to objective values and
| ethical norms, even when an action is perceived to not be in the
| best interest of the individual or their organization. That is,
| there are times when professionals must sacrifice their own
| interest (or that of their organization) to meet the objective
| values and ethical norms of the profession. Professionals, in
| this sense, are serving something greater than the bureaucratic
| organization that employs them.
|
| > If Admiral Rickover had a mantra to shape a professional
| culture, it would have been, "I am personally responsible."
|
| When leadership does not practice personal responsibility or
| engages in blame, that ripples through the entire organization.
| Silos that don't function well are symptoms of leadership failing
| to take responsibility and cultural failure, not necessarily
| structural failure.
___________________________________________________________________
(page generated 2024-01-19 23:00 UTC)