[HN Gopher] Service mesh use cases (2020)
       ___________________________________________________________________
        
       Service mesh use cases (2020)
        
       Author : biggestlou
       Score  : 121 points
       Date   : 2023-02-09 23:38 UTC (1 day ago)
        
 (HTM) web link (lucperkins.dev)
 (TXT) w3m dump (lucperkins.dev)
        
       | gillh wrote:
       | Service meshes make it easier to roll out advanced load
       | management/reliability features such as prioritized load
       | shedding, which would otherwise need to be implemented within
       | each language/framework.
       | 
        | For instance, the Aperture[0] open-source flow control system
        | is built on service meshes.
       | 
       | [0]: https://github.com/fluxninja/aperture
       | 
       | [1]: https://docs.fluxninja.com
        
       | 4pkjai wrote:
       | I really did not enjoy dealing with our service mesh at the last
       | place I worked.
        
         | jspdown wrote:
         | Out of curiosity, was it something built internally? Or were
         | you relying on a public solution?
        
       | jpdb wrote:
       | I just wrote something extremely similar, but it's only internal
       | right now.
       | 
       | I personally find that the service mesh value-prop is hard to
       | justify for a serverless stack (mostly Cloud Run, but AWS Lambda
       | too probably), and in situations where your services are mostly
       | all in the same language and you can bake the features into
       | libraries that are much easier to import.
       | 
        | Observability is a great example of this. In serverless-land,
        | you're already getting the standard HTTP metrics (e.g. request
        | count, response codes, latency), tracing, and standard HTTP
        | request logging "for free."
        
         | davewritescode wrote:
         | > I personally find that the service mesh value-prop is hard to
         | justify for a serverless stack (mostly Cloud Run, but AWS
         | Lambda too probably), and in situations where your services are
         | mostly all in the same language and you can bake the features
         | into libraries that are much easier to import.
         | 
          | If you're running serverless, you already have 90% of what
          | you'd get from a service mesh.
         | 
         | I will tell you that having seen what happens in big companies,
         | baking distributed concerns into libraries always ends in
         | disaster long after you're gone.
         | 
         | When you have a piece of code deployed in 200 separate apps,
         | every change requires tons of project management.
        
       | NovemberWhiskey wrote:
       | Now imagine you have something that has the complexity and change
       | volume of a distributed control plane bringing together load-
       | balancing, service advertisement, public key infrastructure, and
       | software defined networking, and then try to imagine running it
       | at the same reliability as your DNS.
       | 
       | Also: proxies, proxies everywhere, as far as the eye can see.
        
         | darkwater wrote:
          | This is the production-readiness part that people usually
          | discover later, the hard way...
        
         | zidad wrote:
          | And in addition to that, all of those immediately become the
          | same centralized single point of failure. What could possibly
          | go wrong (on high load)? ;p
        
           | jspdown wrote:
            | In most implementations this is not the case. Service meshes
            | tend to follow either a sidecar or a DaemonSet approach. You
            | don't have a single proxy; people usually complain about the
            | exact opposite.
        
       | samsquire wrote:
       | Thanks for this.
       | 
        | I have never deployed or used a service mesh, but I am
        | designing something similar at the code layer. It is designed
        | to route between server components, that is, at the level of
        | the architecture between threads in a multithreaded system.
       | 
       | The problem I want to solve is that I want architecture to be
       | trivially easy to change with minimal _code_ changes. This is the
       | promise and allure of enterprise service buses and messaging
       | queues and probably Spring.
       | 
       | I have managed RabbitMQ and I didn't enjoy it.
       | 
        | I want a system that can scale up and down, and where
        | multiples of any system object can be introduced or removed
        | without drastic rewrites.
       | 
        | I would like to decouple bottlenecks from code and turn them
        | into runtime configuration.
       | 
       | My understanding of things such as Traefik and istio is that they
       | are frustrating to set up.
       | 
       | Specifically I am working on designing interthread communication
       | patterns for multithreaded software.
       | 
       | How do you design an architecture that is easy to change, scales
       | and is flexible?
       | 
       | I am thinking of a message routing definition format that is
       | extremely flexible and allows any topology to be created.
       | 
       | https://github.com/samsquire/ideas4#526-multiplexing-setting...
       | 
       | I think there is application of the same pattern to the network
       | layer too.
       | 
        | Each communication event has associated with it an environment
        | of keyvalues that look similar to this:
        | 
        |     petsserver1  container1  thread3  socket5  user563
        |     ingestionthread1
       | 
        | These can be used to route keyspace ranges to other components
        | (such as routing particular users to tenant shards, or load
        | balancing). For example, users 1-1000 are handled by
        | petsserver1 and socket5 is associated with thread3.
       | 
       | In other words: changing the RabbitMQ routing settings doesn't
       | change the architecture of your software. You need to change the
       | architecture of the software to match the routing configuration.
       | But what if you changed the routing configuration and the
       | application architecture changed to match?
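        | 
        | As a very rough sketch of what I mean (all names here are
        | hypothetical, not from any real library), a routing table could
        | map keyspace ranges in that environment to destination
        | components:
        | 
        |     package main
        | 
        |     import (
        |         "fmt"
        |         "strconv"
        |     )
        | 
        |     // Env is the per-event environment of key/values described
        |     // above, e.g. {"server": "petsserver1", "user": "563"}.
        |     type Env map[string]string
        | 
        |     // Rule maps a keyspace range of one environment key to a
        |     // destination component, e.g. users 1-1000 -> petsserver1.
        |     type Rule struct {
        |         Key       string
        |         Low, High int
        |         Dest      string
        |     }
        | 
        |     // Route returns the destination of the first rule whose
        |     // range contains the event's value for that key.
        |     func Route(rules []Rule, env Env) (string, bool) {
        |         for _, r := range rules {
        |             v, ok := env[r.Key]
        |             if !ok {
        |                 continue
        |             }
        |             n, err := strconv.Atoi(v)
        |             if err == nil && n >= r.Low && n <= r.High {
        |                 return r.Dest, true
        |             }
        |         }
        |         return "", false
        |     }
        | 
        |     func main() {
        |         rules := []Rule{
        |             {Key: "user", Low: 1, High: 1000, Dest: "petsserver1"},
        |             {Key: "user", Low: 1001, High: 2000, Dest: "petsserver2"},
        |         }
        |         env := Env{"user": "563", "socket": "socket5"}
        |         dest, _ := Route(rules, env)
        |         fmt.Println("user563 ->", dest) // user563 -> petsserver1
        |     }
        | 
        | The idea is that only this table changes when the topology
        | changes, not the application code.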
        
         | gnur wrote:
          | I'd say most of these patterns are supported by NATS: it can
          | do pub/sub, but it also has excellent support for RPC, and in
          | the latest iteration it has a KV store baked in as well. I've
          | been using it for a few pet projects so far and it has never
          | been the weakest link.
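          | 
          | For instance, the request/reply and KV pieces look roughly
          | like this with the Go client (assuming a local, JetStream-
          | enabled NATS server; error handling trimmed for brevity):
          | 
          |     package main
          | 
          |     import (
          |         "fmt"
          |         "time"
          | 
          |         "github.com/nats-io/nats.go"
          |     )
          | 
          |     func main() {
          |         nc, _ := nats.Connect(nats.DefaultURL)
          |         defer nc.Drain()
          | 
          |         // RPC-style request/reply: a responder subscribes...
          |         nc.Subscribe("pets.lookup", func(m *nats.Msg) {
          |             m.Respond([]byte("handled by petsserver1"))
          |         })
          | 
          |         // ...and a caller sends a request and awaits a reply.
          |         reply, _ := nc.Request("pets.lookup", nil, time.Second)
          |         fmt.Println(string(reply.Data))
          | 
          |         // The KV store baked in (backed by JetStream).
          |         js, _ := nc.JetStream()
          |         cfg := &nats.KeyValueConfig{Bucket: "routing"}
          |         kv, _ := js.CreateKeyValue(cfg)
          |         kv.Put("petsserver1", []byte("users 1-1000"))
          |         entry, _ := kv.Get("petsserver1")
          |         fmt.Println(string(entry.Value()))
          |     }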
        
           | samsquire wrote:
            | I keep hearing about NATS but I have yet to use it, either
            | for fun or for work.
           | 
           | Thanks for the recommendation :-)
        
         | CraigJPerry wrote:
         | >> But what if you changed the routing configuration and the
         | application architecture changed to match?
         | 
          | If there were 3 ways to categorise scaling (there are more
          | than this in reality), they might be vertical, horizontal,
          | and distributed.
         | 
         | You're describing an architecture that's in the horizontal
         | scaling world view.
         | 
         | You're not in vertical because you're using higher powered (but
         | slower) strategies like active routing for comms between
         | components where in vertical you'd have configurable queues but
         | no routing layer.
         | 
         | You're not in distributed scaling mode because your routing is
         | assuming consistent latency and consistent bandwidth
         | behaviours.
         | 
         | I don't think one architecture to rule them all is a solvable
         | problem. I'd heartily and very gratefully welcome being proven
         | wrong on this.
        
           | samsquire wrote:
           | Thanks for your comment. It's definitely food for thought.
           | 
            | You remind me of the fallacies of distributed computing by
            | mentioning consistent latency and bandwidth.
           | 
           | https://en.m.wikipedia.org/wiki/Fallacies_of_distributed_com.
           | ..
           | 
           | I'm still at the design stage.
           | 
            | For those architectures you describe, I am hoping there is
            | a representation that can describe many of them. There are
            | probably architectures I have yet to think of that are
            | unrepresentable with my format.
           | 
            | Going from 1 to N, or adding or removing a layer, should be
            | automatable. That's my hope anyway.
           | 
           | I want everything to wire itself automatically.
           | 
           | I am trying to come up with a data structure that can
           | represent architecture.
           | 
            | I am trying to do what inversion-of-control containers do
            | per request, but for architecture. In an IoC container you
            | specify the scope in which an object is instantiated, such
            | as per request or per session. I want that for
            | architecture.
        
             | CraigJPerry wrote:
              | It's such a fundamental problem space, with such a rich
              | diversity of possible solutions, that at a minimum you're
              | going to create something seriously useful for a subset
              | of application types. But it'd be transformational for
              | computing if you cracked the whole problem. I hope you do.
             | 
             | I do like your idea of outsourcing the wiring (an error
             | prone, detail heavy task) away from humans.
        
       | Scubabear68 wrote:
        | I've only read about service meshes; my impression is that
        | they seem to add an awful lot of processes and complexity just
        | to make developers' lives slightly easier.
       | 
       | Maybe I'm wrong but it almost feels like busy work for DevOps. Is
       | my first impression wrong? Is this the right way to architect
       | systems in some use cases, and if so what are they?
        
         | MoOmer wrote:
         | Many of the use cases described in the post are solved by
         | service meshes.
         | 
         | So, in my opinion, the questions are introspective:
         | 
         | - "Do I have enough context to know what problem those
         | solutions are solving, and to at least appreciate the problem
         | space to understand why someone may solve it like this?"
         | 
         | - "Do I have or perceive those problem to impact my
         | infrastructure/applications?"
         | 
         | - "Does the solution offered by the use cases described appeal
         | to me?"
         | 
         | If yes at the end, then one potential implementation is a
         | service mesh.
         | 
         | A lot of these are solved out-of-the-box with Hashicorp's
         | Nomad/Consul/Vault pairing, for example!
        
           | remram wrote:
           | It is true that a lot of those use cases are covered by
           | "basic" Kubernetes (or Nomad) without the addition of Istio
           | or similar, e.g. service discovery, load-balancing, circuit-
           | breaking, autoscaling, blue-green, isolation, health
           | checking...
           | 
           | Adding a service mesh onto Kubernetes seems to bring a lot of
           | complexity for a few benefits (80% of the effort for the last
           | 20% sort of deal).
        
             | campbel wrote:
             | > Adding a service mesh onto Kubernetes seems to bring a
             | lot of complexity for a few benefits
             | 
             | I think the benefits are magnified in larger organizations
             | or where operators and devs are not the same people. And
             | the complexity is relative to which solution you pick. If
             | you're already on Kubernetes, linkerd2 is relatively easy
             | to install and manage; is that worth it? To me it has been
             | in the past.
        
         | tyingq wrote:
          | I suspect if a service mesh is ultimately shown to have broad
          | value, one will make its way into the K8S core.
          | 
          | To me, it's a fairly big decision to layer something that's
          | complex in its own right on top of something else that's also
          | complex.
        
           | jpdb wrote:
            | > I suspect if a service mesh is ultimately shown to have
            | broad value, one will make its way into the K8S core
            | 
            | I'm not so sure. I suspect it'll follow the same roadmap as
            | Gateway API, which it already kind of is doing with the
            | Service Mesh Interface (https://smi-spec.io/)
        
             | jspdown wrote:
              | Indeed, all major service mesh solutions for Kubernetes
              | implement (at least part of) the SMI specification. There
              | is a group composed of these players actively working on
              | making that spec a standard.
              | 
              | Understanding these few CRDs gives great insight into what
              | to expect from a service mesh and how things are typically
              | articulated.
        
         | kevan wrote:
         | >slightly easier
         | 
          | As a company grows, sooner or later most of these features
          | become pretty desirable from an operations perspective.
          | Feature developers likely don't and shouldn't need to care.
          | It probably starts with things like auth and basic load
          | balancing. As the company grows to dozens of teams and
          | services, you'll start feeling pain around service discovery
          | and wish you didn't need to implement yet another custom auth
          | scheme to integrate with another department's service.
         | 
         | After a few retry storm outages people will start paying more
         | attention to load shedding, autoscaling, circuit breakers, rate
         | limiting.
         | 
         | More mature companies or ones with compliance obligations start
         | thinking about zero-trust, TLS everywhere, auditing, and
         | centralized telemetry.
         | 
         | Is there complexity? Absolutely. Is it worth it? That depends
         | where your company is in its lifecycle. Sometimes yes, other
         | times you're probably better off just building things and
         | living with the fact that your load shedding strategy is "just
         | tip over".
        
           | davewritescode wrote:
            | We're in the process of moving all of our services over to
            | a service mesh and while the growing pains are definitely
            | there, the payoff is huge.
           | 
            | Even aside from a lot of the more hyped-up features of
            | service meshes, the biggest things Istio solves are TLS
            | everywhere and cloud-agnostic workload identity. All of our
            | pods get new TLS certs every 24 hours and nobody needs an
            | API key to call anything.
           | 
            | Our security team is thrilled that applications running
            | with an Istio sidecar literally have no way to leak
            | credentials. There are no API keys to accidentally log.
            | Once we have databases set up to support mTLS
            | authentication, we won't need database passwords anymore.
        
           | bushbaba wrote:
           | Some of the functionality you mentioned above is possible
           | without a service mesh.
        
         | dasil003 wrote:
         | It's 100% a question of scale. And I don't mean throughput, I
         | mean domain and business logic complexity that requires an army
         | of engineers.
         | 
          | Just as it's foolish to create dozens of services if you have
          | a 10-person team, you don't really get much out of a service
          | mesh if you only have a handful of services and aren't
          | feeling the pain with your traditional tooling.
         | 
          | But once you get to large scale with convoluted business
          | logic that is hard to reason about because so many teams are
          | involved, the search for scalable abstractions begins. A
          | service mesh then becomes useful because it is completely
          | orthogonal to biz logic: you can now add engineers 100%
          | focused on tooling and operations, and product engineers can
          | think a lot less about certain classes of reliability and
          | security concerns.
         | 
          | Of course, in today's era of resume-driven development and
          | the huge comp paid by FAANGs, you are going to get a ton of
          | young devs pushing for a service mesh way before it makes
          | sense. I can't say I blame them, but keep your wits about
          | you!
        
           | peteradio wrote:
            | If you can convince your business folks to run shit on the
            | command line then there is basically no need for services,
            | ever. I know it sounds insane but it's how it was done in
            | the old days and there really is only a false barrier to
            | doing it again.
        
             | emptysea wrote:
              | A place I worked had support staff copy-pasting Mongo
              | queries from Google Docs -- it worked in the early days,
              | but eventually you have to start building an admin
              | interface for more complicated processes.
              | 
              | When it was just Mongo, installs were easy since support
              | only needed a Mongo desktop client.
        
               | peteradio wrote:
               | Terminal can handle auth.
        
         | jrockway wrote:
         | It's a "big company" thing. In my opinion, the best way to add
         | mTLS to your stack is to just adjust your application code to
         | verify the certificate on the other end of the connection. But
         | if the "dev team" has the mandate "add features X, Y, and Z",
         | and the "devops team" has the mandate "implement mTLS by the
         | end of Q1", you can see why "bolt on a bunch of sidecars"
         | becomes the selected solution. The two teams don't have to talk
          | with each other, but they both accomplish their goals. The
          | cost is less understanding and debuggability, plus the cost
          | of the service mesh product. But, from both teams'
          | perspectives, it looks like the best option.
         | 
         | I'm not a big fan of this approach; the two teams need to have
         | a meeting and need to have a shared goal to implement the
         | business's selected security requirements together. But
         | sometimes fixing the org is too hard, so there is a Plan B.
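          | 
          | Concretely, the "do it in application code" version is
          | roughly this in Go (a sketch; the file paths and URL are
          | placeholders, and cert distribution/rotation isn't handled):
          | 
          |     package main
          | 
          |     import (
          |         "crypto/tls"
          |         "crypto/x509"
          |         "fmt"
          |         "net/http"
          |         "os"
          |     )
          | 
          |     func main() {
          |         // Client cert/key presented to the server
          |         // (placeholder paths).
          |         cert, err := tls.LoadX509KeyPair("client.crt", "client.key")
          |         if err != nil {
          |             panic(err)
          |         }
          | 
          |         // CA bundle used to verify the certificate on the
          |         // other end of the connection.
          |         caPEM, err := os.ReadFile("ca.crt")
          |         if err != nil {
          |             panic(err)
          |         }
          |         pool := x509.NewCertPool()
          |         pool.AppendCertsFromPEM(caPEM)
          | 
          |         tlsCfg := &tls.Config{
          |             Certificates: []tls.Certificate{cert}, // our identity
          |             RootCAs:      pool,                    // verify theirs
          |         }
          |         client := &http.Client{
          |             Transport: &http.Transport{TLSClientConfig: tlsCfg},
          |         }
          | 
          |         resp, err := client.Get("https://payments.internal:8443/health")
          |         if err != nil {
          |             panic(err)
          |         }
          |         defer resp.Body.Close()
          |         fmt.Println(resp.Status)
          |     }
          | 
          | Distributing and rotating those cert files is the part that
          | still has to be solved somewhere, whether in-app or by a
          | mesh.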
        
           | davewritescode wrote:
            | I very much disagree with the sentiment that adding mTLS is
            | just "verifying the certificate on the other end of the
            | connection". That ignores the process of distributing and
            | rotating certificates, which is non-trivial to implement on
            | the application side.
        
         | jagged-chisel wrote:
          | Most of my programming peers want to focus on solving
          | product-related problems rather than authn, authz, TLS
          | config, failover, throttling, discovery...
         | 
         | We want to automate everything not related to the code we want
         | to write. Service meshes sound like a good way to do that.
        
           | [deleted]
        
           | Scubabear68 wrote:
            | Right - but why not use something like an API gateway then?
        
             | pbalau wrote:
              | That can work, but it means you simply outsourced the
              | problem to AWS. It's not a bad idea per se, but it means
              | your service needs to talk, in some way, HTTP.
              | 
              | You could use the service mesh thing from AWS, along with
              | Cognito JWTs, for authentication and authorization.
        
             | steviesands wrote:
              | API gateways are primarily used for HTTP traffic coming
              | from clients external to your backend services, e.g. an
              | iOS device (hence the term "gateway" vs. "mesh"). I don't
              | think they support Thrift or gRPC (at least AWS doesn't;
              | not sure about other providers).
              | https://aws.amazon.com/api-gateway/
        
       | asim wrote:
       | Article is from 2020. Please add to title.
        
         | dang wrote:
         | Added. Thanks!
        
       | jcq3 wrote:
        | I thought a service mesh's main use case was to reduce time to
        | production delivery, allowing hotfixes to be much more
        | reactive. Am I totally wrong?
        
         | jcq3 wrote:
         | I think it is named Blue-green deployments in the article
        
         | jspdown wrote:
          | It's not about fast delivery, at least not in this way.
          | Arguably, if you need mTLS, traffic shaping, cross-service
          | observability, service discovery... then yes, it's much
          | faster to use an existing solution than to build it yourself.
          | But it won't make your hotfixes ship faster.
          | 
          | Service mesh is nothing new; people just called it
          | differently back then. The key features it brings are:
          | 
          |   - Traffic shaping: mirroring, canary, blue-green...
          |   - Cross-service observability
          |   - End-to-end encryption
          |   - Service discovery
        
       ___________________________________________________________________
       (page generated 2023-02-11 23:01 UTC)