[HN Gopher] Cyphernetes: A Query Language for Kubernetes
       ___________________________________________________________________
        
       Cyphernetes: A Query Language for Kubernetes
        
       Author : fatliverfreddy
       Score  : 130 points
       Date   : 2024-12-16 04:01 UTC (19 hours ago)
        
 (HTM) web link (cyphernet.es)
 (TXT) w3m dump (cyphernet.es)
        
       | solatic wrote:
       | I dunno, Kubernetes has a query language, it's called jq. As in,
       | kubectl get pods -A -ojson | jq -r '.items[] | ...'. Cyphernetes
       | seems simpler perhaps but it's not the 10x improvement I need to
       | switch and introduce a new dependency.
        
         | Thaxll wrote:
         | You usually don't need that, since kubectl supports jsonpath.
        
           | mdaniel wrote:
           | I am firmly in the camp of jq because (a) I am able to bring
           | my years of muscle memory to this problem (b) jq is _without
           | a doubt_ more expressive than jsonpath (c) related to the
           | muscle memory part I have uncanny valley syndrome trying to
           | context switch between jsonpath and jmespath (used by awscli
           | for some stupid reason) so it's much easier to just use the
           | one true json swiss-army tool upon the json those clis emit
        
         | philsnow wrote:
         | I guess they would say that you have to send the output of that
         | to be inputs of another kubectl command like
         | 
         |     $ kubectl logs -n foo $(kubectl get pod -n foo | awk '/Running/{print $1}')
         | 
         | because one of their selling points is "no nested kubectl
         | queries".
         | 
         | I don't see how their queries can be more efficient than
         | hitting the kube-apiserver multiple times, unless they have
         | something that lives clusterside observing lifecycle events for
         | all CRDs and answering queries with only one round-trip instead
         | of multiple.
         | 
         | Or maybe they're selling "no nested kubectl queries" as an
         | experience feature, saying that a query language is more
         | ergonomic than bash command redirection. My brain has been
         | warped into the shape of the shell, for better or for worse, so
         | it's not a selling point for me.
        
       | nikau wrote:
       | What does this offer over jq which I can also afford?
        
         | weddpros wrote:
         | Cyphernetes seems capable of graph/relational logic.
         | 
         | The example on the homepage is literally "give me deployments
         | with more than 2 replicas with pods that are not Running, and
         | give me the IP address of the service they're serving"...
         | 
         | Any idea how to do that with kubectl | jq? Their solution seems
         | elegant to me.
        
           | nikau wrote:
           | Can just use normal jq select filters unless I'm missing
           | something?
        
             | weddpros wrote:
             | the thing is you'd need 3 k8s queries, one for pods, one
             | for deployments, one for services, then link all of them,
             | and filter... jq helps with the filtering, kubectl can
             | query, but you still need to join the 3 resources to answer
             | the query...
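
A minimal Python sketch of the client-side join described above, assuming the three `kubectl get ... -o json` results have already been loaded into lists. The sample objects, names, and IPs are made up, and linking deployments, pods, and services purely by label selector is a simplifying assumption (real clusters route ownership through ReplicaSets and selectors may also use `matchExpressions`).

```python
# Hand-written sample data shaped like the "items" arrays returned by
# `kubectl get <kind> -o json`. All names and IPs are hypothetical.
deployments = [
    {"metadata": {"name": "web"},
     "spec": {"replicas": 3, "selector": {"matchLabels": {"app": "web"}}}},
    {"metadata": {"name": "api"},
     "spec": {"replicas": 1, "selector": {"matchLabels": {"app": "api"}}}},
]
pods = [
    {"metadata": {"name": "web-1", "labels": {"app": "web"}},
     "status": {"phase": "Running"}},
    {"metadata": {"name": "web-2", "labels": {"app": "web"}},
     "status": {"phase": "Pending"}},
    {"metadata": {"name": "api-1", "labels": {"app": "api"}},
     "status": {"phase": "Pending"}},
]
services = [
    {"metadata": {"name": "web-svc"},
     "spec": {"selector": {"app": "web"}, "clusterIP": "10.0.0.10"}},
    {"metadata": {"name": "api-svc"},
     "spec": {"selector": {"app": "api"}, "clusterIP": "10.0.0.20"}},
]


def selects(selector, labels):
    """True if the selector is non-empty and all its pairs appear in labels."""
    return bool(selector) and all(labels.get(k) == v for k, v in selector.items())


def unhealthy_deployments(deployments, pods, services, min_replicas=2):
    """Deployments with more than min_replicas replicas and a non-Running pod,
    plus the cluster IPs of services that select those deployments' pods."""
    results = []
    for dep in deployments:
        if dep["spec"]["replicas"] <= min_replicas:
            continue
        selector = dep["spec"]["selector"]["matchLabels"]
        owned = [p for p in pods if selects(selector, p["metadata"]["labels"])]
        not_running = [p["metadata"]["name"] for p in owned
                       if p["status"]["phase"] != "Running"]
        if not not_running:
            continue
        ips = [s["spec"]["clusterIP"] for s in services
               if any(selects(s["spec"]["selector"], p["metadata"]["labels"])
                      for p in owned)]
        results.append({"deployment": dep["metadata"]["name"],
                        "not_running": not_running,
                        "service_ips": ips})
    return results


result = unhealthy_deployments(deployments, pods, services)
print(result)
```

The join logic is short, but it still needs three separate list calls and every pod in scope has to be fetched before filtering can start.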
        
               | nikau wrote:
               | Right, so doable just a bit more effort to do 3 queries
               | to pipes or tmp files
        
               | astonex wrote:
               | This is Dropbox comment all over again. Lots of things
               | are doable with more manual effort.
        
               | nikau wrote:
               | True - it's a trade-off like everything in life - do I
               | want to learn yet another language syntax, or master one
               | like jq.
               | 
               | Personally I feel like mastering jq has more value across
               | a lot more things.
        
       | danpalmer wrote:
       | I'm not against replacing jq/jsonpath for the right tool, they're
       | not the most ergonomic. What isn't clear to me though is why this
       | isn't SQL? It's so nearly SQL, and seems to support almost
       | identical semantics. I realise SQL isn't perfect, but the goal of
       | this project isn't (I assume) to invent a new query language, but
       | to make Kubernetes more easily queryable.
        
         | philsnow wrote:
         | Reading your comment made me think that they're so close to
         | "OSQuery for k8s", but that already seems to exist:
         | https://www.uptycs.com/blog/kubequery-brings-the-power-of-os...
        
         | rubenvanwyk wrote:
         | It's based on Cypher, which is a query language for graph
         | databases. The author/s probably thought the data is more
         | graph-like than relational.
        
           | danpalmer wrote:
           | Ah. I've not heard of Cypher before.
           | 
           | I'd disagree and say that Kubernetes is much more relational
           | than graph based, and SQL is pretty good for querying graphs
           | anyway, especially with some custom extensions.
           | 
           | This does make more sense though.
        
             | amanj41 wrote:
             | Graph DBs are generalized relationship stores. SQL can work
             | for querying graphs, but graph DB DSLs like Cypher become
             | very powerful when you're trying to match across multiple
             | relationship hops.
             | 
             | For example, to find all friend of a friend or friends of a
             | friend of a friend: `MATCH (user:User {username:
             | "amanj41"})-[:KNOWS*2..3]->(foaf) WHERE
             | NOT((user)-[:KNOWS]->(foaf)) RETURN user, foaf`
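
To make the variable-length match above concrete, here is a rough Python sketch of what `[:KNOWS*2..3]` plus the `NOT` clause asks for, written as a hand-rolled traversal over an adjacency dict. The graph data and the `foaf` function are hypothetical, and the traversal only approximates Cypher's matching rules (it avoids reusing an edge within a path, and returns a set of nodes rather than one row per path).

```python
def foaf(graph, user, min_hops=2, max_hops=3):
    """Nodes reachable from `user` over min_hops..max_hops KNOWS edges,
    never reusing an edge within one path, excluding direct friends."""
    found = set()

    def walk(node, hops, used_edges):
        if hops >= min_hops:
            found.add(node)
        if hops == max_hops:
            return
        for nxt in graph.get(node, ()):
            edge = (node, nxt)
            if edge not in used_edges:
                walk(nxt, hops + 1, used_edges | {edge})

    walk(user, 0, frozenset())
    # Mirrors WHERE NOT((user)-[:KNOWS]->(foaf)).
    return found - set(graph.get(user, ()))


# Hypothetical directed KNOWS edges.
knows = {
    "amanj41": ["bob", "carol"],
    "bob": ["dave", "carol"],
    "carol": ["erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}

print(sorted(foaf(knows, "amanj41")))
```

In SQL the same question typically needs one self-join per hop or a recursive CTE, which is the verbosity the Cypher pattern avoids.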
        
         | captn3m0 wrote:
         | I haven't tried it, but steampipe has a k8s plugin which lets
         | you use PG/sqlite:
         | https://hub.steampipe.io/plugins/turbot/kubernetes/tables
        
       | jeremya wrote:
       | This is fantastic. I've always enjoyed the cypher language that
       | the neo4j team created for querying graph data. The connected k8s
       | api objects seem like a great place to apply that lens.
        
       | alpb wrote:
       | During many years of operating production Kubernetes clusters with
       | several thousand nodes, I've never seen any of these
       | observability tools that query kube-apiserver work at that scale.
       | Even the popular tools like k9s make super expensive queries like
       | listing all pods in the cluster that if you don't have enough
       | load protections, can tip your Kubernetes apiserver over and
       | cause an incident. If you're serious about these querying
       | capabilities, I highly recommend building your own data sources
       | (e.g. watch objects with a controller and dump the data in a sql
       | db) and stop hitting apiserver for these things. You'll be better
       | off in the long run.
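
A minimal sketch of the "watch objects and dump them in a SQL db" idea, in Python with the standard-library `sqlite3` module. It assumes events arrive as dicts with `type` and `object` keys, as in the Kubernetes watch API; the event list here is hand-written rather than read from a cluster, and the `pods` table schema is a made-up example.

```python
import json
import sqlite3


def apply_event(db, event):
    """Apply one watch-style event to the local pods table."""
    if event["type"] not in ("ADDED", "MODIFIED", "DELETED"):
        return  # e.g. BOOKMARK or ERROR events carry no object change
    obj = event["object"]
    key = (obj["metadata"]["namespace"], obj["metadata"]["name"])
    if event["type"] == "DELETED":
        db.execute("DELETE FROM pods WHERE namespace = ? AND name = ?", key)
    else:
        db.execute(
            "INSERT OR REPLACE INTO pods (namespace, name, phase, raw) "
            "VALUES (?, ?, ?, ?)",
            (*key, obj.get("status", {}).get("phase"), json.dumps(obj)),
        )


def pod(namespace, name, phase):
    """Build a tiny pod-shaped dict for the example."""
    return {"metadata": {"namespace": namespace, "name": name},
            "status": {"phase": phase}}


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE pods (namespace TEXT, name TEXT, phase TEXT, raw TEXT, "
    "PRIMARY KEY (namespace, name))"
)

# Hypothetical event stream standing in for a real watch connection.
events = [
    {"type": "ADDED", "object": pod("foo", "web-1", "Pending")},
    {"type": "ADDED", "object": pod("foo", "web-2", "Running")},
    {"type": "MODIFIED", "object": pod("foo", "web-1", "Running")},
    {"type": "DELETED", "object": pod("foo", "web-2", "Running")},
    {"type": "ADDED", "object": pod("bar", "job-1", "Pending")},
]
for event in events:
    apply_event(db, event)

# Queries now hit the local copy instead of the apiserver.
not_running = db.execute(
    "SELECT namespace, name FROM pods WHERE phase != 'Running' "
    "ORDER BY namespace, name"
).fetchall()
print(not_running)
```

Once the mirror exists, ad hoc queries cost the apiserver nothing beyond the long-lived watch that feeds it.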
        
         | fatliverfreddy wrote:
         | This is a very good point and is on the roadmap.
        
         | ramoz wrote:
         | How fun was kube-ops-view though
        
         | rtpg wrote:
         | how are Kubernetes apiservers suffering this much from this
         | kind of query? Surely even in huge systems the amount of data
         | that would need to be traversed is super small, right?
         | 
         | Is this a question of Kubernetes just sticking everything into
         | "standard" datastructures instead of using a database?
        
           | mltsd wrote:
           | Pretty sure the apiserver just queries the etcd database (and
           | maybe caches some things, not sure) but i guess it could be
           | the apiserver itself that can't handle the data :P
        
           | tasuki wrote:
           | I no longer know anything about Kubernetes, but share your
           | surprise! From first principles it seems the metadata should
           | be small.
        
           | nvarsj wrote:
           | My knowledge is out of date now, but the main issues IMO
           | are/were:
           | 
           | - No concept of apiserver rate limiting, by design. I see
           | there is now an APF thingy, but still no basic API / edge
           | rate limiting.
           | 
           | - etcd has bad scalability. It's a very basic, highly
           | consistent kv store that has tiny limits (8GB limit in latest
           | docs, with a default of 2GB). It had large performance issues
           | throughout its life when I was using k8s, I still don't know
           | if it's much better.
        
           | crabbone wrote:
           | Long ago I wanted to re-implement at least part of kubectl in
           | Python. After all, Kubernetes has documented API... what I
           | quickly discovered was that kubectl commands don't map to
           | Kubernetes API. Almost at all. A lot of these commands will
           | require multiple queries going back and forth to accomplish
           | what the command does. I quickly abandoned the project... so,
           | maybe I've overlooked something, but, again, my impression
           | was that instead of having generic API with queries that can
           | be executed server-side to retrieve necessary information,
           | Kubernetes API server offers very specialized disjoint set of
           | commands that can only retrieve one small piece of
           | interesting info at a time.
           | 
           | This, obviously, isn't a scalable approach, but there's no
           | "wrapper" you could write in order to mitigate the problem.
           | The API itself is the problem.
        
           | remram wrote:
           | In my experience, they don't, you can just run more of them
           | and you can stick them behind a load-balancer (regular HTTP
           | reverse proxy). You can scale both etcd and apiserver pretty
           | easily. Of course you have less control in cloud
           | environments, I have less experience with that.
        
           | alpb wrote:
           | Kubernetes only lets you query resources by object type and
           | that's only a prefix range scan on etcd database. There are
           | no indexes whatsoever in the exhaustive LIST queries, and
           | kube-apiserver handles serialization of the objects back and
           | forth between multiple wire types. Over the years there has
           | been a lot of optimizations, but you don't wanna list all
           | pods in a 5000 node high density cluster every time you spin
           | up client-side tools like this.
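
A toy Python model of why an unindexed LIST is expensive, assuming a sorted key-value store where the only efficient read is a prefix range scan. The `SortedKV` class is hypothetical, and the key layout is only modeled on the `/registry/<resource>/<namespace>/<name>` style; nothing here talks to a real etcd.

```python
import bisect


class SortedKV:
    """In-memory stand-in for a sorted key-value store."""

    def __init__(self, items):
        self._data = dict(items)
        self._keys = sorted(self._data)

    def range_prefix(self, prefix):
        """Yield (key, value) for every key starting with prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._data[key]


def list_by_label(store, prefix, label, value):
    """No label index exists, so scan the whole prefix and filter after."""
    scanned = 0
    matches = []
    for key, obj in store.range_prefix(prefix):
        scanned += 1
        if obj.get("labels", {}).get(label) == value:
            matches.append(key)
    return scanned, matches


store = SortedKV({
    "/registry/deployments/foo/web": {"labels": {"app": "web"}},
    "/registry/pods/bar/job-1": {"labels": {"app": "job"}},
    "/registry/pods/foo/api-1": {"labels": {"app": "api"}},
    "/registry/pods/foo/web-1": {"labels": {"app": "web"}},
    "/registry/pods/foo/web-2": {"labels": {"app": "web"}},
})

scanned, matches = list_by_label(store, "/registry/pods/", "app", "web")
print(scanned, matches)
```

Narrowing the prefix (for example to one namespace) shrinks the scan, but a label or field filter on its own does not: every object under the prefix is still read and deserialized before the filter applies.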
        
         | atombender wrote:
         | What's surprising to me is that there's no way to listen to
         | _any_ object type. You have to know the "kind" beforehand,
         | because the watch API requires it. To watch all objects in the
         | system, you have to start a separate watch request for every
         | type. This may in turn be expensive.
         | 
         | If you have direct access to Etcd (which may not be possible in
         | a managed cloud version of Kubernetes?), putting a watch on /
         | might scale better.
         | 
         | (As an aside, with the Go client API you have to jump through
         | some hoops to even deserialize objects whose kinds' schemas are
         | not already registered. You have to use the special
         | "unstructured" deserializer. The Go SDK often has to deal with
         | unknown types, e.g. for diffing, and all of the
         | serializer/codec/conversion layers in the SDK seem incredibly
         | overengineered for something that could have just assumed a
         | simple nested map structure and then layered validation and
         | parsing on top; the smell of Java programmers is pretty
         | strong.)
        
           | bluepizza wrote:
           | The watch API has horrible user experience in all platforms.
           | One must send a GET and keep the pipe open, waiting for a
           | stream of responses. If the connection is lost, changes might
           | be lost. If one misses a resource version change, then either
           | the reconnection will fail, or a stale resource will be
           | monitored.
           | 
           | The Java client does this with blocking, resulting in a large
           | number of threads.
           | 
           | I truly like Kubernetes, and I think most detractors who complain
           | about complexity simply don't want to learn it.
           | But the K8s API, especially the Watch API, needs some
           | rigorous standards.
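
The reconnect behaviour described above can be sketched as a small list-then-watch loop in Python. This runs against a scripted fake, not a real client: `Gone`, `sync`, and `make_fake_api` are hypothetical names, `Gone` stands in for the server's HTTP 410 response to a too-old resourceVersion, and the loop returns when the stream ends, whereas a real watcher would keep reconnecting.

```python
class Gone(Exception):
    """Stands in for HTTP 410: the requested resourceVersion is too old."""


def handle(state, event):
    """Apply one event to the local set of object names."""
    name = event["object"]["metadata"]["name"]
    if event["type"] == "DELETED":
        state.discard(name)
    else:
        state.add(name)


def sync(list_fn, watch_fn, max_attempts=5):
    """List once, then watch, resuming from the last seen resourceVersion.
    On a dropped connection, resume; on Gone, re-list from scratch."""
    state, rv = list_fn()
    for _ in range(max_attempts):
        try:
            for event in watch_fn(rv):
                handle(state, event)
                # Treated as an opaque token: stored, never compared.
                rv = event["object"]["metadata"]["resourceVersion"]
            return state  # stream ended cleanly
        except ConnectionError:
            continue  # reconnect and resume from rv
        except Gone:
            state, rv = list_fn()  # local view is stale, start over
    return state


def make_fake_api():
    """Scripted server: one dropped connection, then a too-old version."""
    calls = {"list": 0}

    def list_fn():
        calls["list"] += 1
        if calls["list"] == 1:
            return {"a"}, "1"
        return {"b", "c"}, "4"

    def watch_fn(rv):
        if rv == "1":
            yield {"type": "ADDED",
                   "object": {"metadata": {"name": "b", "resourceVersion": "2"}}}
            raise ConnectionError("stream dropped")
        if rv == "2":
            raise Gone("resourceVersion 2 is too old")
        return  # rv == "4": nothing new, stream ends

    return list_fn, watch_fn, calls


list_fn, watch_fn, calls = make_fake_api()
final_state = sync(list_fn, watch_fn)
print(sorted(final_state), calls["list"])
```

Skipping the re-list branch is how a client ends up monitoring a stale view: the deletion of "a" and the creation of "c" happened in the window that could no longer be replayed.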
        
         | f0e4c2f7 wrote:
         | There is a funny parallel I see with Kubernetes that I also saw
         | a lot with Linux in the early years. There are thousands of
         | packages and tools you can install on Linux (think phpmyadmin
         | for example) and new users sometimes go wild installing every
         | single package they read about.
         | 
         | After a while, the more mature Linux engineers start going the
         | other way. Ripping out as much as possible. Stripping down to
         | the leanest build they can, for performance but also to reduce
         | attack surface and overall complexity.
         | 
         | Very similar dynamic with k8s. Early days are often about
         | scooping up every CNCF project like you're on a shopping spree.
         | Eventually people get to shipping slim clusters and running
         | 30mb containers with alpine or nix. Using it essentially as
         | open source clustering for Linux.
        
         | moshloop wrote:
         | This is the approach we took while building our Internal
         | Developer Platform: watches (via client-go informers with
         | client-side caching) to sync data into a Postgres database as
         | JSONB. Changes are tracked using JSON patches and Kubernetes
         | events. To avoid a watch on every resource kind, we handle this
         | by performing incremental object fetches for the objects
         | involved in watched events.
         | 
         | Getting this to perform well required several optimizations at
         | both the Go and Postgres levels. On the Go side, we use
         | prioritized work queues, event de-duplication, and even
         | switched to Rust for efficient JSON diffs. For Postgres, we
         | leverage materialized views and trigger-based optimistic
         | locking
        
       | devops99 wrote:
       | The
       | 
       |     brew install cyphernetes
       | 
       | at the top of the page is an immediate turn-off.
        
         | riku_iki wrote:
         | why?..
        
           | TheDong wrote:
           | Kubernetes only runs on linux, so it stands to reason if you
           | care about k8s you should care about linux. My experience is
           | also that good experienced sysadmins often use linux for
           | their own machines as well.
           | 
           | Targeting a tool at macOS users, and omitting linux
           | instructions, gives the impression that the tool isn't
           | targeted at sysadmins or hackers (i.e. at us), but rather at
           | beginners, frontend developers, etc.
        
             | darkwater wrote:
             | I'm a "sysadmin". I only run Linux on my workstation. I
             | even run NixOs on a home server. I manage Kubernetes
             | clusters. Yet, I use Homebrew on Linux.
        
               | politelemon wrote:
               | Most, however, do not, nor should they be expected to.
               | Homebrew is not a safe or viable package manager,
               | especially when better and safer package managers exist
               | in the Linux ecosystem.
        
             | falconertc wrote:
             | Saying it's targeted at beginners because it supports MacOS
             | shows a lot of disconnection with what many DevOps people
             | use these days. The year of the linux desktop has yet to
             | arrive, and Mac is king for people in IT (at least in the
             | US)
        
               | cess11 wrote:
               | I have yet to meet a competent sysadmin that cares much
               | about "desktop", and to the extent they do they mostly
               | seem to invent their own graphical tools, with Tcl/Tk and
               | so on.
               | 
               | Are they common where you live?
        
             | riku_iki wrote:
             | Brew runs on linux too..
        
         | rjh29 wrote:
         | Agree but I'm not sure why. I'm not a mac user so the initial
         | impression is like "this isn't for you, go away". At least add
         | a linux command alongside it!
        
           | koito17 wrote:
           | Homebrew has a Linux variant, but I assume almost nobody uses
           | it.
           | 
           | Personally use a Mac with Nix, and so do many of my
           | coworkers. Assuming Homebrew, even for a Mac user, leaves a
           | bad impression on me.
        
             | jm2dev wrote:
             | I also prefer Mac with Nix over homebrew.
        
           | fatliverfreddy wrote:
           | Thanks for the feedback. Will add more commands there on
           | rotation to show the different installation options.
        
           | marxisttemp wrote:
           | Even on macOS, brew is wildly inferior to MacPorts; to be
           | fair, brew is "blessed" by Swift Package Manager whereas
           | MacPorts is not, but this is ironic given the guy behind
           | MacPorts both worked at Apple and designed the original
           | FreeBSD ports system.
        
         | falconertc wrote:
         | What? I love seeing this. I want to see how to get it quickly
         | via package manager.
        
           | tasuki wrote:
           | Not everyone uses the same package manager that you use.
        
         | allyant wrote:
         | It would be good to have some example commands that can be run
         | right after installation, rather than having to figure out how
         | to run the queries.
        
         | moondev wrote:
         |     go run github.com/avitaltamir/cyphernetes/cmd/cyphernetes@v0.14.0 --help
        
       | multani wrote:
       | I really really like Steampipe to do this kind of query:
       | https://steampipe.io, which is essentially PostgreSQL (literally)
       | to query many different kind of APIs, which means you have access
       | to all PostgreSQL's SQL language can offer to request data.
       | 
       | They have a Kubernetes plugin at
       | https://hub.steampipe.io/plugins/turbot/kubernetes and there are
       | a couple of things I really like:
       | 
       | * it's super easy to request multiple Kubernetes clusters
       |   transparently: define one Steampipe "connection" for each of your
       |   clusters + define an "aggregator" connection that aggregates all
       |   of them, then query the "aggregator" connection. You will get a
       |   "context" column that indicates which Kubernetes cluster the row
       |   came from.
       | * it's relatively fast in my experience, even for large result
       |   sets. It's also possible to configure a caching mechanism inside
       |   Steampipe to speed up your queries
       | * it also understands custom resource definitions, although you
       |   need to help Steampipe a bit (explained here:
       |   https://hub.steampipe.io/plugins/turbot/kubernetes/tables/ku...)
       | 
       | Last but not least: you can of course join multiple "plugins"
       | together. I used it a couple of times to join content exposed
       | only in GCP with content from Kubernetes, that was quite useful.
       | 
       | The things I don't like so much but can be lived with:
       | 
       | * Several columns are just exposed as plain JSON fields; you need
       |   to get familiar with PostgreSQL JSON operators to get something
       |   useful. There's a page in Steampipe's doc to explain how to use
       |   them better.
       | * Be familiar also with PostgreSQL's common table expressions:
       |   they are not so difficult to use but make the SQL code much
       |   easier to read
       | * It's SQL, so you have to know which columns you want to pick
       |   before selecting the table they come from; not ideal for
       |   autocompletion
       | * the Steampipe "psql" client is good, but sometimes a bit
       |   counterintuitive; I don't have specific examples but I have the
       |   feeling it behaves slightly differently than other CLI clients
       |   I have used.
       | 
       | All in all: I think Steampipe is a cool tool to know about, for
       | Kubernetes but also other API systems.
        
         | robertlagrant wrote:
         | I really like Steampipe too. Writing the plugins is quite fun.
        
         | nathanwallace wrote:
         | Steampipe project lead here - thanks for the shout out &
         | feedback multani!
         | 
         | I agree with your comment about JSON columns being more
         | difficult to work with at times. On balance, we've found that
         | approach more robust than creating new columns (names and
         | formats) that effectively become Steampipe specific.
         | 
         | Our built-in SQL client is convenient, but it can definitely be
         | better to run Steampipe in service mode and use any Postgres
         | compatible SQL client you prefer [1].
         | 
         | You might also enjoy our open source mods for compliance
         | scanning [2] and visualizing clusters [3]. They are Powerpipe
         | [4] dashboards as code written in HCL + SQL that query
         | Steampipe.
         | 
         | 1 - https://steampipe.io/docs/query/third-party
         | 2 - https://hub.powerpipe.io/mods/turbot/kubernetes_compliance
         | 3 - https://hub.powerpipe.io/mods/turbot/kubernetes_insights
         | 4 - https://github.com/turbot/powerpipe
        
       | UltraSane wrote:
       | I am a big fan of Cypher and I love this. I really wish actual Cypher
       | supported the dot notation for nested keys.
        
       | omrispector wrote:
       | This is way cool. The ability to visualize the k8s object model
       | as a graph and query it as such makes so much sense! The hottest
       | feature in my mind is applying this in an operator - maintaining
       | state as defined by a simple graph query. It is much more
       | readable, and does so with very little code. Well Done!
        
       | matanavital wrote:
       | The one thing I have been waiting for
        
       | atombender wrote:
       | This looks great for scripting. I will say that the query
       | language looks a bit too verbose for daily use -- meaning when
       | you're interacting with a cluster to diagnose a problem, follow a
       | job, testing the rollout of something experimental, or similar.
       | 
       | For example, I'd love to be able to just do this as the whole
       | query:
       | 
       |     metadata.name =~ "foo%"
       | 
       | or maybe:
       | 
       |     .. =~ "foo%"  // Any field matches
       | 
       | or maybe:
       | 
       |     $pod and metadata.name =~ "foo%"  // Shorthand to filter by type
       | 
       | I think a query language for querying Kubernetes ought to start
       | with predicate-based filtering as the foundation. Having graph
       | operators seems like a nice addition, but maybe not the first
       | thing people generally need?
       | 
       | It's not quite clear who this tool is for, so maybe this is not
       | the intended purpose?
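
The predicate-first idea above could look roughly like this in Python. This is a sketch of the commenter's proposed shorthand, not of anything Cyphernetes implements: the helper names and sample objects are hypothetical, and treating `%` as "match anything" is an assumption borrowed from SQL `LIKE`.

```python
import re


def get_path(obj, dotted):
    """Follow a dotted path such as "metadata.name"; None if absent."""
    for part in dotted.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj


def like(value, pattern):
    """Match a string against a pattern where "%" means "anything"."""
    if not isinstance(value, str):
        return False
    regex = ".*".join(re.escape(piece) for piece in pattern.split("%"))
    return re.fullmatch(regex, value) is not None


def any_field_like(obj, pattern):
    """True if any string leaf anywhere in the object matches."""
    if isinstance(obj, dict):
        return any(any_field_like(v, pattern) for v in obj.values())
    if isinstance(obj, list):
        return any(any_field_like(v, pattern) for v in obj)
    return like(obj, pattern)


# Hypothetical objects, heavily trimmed.
objects = [
    {"kind": "Pod", "metadata": {"name": "foo-1"}, "spec": {"nodeName": "node-a"}},
    {"kind": "Pod", "metadata": {"name": "bar-1"}, "spec": {"nodeName": "foo-node"}},
    {"kind": "Service", "metadata": {"name": "foo-svc"}, "spec": {}},
    {"kind": "Service", "metadata": {"name": "baz"}, "spec": {}},
]


def names(items):
    return [o["metadata"]["name"] for o in items]


# metadata.name =~ "foo%"
by_name = names(o for o in objects if like(get_path(o, "metadata.name"), "foo%"))
# .. =~ "foo%"
any_field = names(o for o in objects if any_field_like(o, "foo%"))
# $pod and metadata.name =~ "foo%"
pods_only = names(o for o in objects
                  if o["kind"] == "Pod" and like(get_path(o, "metadata.name"), "foo%"))
print(by_name, any_field, pods_only)
```

Each of the three shorthand forms reduces to a predicate over a single object, which is why they can be evaluated without any join or graph traversal.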
        
       | gz5 wrote:
       | since Cypher-based (instead of sql), is the key question whether
       | my k8s data is more graph-like or relational?
       | 
       | adjacent but lots of experts here - independent of Cyphernetes or
       | specific tooling, what are you doing to secure k8s api / kubectl
       | / k8s control plane?
        
       | jeffreyaven wrote:
       | our project https://github.com/stackql/stackql has a k8s provider
       | which might be of interest here, we implement our own front end
       | SQL parser and expose all control plane routes (and data plane
       | routes in many cases) through overloaded SQL methods, this is not
       | FDW based and does not require a server (postgres etc)
        
       ___________________________________________________________________
       (page generated 2024-12-16 23:01 UTC)