[HN Gopher] Cyphernetes: A Query Language for Kubernetes
___________________________________________________________________
Cyphernetes: A Query Language for Kubernetes
Author : fatliverfreddy
Score : 130 points
Date : 2024-12-16 04:01 UTC (19 hours ago)
(HTM) web link (cyphernet.es)
(TXT) w3m dump (cyphernet.es)
| solatic wrote:
| I dunno, Kubernetes has a query language, it's called jq. As in,
| kubectl get pods -A -ojson | jq -r '.items[] | ...'. Cyphernetes
| seems simpler perhaps but it's not the 10x improvement I need to
| switch and introduce a new dependency.
| Thaxll wrote:
| You usually don't need that, since kubectl supports jsonpath.
| mdaniel wrote:
| I am firmly in the camp of jq because (a) I am able to bring
| my years of muscle memory to this problem (b) jq is _without
| a doubt_ more expressive than jsonpath (c) related to the
| muscle memory part I have uncanny valley syndrome trying to
| context switch between jsonpath and jmespath (used by awscli
| for some stupid reason) so it's much easier to just use the
| one true json swiss-army tool upon the json those clis emit
| philsnow wrote:
| I guess they would say that you have to send the output of that
| to be inputs of another kubectl command like
|   $ kubectl logs -n foo $(kubectl get pod -n foo | awk '/Running/{print $1}')
|
| because one of their selling points is "no nested kubectl
| queries".
|
| I don't see how their queries can be more efficient than
| hitting the kube-apiserver multiple times, unless they have
| something that lives clusterside observing lifecycle events for
| all CRDs and answering queries with only one round-trip instead
| of multiple.
|
| Or maybe they're selling "no nested kubectl queries" as an
| experience feature, saying that a query language is more
| ergonomic than bash command redirection. My brain has been
| warped into the shape of the shell, for better or for worse, so
| it's not a selling point for me.
| nikau wrote:
| What does this offer over jq which I can also afford?
| weddpros wrote:
| Cyphernetes seems capable of graph/relational logic.
|
| The example on the homepage is literally "give me deployments
| with more than 2 replicas with pods that are not Running, and
| give me the IP address of the service they're serving"...
|
| Any idea how to do that with kubectl | jq? Their solution seems
| elegant to me.
| nikau wrote:
| Can just use normal jq select filters unless I'm missing
| something?
| weddpros wrote:
| the thing is you'd need 3 k8s queries, one for pods, one
| for deployments, one for services, then link all of them,
| and filter... jq helps with the filtering, kubectl can
| query, but you still need to join the 3 resources to answer
| the query...
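|
| As a rough illustration of the three-way join described above (a
| sketch only, not how Cyphernetes does it): the function names are
| hypothetical, the inputs are assumed to be the `.items` arrays of
| `kubectl get ... -o json` for deployments, pods and services, and
| resources are linked by label selectors, which is a simplifying
| assumption (real ownership goes through ReplicaSets and
| ownerReferences).

```python
def selector_matches(selector, labels):
    """True when a non-empty selector is a subset of the given labels."""
    return bool(selector) and all(labels.get(k) == v for k, v in selector.items())


def degraded_deployments(deployments, pods, services, min_replicas=2):
    """Join three pre-fetched `.items` lists (hypothetical helper).

    Returns deployments with more than `min_replicas` replicas that have
    at least one pod not in phase Running, plus the cluster IPs of the
    services selecting those pods.
    """
    rows = []
    for dep in deployments:
        if dep["spec"].get("replicas", 0) <= min_replicas:
            continue
        ns = dep["metadata"]["namespace"]
        selector = dep["spec"].get("selector", {}).get("matchLabels", {})
        # Join 1: deployment -> pods, by namespace and label selector.
        owned = [
            p for p in pods
            if p["metadata"]["namespace"] == ns
            and selector_matches(selector, p["metadata"].get("labels", {}))
        ]
        if all(p.get("status", {}).get("phase") == "Running" for p in owned):
            continue  # nothing unhealthy (or no pods matched at all)
        # Join 2: pods -> services, again by label selector.
        ips = sorted(
            s["spec"]["clusterIP"]
            for s in services
            if s["metadata"]["namespace"] == ns
            and s["spec"].get("clusterIP")
            and any(
                selector_matches(s["spec"].get("selector") or {},
                                 p["metadata"].get("labels", {}))
                for p in owned
            )
        )
        rows.append({"deployment": dep["metadata"]["name"], "service_ips": ips})
    return rows
```

| Each list would come from one kubectl call piped to a file and
| loaded with json.load, so this is still three round-trips; only
| the linking step moves out of the shell.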
| nikau wrote:
| Right, so doable just a bit more effort to do 3 queries
| to pipes or tmp files
| astonex wrote:
| This is the Dropbox comment all over again. Lots of things
| are doable with more manual effort.
| nikau wrote:
| True - it's a trade-off like everything in life - do I
| want to learn yet another language syntax, or master one
| like jq.
|
| Personally I feel like mastering jq has more value across
| a lot more things.
| danpalmer wrote:
| I'm not against replacing jq/jsonpath for the right tool, they're
| not the most ergonomic. What isn't clear to me though is why this
| isn't SQL? It's so nearly SQL, and seems to support almost
| identical semantics. I realise SQL isn't perfect, but the goal of
| this project isn't (I assume) to invent a new query language, but
| to make Kubernetes more easily queryable.
| philsnow wrote:
| Reading your comment made me think that they're so close to
| "OSQuery for k8s", but that already seems to exist:
| https://www.uptycs.com/blog/kubequery-brings-the-power-of-os...
| rubenvanwyk wrote:
| It's based on Cypher, which is a query language for graph
| databases. The author/s probably thought the data is more
| graph-like than relational.
| danpalmer wrote:
| Ah. I've not heard of Cypher before.
|
| I'd disagree and say that Kubernetes is much more relational
| than graph-based, and SQL is pretty good for querying graphs
| anyway, especially with some custom extensions.
|
| This does make more sense though.
| amanj41 wrote:
| Graph DBs are generalized relationship stores. SQL can work
| for querying graphs, but graph DB DSLs like Cypher become
| very powerful when you're trying to match across multiple
| relationship hops.
|
| For example, to find all friends of a friend or friends of a
| friend of a friend: `MATCH (user:User {username:
| "amanj41"})-[:KNOWS*2..3]->(foaf) WHERE
| NOT((user)-[:KNOWS]->(foaf)) RETURN user, foaf`
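|
| To make the multi-hop match concrete, here is a small sketch of
| the same idea over a plain adjacency dict. The function name and
| the sample data are hypothetical, and it simplifies Cypher's
| semantics: it returns a set of reachable people rather than one
| row per path, and it does not enforce Cypher's rule that a
| relationship appears only once per matched path.

```python
def friends_of_friends(knows, user, min_hops=2, max_hops=3):
    """People reachable from `user` in min_hops..max_hops KNOWS steps,
    excluding anyone `user` already knows directly.

    `knows` maps a username to the usernames they know (directed edges).
    """
    direct = set(knows.get(user, ()))
    frontier = {user}
    found = set()
    for hop in range(1, max_hops + 1):
        # Advance every node in the frontier by exactly one KNOWS edge.
        frontier = {nxt for person in frontier for nxt in knows.get(person, ())}
        if hop >= min_hops:
            found |= frontier
    return found - direct


graph = {
    "amanj41": ["bob", "zed"],
    "bob": ["carol", "zed"],
    "carol": ["dave"],
    "dave": ["erin"],
}
foaf = friends_of_friends(graph, "amanj41")
```

| In SQL the same question usually needs one self-join per hop or a
| recursive CTE, which is the verbosity the pattern syntax avoids.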
| captn3m0 wrote:
| I haven't tried it, but steampipe has a k8s plugin which lets
| you use PG/sqlite:
| https://hub.steampipe.io/plugins/turbot/kubernetes/tables
| jeremya wrote:
| This is fantastic. I've always enjoyed the cypher language that
| the neo4j team created for querying graph data. The connected k8s
| api objects seem like a great place to apply that lens.
| alpb wrote:
| During many years of operating production Kubernetes clusters
| with several thousand nodes, I've never seen any of these
| observability tools that query kube-apiserver work at that scale.
| Even popular tools like k9s make super expensive queries, such as
| listing all pods in the cluster, that can tip your Kubernetes
| apiserver over and cause an incident if you don't have enough
| load protections. If you're serious about these querying
| capabilities, I highly recommend building your own data sources
| (e.g. watch objects with a controller and dump the data in a sql
| db) and stop hitting apiserver for these things. You'll be better
| off in the long run.
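|
| A minimal sketch of the "watch and dump into a SQL db" idea,
| assuming events arrive as dicts shaped like Kubernetes watch
| events ({"type": ..., "object": ...}). The table layout and the
| function names are hypothetical, and the controller that would
| feed real events into apply_event is left out.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Create (or open) a local mirror table for cluster objects."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS objects (
               kind TEXT NOT NULL,
               namespace TEXT NOT NULL,
               name TEXT NOT NULL,
               resource_version TEXT,
               body TEXT NOT NULL,
               PRIMARY KEY (kind, namespace, name))"""
    )
    return db


def apply_event(db, event):
    """Apply one watch event to the mirror; unknown event types are ignored."""
    obj = event["object"]
    meta = obj["metadata"]
    key = (obj["kind"], meta.get("namespace", ""), meta["name"])
    if event["type"] == "DELETED":
        db.execute(
            "DELETE FROM objects WHERE kind = ? AND namespace = ? AND name = ?", key
        )
    elif event["type"] in ("ADDED", "MODIFIED"):
        # Upsert: the primary key makes a MODIFIED event overwrite the old row.
        db.execute(
            "INSERT OR REPLACE INTO objects VALUES (?, ?, ?, ?, ?)",
            key + (meta.get("resourceVersion"), json.dumps(obj)),
        )
    db.commit()


def count_kind(db, kind):
    """Answer a query from the local mirror instead of the apiserver."""
    return db.execute(
        "SELECT COUNT(*) FROM objects WHERE kind = ?", (kind,)
    ).fetchone()[0]
```

| Queries then hit the local table, so an expensive "list all pods"
| becomes a SELECT that never reaches kube-apiserver.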
| fatliverfreddy wrote:
| This is a very good point and is on the roadmap.
| ramoz wrote:
| How fun was kube-ops-view though
| rtpg wrote:
| how are Kubernetes apiservers suffering this much from this
| kind of query? Surely even in huge systems the amount of data
| that would need to be traversed is super small, right?
|
| Is this a question of Kubernetes just sticking everything into
| "standard" datastructures instead of using a database?
| mltsd wrote:
| Pretty sure the apiserver just queries the etcd database (and
| maybe caches some things, not sure) but I guess it could be
| the apiserver itself that can't handle the data :P
| tasuki wrote:
| I no longer know anything about Kubernetes, but share your
| surprise! From first principles it seems the metadata should
| be small.
| nvarsj wrote:
| My knowledge is out of date now, but the main issues IMO
| are/were:
|
| - No concept of apiserver rate limiting, by design. I see
| there is now an APF thingy, but still no basic API / edge
| rate limiting.
|
| - etcd has bad scalability. It's a very basic, highly
| consistent kv store that has tiny limits (8GB limit in latest
| docs, with a default of 2GB). It had large performance issues
| throughout its life when I was using k8s, I still don't know
| if it's much better.
| crabbone wrote:
| Long ago I wanted to re-implement at least part of kubectl in
| Python. After all, Kubernetes has a documented API... what I
| quickly discovered was that kubectl commands don't map to
| Kubernetes API. Almost at all. A lot of these commands will
| require multiple queries going back and forth to accomplish
| what the command does. I quickly abandoned the project... so,
| maybe I've overlooked something, but, again, my impression
| was that instead of having generic API with queries that can
| be executed server-side to retrieve necessary information,
| Kubernetes API server offers very specialized disjoint set of
| commands that can only retrieve one small piece of
| interesting info at a time.
|
| This, obviously, isn't a scalable approach, but there's no
| "wrapper" you could write in order to mitigate the problem.
| The API itself is the problem.
| remram wrote:
| In my experience, they don't, you can just run more of them
| and you can stick them behind a load-balancer (regular HTTP
| reverse proxy). You can scale both etcd and apiserver pretty
| easily. Of course you have less control in cloud
| environments, I have less experience with that.
| alpb wrote:
| Kubernetes only lets you query resources by object type and
| that's only a prefix range scan on etcd database. There are
| no indexes whatsoever in the exhaustive LIST queries, and
| kube-apiserver handles serialization of the objects back and
| forth between multiple wire types. Over the years there has
| been a lot of optimizations, but you don't wanna list all
| pods in a 5000 node high density cluster every time you spin
| up client-side tools like this.
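|
| A toy model of why an unindexed LIST is expensive, assuming keys
| laid out as /registry/<resource>/<namespace>/<name> and a plain
| dict standing in for etcd. Both functions are hypothetical; the
| point is only that a label filter still has to visit every object
| under the prefix.

```python
from bisect import bisect_left


def prefix_scan(sorted_keys, prefix):
    """Return every key that starts with `prefix` from a sorted key list."""
    start = bisect_left(sorted_keys, prefix)
    matched = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so the prefix range has ended
        matched.append(key)
    return matched


def list_with_label(store, prefix, label, value):
    """Filter objects under `prefix` by one label.

    Returns (matching keys, number of objects that had to be scanned).
    With no index on labels, the scan count is the whole prefix range.
    """
    keys = sorted(store)  # imitates the ordered keyspace of the real store
    scanned = prefix_scan(keys, prefix)
    hits = [k for k in scanned if store[k].get("labels", {}).get(label) == value]
    return hits, len(scanned)
```

| Narrowing the prefix to one namespace shrinks the scan, which is
| why namespaced requests are so much cheaper than cluster-wide
| ones.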
| atombender wrote:
| What's surprising to me is that there's no way to listen to
| _any_ object type. You have to know the "kind" beforehand,
| because the watch API requires it. To watch all objects in the
| system, you have to start a separate watch request for every
| type. This may in turn be expensive.
|
| If you have direct access to Etcd (which may not be possible in
| a managed cloud version of Kubernetes?), putting a watch on /
| might scale better.
|
| (As an aside, with the Go client API you have to jump through
| some hoops to even deserialize objects whose kinds' schemas are
| not already registered. You have to use the special
| "unstructured" deserializer. The Go SDK often has to deal with
| unknown types, e.g. for diffing, and all of the
| serializer/codec/conversion layers in the SDK seem incredibly
| overengineered for something that could have just assumed a
| simple nested map structure and then layered validation and
| parsing on top; the smell of Java programmers is pretty
| strong.)
| bluepizza wrote:
| The watch API has horrible user experience in all platforms.
| One must send a GET and keep the pipe open, waiting for a
| stream of responses. If the connection is lost, changes might
| be lost. If one misses a resource version change, then either
| the reconnection will fail, or a stale resource will be
| monitored.
|
| The Java client does this with blocking, resulting in a large
| number of threads.
|
| I truly like Kubernetes, and I think most detractors who complain
| about complexity simply don't want to learn it.
| But the K8s API, especially the Watch API, needs some
| rigorous standards.
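|
| The reconnect bookkeeping described above can be sketched as a
| loop that remembers the last resourceVersion it saw, resumes from
| it after a dropped connection, and falls back to a fresh list when
| the server says that version is too old (HTTP 410 Gone in the real
| API). Everything here is a stand-in: list_fn, watch_fn, handle and
| the Gone exception are hypothetical, and no real client library
| is used.

```python
class Gone(Exception):
    """Stands in for an HTTP 410 reply: the requested version expired."""


def follow(list_fn, watch_fn, handle, max_restarts=10):
    """Keep a view of a resource in sync across reconnects.

    list_fn()          -> (items, resource_version)
    watch_fn(version)  -> iterator of {"type": ..., "object": ...} events
    handle(kind, obj)  -> callback; kind is "SYNC" for listed objects
    Returns the last resourceVersion seen when a stream ends cleanly.
    """
    items, version = list_fn()
    for obj in items:
        handle("SYNC", obj)
    for _ in range(max_restarts):
        try:
            for event in watch_fn(version):
                # Remember progress so a reconnect resumes where we stopped.
                version = event["object"]["metadata"]["resourceVersion"]
                handle(event["type"], event["object"])
            return version
        except ConnectionError:
            continue  # resume the watch from the last version we saw
        except Gone:
            # Our version is too old to resume from: relist and start over.
            items, version = list_fn()
            for obj in items:
                handle("SYNC", obj)
    raise RuntimeError("gave up after too many restarts")
```

| A real watch also ends on server-side timeouts and would simply
| be reopened; returning on a clean end is a shortcut to keep the
| sketch finite.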
| f0e4c2f7 wrote:
| There is a funny parallel I see with Kubernetes that I also saw
| a lot with Linux in the early years. There are thousands of
| packages and tools you can install on Linux (think phpmyadmin
| for example) and new users sometimes go wild installing every
| single package they read about.
|
| After a while, the more mature Linux engineers start going the
| other way. Ripping out as much as possible. Stripping down to
| the leanest build they can, for performance but also to reduce
| attack surface and overall complexity.
|
| Very similar dynamic with k8s. Early days are often about
| scooping up every CNCF project like you're on a shopping spree.
| Eventually people get to running slim clusters and shipping
| 30 MB containers with Alpine or Nix. Using it essentially as
| open source clustering for Linux.
| moshloop wrote:
| This is the approach we took while building our Internal
| Developer Platform: watches (via client-go informers with
| client-side caching) to sync data into a Postgres database as
| JSONB. Changes are tracked using JSON patches and Kubernetes
| events. To avoid a watch on every resource kind, we handle this
| by performing incremental object fetches for the objects
| involved in watched events.
|
| Getting this to perform well required several optimizations at
| both the Go and Postgres levels. On the Go side, we use
| prioritized work queues, event de-duplication, and even
| switched to Rust for efficient JSON diffs. For Postgres, we
| leverage materialized views and trigger-based optimistic
| locking
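|
| One way to picture a prioritized work queue with event
| de-duplication is the sketch below. It is an illustration under
| assumptions, not the platform described above (which is built in
| Go): the class name and the string keys are hypothetical, and a
| lower number means higher priority.

```python
import heapq
import itertools


class DedupQueue:
    """Priority queue that holds at most one pending entry per object key."""

    def __init__(self):
        self._heap = []
        self._pending = {}  # key -> best priority currently queued
        self._order = itertools.count()  # tie-breaker keeps FIFO order

    def add(self, key, priority=10):
        """Queue `key`; return False if an equal or better entry is pending."""
        current = self._pending.get(key)
        if current is not None and current <= priority:
            return False  # duplicate event for an object already queued
        self._pending[key] = priority
        heapq.heappush(self._heap, (priority, next(self._order), key))
        return True

    def pop(self):
        """Return the most urgent key, or None when the queue is empty."""
        while self._heap:
            priority, _, key = heapq.heappop(self._heap)
            if self._pending.get(key) == priority:
                del self._pending[key]
                return key
            # Otherwise this heap entry is stale (already served or upgraded).
        return None
```

| A burst of events for the same object then costs one fetch
| instead of one per event.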
| devops99 wrote:
| The brew install cyphernetes
|
| at the top of the page is an immediate turn-off.
| riku_iki wrote:
| why?..
| TheDong wrote:
| Kubernetes only runs on linux, so it stands to reason that if
| you care about k8s you should care about linux. My experience is
| also that good experienced sysadmins often use linux for
| their own machines as well.
|
| Targeting a tool at macOS users, and omitting linux
| instructions, gives the impression that the tool isn't
| targeted at sysadmins or hackers (i.e. at us), but rather at
| beginners, frontend developers, etc.
| darkwater wrote:
| I'm a "sysadmin". I only run Linux on my workstation. I
| even run NixOS on a home server. I manage Kubernetes
| clusters. Yet, I use Homebrew on Linux.
| politelemon wrote:
| Most, however, do not, nor should they be expected to.
| Homebrew is not a safe or viable package manager,
| especially when better and safer package managers exist
| in the Linux ecosystem.
| falconertc wrote:
| Saying it's targeted at beginners because it supports MacOS
| shows a lot of disconnection with what many DevOps people
| use these days. The year of the linux desktop has yet to
| arrive, and Mac is king for people in IT (at least in the
| US)
| cess11 wrote:
| I have yet to meet a competent sysadmin that cares much
| about "desktop", and to the extent they do they mostly
| seem to invent their own graphical tools, with Tcl/Tk and
| so on.
|
| Are they common where you live?
| riku_iki wrote:
| Brew runs on linux too..
| rjh29 wrote:
| Agree but I'm not sure why. I'm not a mac user so the initial
| impression is like "this isn't for you, go away". At least add
| a linux command alongside it!
| koito17 wrote:
| Homebrew has a Linux variant, but I assume almost nobody uses
| it.
|
| Personally use a Mac with Nix, and so do many of my
| coworkers. Assuming Homebrew, even for a Mac user, leaves a
| bad impression on me.
| jm2dev wrote:
| I also prefer Mac with Nix over homebrew.
| fatliverfreddy wrote:
| Thanks for the feedback. Will add more commands there on
| rotation to show the different installation options.
| marxisttemp wrote:
| Even on macOS, brew is wildly inferior to MacPorts; to be
| fair, brew is "blessed" by Swift Package Manager whereas
| MacPorts is not, but this is ironic given the guy behind
| MacPorts both worked at Apple and designed the original
| FreeBSD ports system.
| falconertc wrote:
| What? I love seeing this. I want to see how to get it quickly
| via package manager.
| tasuki wrote:
| Not everyone uses the same package manager that you use.
| allyant wrote:
| It would be good to have some example commands that can be run
| right after installation, rather than having to figure out how
| to run the queries.
| moondev wrote:
| go run github.com/avitaltamir/cyphernetes/cmd/cyphernetes@v0.14.0 --help
| multani wrote:
| I really really like Steampipe to do this kind of query:
| https://steampipe.io, which is essentially PostgreSQL (literally)
| to query many different kinds of APIs, which means you have access
| to everything PostgreSQL's SQL dialect can offer to query data.
|
| They have a Kubernetes plugin at
| https://hub.steampipe.io/plugins/turbot/kubernetes and there are
| a couple of things I really like:
|
| * it's super easy to request multiple Kubernetes clusters
|   transparently: define one Steampipe "connection" for each of
|   your clusters + define an "aggregator" connection that
|   aggregates all of them, then query the "aggregator" connection.
|   You will get a "context" column that indicates which Kubernetes
|   cluster the row came from.
| * it's relatively fast in my experience, even for large result
|   sets. It's also possible to configure a caching mechanism inside
|   Steampipe to speed up your queries
| * it also understands custom resource definitions, although you
|   need to help Steampipe a bit (explained here:
|   https://hub.steampipe.io/plugins/turbot/kubernetes/tables/ku...)
|
| Last but not least: you can of course join multiple "plugins"
| together. I used it a couple of times to join content exposed
| only in GCP with content from Kubernetes, that was quite useful.
|
| The things I don't like so much but can be lived with:
|
| * Several columns are just exposed as plain JSON fields; you need
|   to get familiar with PostgreSQL JSON operators to get something
|   useful. There's a page in Steampipe's doc to explain how to use
|   them better.
| * Be familiar also with PostgreSQL's common table expressions:
|   they are not so difficult to use and make the SQL code much
|   easier to read
| * It's SQL, so you have to know which columns you want to pick
|   before selecting the table they come from; not ideal for
|   autocompletion
| * the Steampipe "psql" client is good, but sometimes a bit
|   counterintuitive; I don't have specific examples but I have the
|   feeling it behaves slightly differently than other CLI clients
|   I've used.
|
| All in all: I think Steampipe is a cool tool to know about, for
| Kubernetes but also other API systems.
| robertlagrant wrote:
| I really like Steampipe too. Writing the plugins is quite fun.
| nathanwallace wrote:
| Steampipe project lead here - thanks for the shout out &
| feedback multani!
|
| I agree with your comment about JSON columns being more
| difficult to work with at times. On balance, we've found that
| approach more robust than creating new columns (names and
| formats) that effectively become Steampipe specific.
|
| Our built-in SQL client is convenient, but it can definitely be
| better to run Steampipe in service mode and use any Postgres
| compatible SQL client you prefer [1].
|
| You might also enjoy our open source mods for compliance
| scanning [2] and visualizing clusters [3]. They are Powerpipe
| [4] dashboards as code written in HCL + SQL that query
| Steampipe.
|
| 1 - https://steampipe.io/docs/query/third-party
| 2 - https://hub.powerpipe.io/mods/turbot/kubernetes_compliance
| 3 - https://hub.powerpipe.io/mods/turbot/kubernetes_insights
| 4 - https://github.com/turbot/powerpipe
| UltraSane wrote:
| I am a big fan of Cypher and I love this. I really wish actual Cypher
| supported the dot notation for nested keys.
| omrispector wrote:
| This is way cool. The ability to visualize the k8s object model
| as a graph and query it as such makes so much sense! The hottest
| feature in my mind is applying this in an operator - maintaining
| state as defined by a simple graph query. It is much more
| readable, and does so with very little code. Well Done!
| matanavital wrote:
| The one thing I have been waiting for
| atombender wrote:
| This looks great for scripting. I will say that the query
| language looks a bit too verbose for daily use -- meaning when
| you're interacting with a cluster to diagnose a problem, follow a
| job, testing the rollout of something experimental, or similar.
|
| For example, I'd love to be able to just do this as the whole
| query:
|   metadata.name =~ "foo%"
|
| or maybe:
|   .. =~ "foo%"  // Any field matches
|
| or maybe:
|   $pod and metadata.name =~ "foo%"  // Shorthand to filter by type
|
| I think a query language for querying Kubernetes ought to start
| with predicate-based filtering as the foundation. Having graph
| operators seems like a nice addition, but maybe not the first
| thing people generally need?
|
| It's not quite clear who this tool is for, so maybe this is not
| the intended purpose?
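|
| For a sense of how little machinery the predicate-first idea
| needs, here is a sketch of the first form above (a dotted field
| path plus a pattern where % matches anything). The syntax belongs
| to the comment, not to Cyphernetes, and all three helper names
| are hypothetical.

```python
import re


def get_path(obj, dotted):
    """Follow a dotted path such as 'metadata.name'; None if any part is missing."""
    for part in dotted.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj


def like(value, pattern):
    """Match a string against a pattern where '%' stands for any run of characters."""
    if not isinstance(value, str):
        return False
    regex = ".*".join(re.escape(chunk) for chunk in pattern.split("%"))
    return re.fullmatch(regex, value) is not None


def select(objects, dotted, pattern):
    """Keep the objects whose field at `dotted` matches `pattern`."""
    return [o for o in objects if like(get_path(o, dotted), pattern)]
```

| Calling select(pods, "metadata.name", "foo%") would then stand in
| for the one-line query, with graph traversal layered on later.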
| gz5 wrote:
| Since it's Cypher-based (instead of SQL), is the key question whether
| my k8s data is more graph-like or relational?
|
| adjacent but lots of experts here - independent of Cyphernetes or
| specific tooling, what are you doing to secure k8s api / kubectl
| / k8s control plane?
| jeffreyaven wrote:
| our project https://github.com/stackql/stackql has a k8s provider
| which might be of interest here, we implement our own front end
| SQL parser and expose all control plane routes (and data plane
| routes in many cases) through overloaded SQL methods, this is not
| FDW based and does not require a server (postgres etc)
___________________________________________________________________
(page generated 2024-12-16 23:01 UTC)