[HN Gopher] CERN swaps out databases to feed its petabyte-a-day ...
___________________________________________________________________
CERN swaps out databases to feed its petabyte-a-day habit
Author : valyala
Score : 75 points
Date : 2023-09-20 06:46 UTC (1 day ago)
(HTM) web link (www.theregister.com)
(TXT) w3m dump (www.theregister.com)
| bouvin wrote:
| One of my fondest memories as a summer student at CERN in 1993
| (in the Electronics and Computing for Physics department) was the
| visit to the basement beneath the main computing facility, where
| a colossal tape robot was in operation. Even at that time, CERN
| was grappling with exceedingly vast amounts of data.
| foota wrote:
| Weird that the title talks about the petabyte a day, while the
| article is actually about their monitoring tooling, not the thing
| ingesting the data from experiments, iiuc.
| m3kw9 wrote:
| Over a 24-hour period that's more than 11 gigabytes/second, or
| roughly 100 Gbps. Those shards must be pretty crazy
| formerly_proven wrote:
| The headline is about the data processed on their compute, the
| amount of data in the monitoring system is considerably smaller
| (but still not small data):
|
| > But Brij Kishor Jashal, a scientist in the CMS collaboration,
| told The Register that his team were currently aggregating 30
| terabytes over a 30-day period to monitor their computing
| infrastructure performance.
|
| So 1 TB / day, that's about 10 MB/s.
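Both throughput figures in this sub-thread can be sanity-checked with a quick back-of-the-envelope calculation (assuming decimal/SI units, i.e. 1 PB = 10^15 bytes):

```python
# Back-of-the-envelope check of the throughput figures quoted above.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Headline figure: 1 petabyte ingested per day.
pb_per_day = 1e15  # bytes
gb_per_s = pb_per_day / SECONDS_PER_DAY / 1e9
gbit_per_s = gb_per_s * 8
print(f"{gb_per_s:.1f} GB/s ~ {gbit_per_s:.0f} Gbps")  # -> 11.6 GB/s ~ 93 Gbps

# Monitoring figure: 30 TB over 30 days = 1 TB/day.
tb_per_day = 1e12  # bytes
mb_per_s = tb_per_day / SECONDS_PER_DAY / 1e6
print(f"{mb_per_s:.1f} MB/s")  # -> 11.6 MB/s
```

So "more than 11 GB/s, roughly 100 Gbps" for the headline number and "about 10 MB/s" for the monitoring data are both in the right ballpark.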
| sgt101 wrote:
| I can do this on my laptop
|
| /tumbleweed...
| qwertox wrote:
| At the end of the article it says
|
| " _InfluxDB said in March this year it had solved the cardinality
| issue with a new IOx storage engine._ "
|
| Does this mean that in the end it wasn't really necessary to
| switch to VictoriaMetrics' offering?
| esafak wrote:
| tl;dr:
|
| Speaking to The Register, Roman Khavronenko, co-founder of
| VictoriaMetrics, said the previous system had experienced
| problems with high cardinality, which refers to the level of
| repeated values - and high churn data - where applications can be
| redeployed multiple times over new instances.
|
| Implementing VictoriaMetrics as backend storage for Prometheus,
| the CMS monitoring team progressed to using the solution as
| front-end storage to replace InfluxDB and Prometheus, helping
| remove cardinality issues, the company said in a statement.
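For context, using VictoriaMetrics as remote storage for Prometheus (the first step described in the quote) is a one-stanza change in `prometheus.yml`. A minimal sketch, assuming a single-node VictoriaMetrics instance on its default port 8428 and a hypothetical host name:

```yaml
# prometheus.yml (fragment) - "victoriametrics" host name is hypothetical
remote_write:
  - url: "http://victoriametrics:8428/api/v1/write"
```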
| amelius wrote:
| This is nothing compared to what dragnet surveillance has to deal
| with.
| local_crmdgeon wrote:
| And that's all on MSSQL or RDS, right?
| ilyt wrote:
| I really like VictoriaMetrics's architecture
|
| vmagent takes care of all the pesky edge things like emulating
| Prometheus config parsing and various scraping bits. It also does
| buffering in case you lose the network connection for a while, and
| accepts a vast spread of different protocols.
|
| vminsert/vmselect scale separately from each other, and your
| queries don't bother your ingest all that much.
|
| vmstorage does just that: storage. The only thing that bothers me
| (compared to, say, Elasticsearch) is that data can't migrate
| between nodes, so you can't "just" start a new one and drain an
| old one, but a tiny bit of ops work in rare cases is IMO a price
| worth paying for the straightforwardness of the stack.
|
| PromQL compatibility is also great, tools like Grafana "just
| work" without anyone having to write support for it.
|
| We started migrating from InfluxDB at work, and on my private
| stuff I already did. So much less memory usage too.
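A minimal sketch of how the four components described above are wired together, based on the flags documented for the VictoriaMetrics cluster version (host names and paths here are hypothetical):

```
# vmstorage persists the data, and only that
./vmstorage -storageDataPath=/var/lib/vmstorage

# vminsert and vmselect scale independently of each other;
# both are pointed at the storage node(s)
./vminsert -storageNode=storage1:8400
./vmselect -storageNode=storage1:8401

# vmagent scrapes targets from a Prometheus-style config and
# buffers locally if the remote end is unreachable
./vmagent -promscrape.config=prometheus.yml \
  -remoteWrite.url=http://vminsert:8480/insert/0/prometheus/api/v1/write
```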
| theossuary wrote:
| What version of Influx were you running? I'm interested if v3
| will be more competitive than v2.
| ilyt wrote:
| 1.8; the migration path to 2.0 was a no-no. I don't remember the
| exact reasons back then, but we decided to take a wait-and-see
| approach and watch how the alternatives grow up, as our data
| generally grows at a predictable rate.
|
| Also, frankly, Prometheus support is a massive positive. For
| better or worse the industry standardized on apps using Prometheus
| as the ingest for metrics, and most of the related materials will
| of course give examples in PromQL.
|
| Flux is frankly hieroglyphs to people who use it 20 minutes a
| month, like our developers.
|
| This is the given example of how to raise a value to the power
| of two in Flux:
|
|     |> map(fn: (r) => ({ r with _value: r._value * r._value }))
|
| This is the equivalent in PromQL:
|
|     value ^ 2
|
| This is the example of calculating a percentage in Flux (from
| their webpage):
|
|     data
|         |> pivot(
|             rowKey: ["_time"],
|             columnKey: ["_field"],
|             valueColumn: "_value",
|         )
|         |> map(
|             fn: (r) => ({
|                 _time: r._time,
|                 _field: "used_percent",
|                 _value: float(v: r.used) / float(v: r.total) * 100.0,
|             }),
|         )
|
| This is how you do it in PromQL:
|
|     space_used / space_total * 100
|
| Flux is atrocious for "normal users".
| iFire wrote:
| OPEN SOURCE, APACHE 2.0 LICENSE
|
| https://github.com/VictoriaMetrics/VictoriaMetrics/blob/mast...
| [deleted]
| [deleted]
| Havoc wrote:
| That's one hell of an endorsement. Marketing team won the
| jackpot.
| keep_reading wrote:
| I also dropped InfluxDB at work due to its terrible performance.
| VictoriaMetrics is great
|
| I was using Promscale (TimescaleDB) but they EOL'd Promscale
| which forced us to Victoria. But either way both of these are
| much faster than Influx
|
| Don't get fooled into the latest InfluxDB rewrite. I think the
| latest is cloud-hosted only, too? So stupid
| contravariant wrote:
| Honestly the database isn't half as useful as the tool they
| wrote to grab the metrics. At least I think telegraf was
| written by the same people? It seems to have the exact opposite
| design philosophy.
| pphysch wrote:
| I saw the writing on the wall with InfluxDB v2 (doubling down
| on closed platform / SaaS) and advocated exploring
| VictoriaMetrics, even though we had some Influx v1 running. No
| regrets.
|
| I also prefer the golang-esque simplicity of the Prometheus
| ecosystem. Monitoring is the last place I want unnecessary
| abstraction layers and complicated configuration files.
| ComputerGuru wrote:
| Missing from the title: leaving InfluxDB and Prometheus for
| VictoriaMetrics.
| hintymad wrote:
| This is puzzling. I'm not sure how VictoriaMetrics solved the
| cardinality problem. When running an aggregate query that sums
| up some counters for a single metric over the dimension of
| instances in a time window larger than a few hours,
| VictoriaMetrics would barf with an error about the query having
| too many time series (or data points? I forget the exact
| wording). This clearly shows that 1/ VictoriaMetrics does not
| treat a time series with multiple dimensions as a single time
| series; 2/ VictoriaMetrics does not perform hierarchical
| aggregation.
|
| That is, VictoriaMetrics has not really built a true time
| series DB that handles reasonable cardinalities.
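For illustration, the kind of query described above would look something like this in PromQL (the metric name, label, and window here are hypothetical, not from the article):

```
# Sum the per-instance rate of a counter across all instances,
# over a multi-hour window
sum by (job) (rate(requests_total{job="cms"}[6h]))
```

High cardinality bites when `instance` (or a similarly churny label) takes many distinct values, since each combination is a separate stored series.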
| [deleted]
___________________________________________________________________
(page generated 2023-09-21 23:01 UTC)