[HN Gopher] Velox: Meta's Unified Execution Engine [pdf]
___________________________________________________________________
Velox: Meta's Unified Execution Engine [pdf]
Author : luu
Score : 59 points
Date : 2024-03-24 03:45 UTC (1 days ago)
(HTM) web link (www.eecs.umich.edu)
(TXT) w3m dump (www.eecs.umich.edu)
| pvg wrote:
| A thread from late 2022:
| https://news.ycombinator.com/item?id=32673873
| jauntywundrkind wrote:
| Substrait seems like the biggest/most-used competitor-ish project
| out there. I'd love a compare & contrast; my sense is that
| Substrait has a smaller ambition: it wants to be a language for
| describing execution rather than a full-on optimization/execution
| engine. https://github.com/substrait-io/substrait
|
| (Edit: ah, there's a recent talk discussing PyVelox trying to get
| Substrait integration.
| https://www.youtube.com/watch?v=l_kHxkGkNRg#t=18m22s . However
| there's also discussion about the un-maintainedness of some of
| the current Substrait work here; unclear status.
| https://github.com/facebookincubator/velox/issues/8895)
|
| We can also see from the Apache Arrow DataFusion discussion that
| they too see themselves as a bit of a Velox competitor.
| https://github.com/apache/arrow-datafusion/discussions/6441
|
| It's cool to see this space mature. I like that even Velox sees
| that Apache Arrow (which also underlies Apache Arrow DataFusion)
| is industry-standard tech that they ought to work with.
| https://engineering.fb.com/2024/02/20/developer-tools/velox-...
|
| There's a solid Influx post that covers how they are composing
| the assorted technologies to build their next-gen 3.0, which I
| find helpful for getting a sense of how all the pieces of a
| modern high-performance data engine slot together.
| https://www.influxdata.com/blog/flight-datafusion-arrow-parq...
| kristjansson wrote:
| I think you're right - Substrait wants to sit above something
| like Velox. The closest comparison is probably Databricks
| Photon[1], but that's proprietary.
|
| [1]: https://www.databricks.com/product/photon
| zX41ZdbW wrote:
| Many ideas look like they were influenced by ClickHouse, and some
| are direct copies. I'm surprised they didn't provide references
| to ClickHouse, where the implementations are proven in production
| in the first place.
| gaogao wrote:
| Could you be specific about which ideas you think were
| influenced by ClickHouse specifically and not Presto or DuckDB
| or Spark?
| redskyluan wrote:
| Velox could be a competitor to DataFusion. It is more focused on
| being an execution engine and could be great to integrate into
| other high-performance databases.
|
| Databases will be split into pieces and rebuilt!
| sakras wrote:
| Yes this has been an up-and-coming theme in the data science
| world. Arrow for the data format, Ibis for the API,
| Acero/Velox/DataFusion/DuckDB/Polars for execution, Substrait
| for the query plan representation, etc.
| sgt101 wrote:
| I wonder how many FAANG projects of this sort really get used
| where they are built. I went for an interview at a FAANG years
| ago to work on a very big consumer product (when it was in
| relative infancy) and expected to find a hyper-tech data backend
| to use... they told me that they were using MySQL.
|
| I didn't get the job so maybe they were just joking around with
| me - but the general despair that they evinced about their data
| situation makes me wonder!
| bezosdontpipme wrote:
| I can neither confirm nor deny that S3's global bucket database
| is actually just MySQL (with a lil bit of special sauce).
| sgt101 wrote:
| tbh my general response to all data questions is "use
| postgres". It does happen that someone comes back with a good
| reason why that would be a bad idea, but it's not frequent!
|
| mySQL == Oracle now... so bad on theological grounds.
| astrange wrote:
| You can use MariaDB.
| nonrandomstring wrote:
| And why ever not? It's a perfectly good solution, no?
|
| What the GP alludes to is interesting though - mythologising
| of organisations, brands and names.
|
| Spend enough time with "famous" people, "big names", centres
| of power and prominence and you quickly see everyone is just
| ordinary dudes doing ordinary things with ordinary gear. But
| for some reason there's fuck loads of money and attention,
| and sometimes cloying paranoia and adulation floating around.
|
| Sure, right out on the periphery are a noble few who play
| with particle accelerators, spaceships and bunker
| supercomputers. But then, that's just a day job too.
|
| True genius/exceptionalism is rare and found in unexpected
| places. The rest is conjured out of thin air by
| marketing and PR people, the press, and commentators. They
| are the ones who need the big legend.
| influx wrote:
| Yeah, but I bet you the S3 Keymap isn't MySQL....
| ipsum2 wrote:
| Facebook/Meta uses MySQL, but with a completely different
| storage engine (MyRocks) and its own sharding techniques.
|
| YouTube uses MySQL, but they've also rewritten major portions
| for scalability (Vitess).
|
| Just because a company is using a technology you've heard of
| doesn't mean it's what you expect.
| riku_iki wrote:
| > YouTube uses mySQL but they've also rewritten major
| portions for scalability. (Vitess)
|
| I imagine this is very old info (like 10 years old) and could
| have changed since then?
| philjohn wrote:
| At Meta they probably don't get built unless they're impactful,
| and they're not impactful if they're not used in production to
| solve a real pain point.
| kgp7 wrote:
| This is being actively used in production at Meta across
| several engines; the paper makes explicit references to this.
| sakras wrote:
| My general take is that while the idea of composability is good,
| the implementations of these things are just frankly not of high
| quality. Velox/Acero in particular are all plagued by what I've
| come to call "Java syndrome", where everything is written as
| idiomatic Java but with C++ syntax. Virtual methods,
| std::shared_ptr galore (in lieu of garbage collection), random
| heap allocations, etc. As a result these systems tend to be
| bloated and significantly slower than they need to be.
|
| DuckDB is good though, and I predict its quality of
| implementation will keep "monolithic databases" relevant for a
| while longer.
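
To make the "Java syndrome" complaint above concrete, here is a minimal sketch of the general pattern it describes. It is not code from Velox or Acero, and it does not claim either engine stores values this way; the names `Value`, `IntValue`, `sum_boxed`, `sum_flat`, `box`, and `check_equivalence` are all hypothetical. It contrasts summing a column of individually heap-allocated, `std::shared_ptr`-owned objects behind a virtual interface with summing the same values from one contiguous buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Style A ("Java in C++ syntax"): every value is its own heap object,
// owned by a shared_ptr and read through a virtual method.
struct Value {
    virtual ~Value() = default;
    virtual std::int64_t get() const = 0;
};

struct IntValue final : Value {
    explicit IntValue(std::int64_t v) : v_(v) {}
    std::int64_t get() const override { return v_; }
private:
    std::int64_t v_;
};

// Each iteration follows a pointer and makes a virtual call.
std::int64_t sum_boxed(const std::vector<std::shared_ptr<Value>>& column) {
    std::int64_t total = 0;
    for (const auto& cell : column) {
        total += cell->get();
    }
    return total;
}

// Style B: the column is one contiguous buffer of plain integers,
// so the loop is a simple linear scan with no indirection.
std::int64_t sum_flat(const std::vector<std::int64_t>& column) {
    std::int64_t total = 0;
    for (std::int64_t v : column) {
        total += v;
    }
    return total;
}

// Helper that builds the Style A representation from the Style B one:
// one make_shared heap allocation per element.
std::vector<std::shared_ptr<Value>> box(const std::vector<std::int64_t>& flat) {
    std::vector<std::shared_ptr<Value>> out;
    out.reserve(flat.size());
    for (std::int64_t v : flat) {
        out.push_back(std::make_shared<IntValue>(v));
    }
    return out;
}

// Both styles compute the same result; only the memory layout and
// dispatch cost differ.
inline void check_equivalence(const std::vector<std::int64_t>& flat) {
    assert(sum_boxed(box(flat)) == sum_flat(flat));
}
```

Both functions return the same sum. The difference is that Style A pays for an allocation, a reference count, and an indirect call per element, which is the kind of overhead the comment attributes to this coding style. How much that costs in a real engine depends on where such patterns sit relative to the hot loops.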
___________________________________________________________________
(page generated 2024-03-25 23:00 UTC)