[HN Gopher] Drasi: Microsoft's open source data processing platf...
___________________________________________________________________
Drasi: Microsoft's open source data processing platform for event-
driven systems
Author : benocodes
Score : 317 points
Date : 2024-10-20 16:07 UTC (1 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| gigatexal wrote:
| Oh this very much reminds me of [feldera](https://feldera.com) --
| they do incremental loads and computations using some novel
| approaches (most of which i am too dumb to follow). Really nice
| folks too.
| woozyolliew wrote:
| Or the related Materialize stuff https://materialize.com/
| hobofan wrote:
| I took a brief look into Drasi and it looks like it doesn't
| do any of the differential/timely dataflow stuff (like
| Materialize does), or any other sophisticated incremental
| view maintenance methods that are rooted in Microsoft
| Research.
| smarx007 wrote:
| https://azure.microsoft.com/en-us/blog/drasi-microsofts-newe...
| CharlieDigital wrote:
| Very interesting choice of using Cypher[0]
|
| In 2014, we built a similar type of event-driven system (but
| specifically for document distribution (a document can be
| distributed to a target set of entities; if a new entity is
| added, we need to resolve which distributions match)) and also
| ended up using Cypher via Neo4j (because of the complex
| taxonomical structure of how we mapped entities).
|
| It is a super underrated query language and while most of the
| queries could also be translated to relational SQL, Cypher's
| linear construction using WITH clauses is far, far easier to
| reason about, IMO.
|
| EDIT: feel like the devs went overboard with the mix of
| languages. Shoehorned in C# Blazor? Using JS and Jest for e2e
| testing?
|
| [0] https://drasi.io/reference/query-language/
| JanSt wrote:
| I too have great memories of cypher. Such an elegant way to
| write queries.
| CharlieDigital wrote:
| If you haven't been following it, I recently found out that
| it is now supported in a limited capacity by Google
| Spanner[0]. The openCypher initiative started a few years
| back and it looks like it's evolved into the (unfortunate
| moniker) GQL[1].
|
| So it may be the case that we'll see more Cypher out in the
| wild.
|
| [0] https://cloud.google.com/spanner/docs/graph/opencypher-
| refer...
|
| [1] https://neo4j.com/blog/cypher-gql-world/
| leeoniya wrote:
| > while most of the queries could also be translated to
| relational SQL, Cypher's linear construction using WITH clauses
| is far, far easier to reason about, IMO.
|
| https://prql-lang.org/
| CharlieDigital wrote:
| Didn't look too deeply, but one of the keys with Cypher (at
| least in the context of graph databases) is that it has a
| nice way of representing `JOIN` operations as graph
| traversals.
|
|     MATCH (p:Person)-[r]-(c:Company) RETURN p.Name, c.Name
|
| Where `r` can represent any relationship (AKA `JOIN`) between
| the two collections `Person` and `Company` such as
| `WORKS_AT`, `EMPLOYED_BY`, `CONTRACTOR_FOR`, etc.
|
| So I'd say that linear queries are _one_ of the things I like
| about Cypher, but the clean abstraction of complex `JOIN`
| operations is another huge one.
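
To make the point about `-[r]-` standing in for a `JOIN` concrete, here is a minimal Python sketch that treats relationships as plain (source, type, target) tuples and matches any relationship between a Person and a Company, in either direction. The sample data, the `HIRED` relationship type, and the `match_person_company` helper are all made up for illustration; this is not how Neo4j or Drasi actually evaluate Cypher.

```python
# Hypothetical in-memory graph: a label and a display name per node.
labels = {
    "alice": "Person",
    "bob": "Person",
    "acme": "Company",
    "globex": "Company",
}
names = {"alice": "Alice", "bob": "Bob", "acme": "Acme", "globex": "Globex"}

# Relationships as (source, relationship type, target) tuples.
relationships = [
    ("alice", "WORKS_AT", "acme"),
    ("bob", "CONTRACTOR_FOR", "globex"),
    ("alice", "KNOWS", "bob"),        # Person-Person: should not match
    ("globex", "HIRED", "alice"),     # Company-to-Person direction
]


def match_person_company(rels, labels, names):
    """Rough analogue of MATCH (p:Person)-[r]-(c:Company) RETURN p.Name, c.Name."""
    rows = []
    for src, _rel_type, dst in rels:
        # The pattern is undirected, so try both orientations of each edge.
        for p, c in ((src, dst), (dst, src)):
            if labels.get(p) == "Person" and labels.get(c) == "Company":
                rows.append((names[p], names[c]))
    return rows


rows = match_person_company(relationships, labels, names)
for person, company in rows:
    print(person, company)
```

Note that the relationship type is never inspected, which is the sense in which `r` covers `WORKS_AT`, `CONTRACTOR_FOR`, and anything else without a separate join per relationship.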
| UltraSane wrote:
| The neat thing about Neo4j is that the [r] isn't a join, it
| is an actual relationship stored on disk.
| refset wrote:
| Like a many-to-many join table?
| inkyoto wrote:
| > [...] Where `r` can represent any relationship [...]
|
| ... and <<-[r]->> can represent any relationship direction,
| which obviates the need for constructing separate queries
| for inverse traversing relationships. Kinda like running a
| compiler forward and backward.
| robertlagrant wrote:
| We made a health backend partly using Cypher and the only thing
| I found was the simple queries looked amazing, but as soon as
| you need to join non-linearly it started looking a lot like SQL
| again. And when you're using an ORM it stops mattering. And
| when you need migrations it gets painful!
| CharlieDigital wrote:
| > but as soon as you need to join non-linearly
|
| At least in our use case, even with some very gnarly 20+ line
| Cypher queries, it never got to the point where it felt like
| SQL and certainly, those same queries would be even gnarlier
| as nested sub-selects, CTEs, or recursive selects, IMO.
|
| Perhaps a characteristic of our model (a taxonomy of Region,
| Country, Sponsor, Program, Trial, Site, Staff for global
| clinical trials and documents required by
| Region/Country/Program/Trial).
| UltraSane wrote:
| Cypher works really well with a well defined taxonomy.
| UltraSane wrote:
| "you need to join non-linearly "
|
| What does this mean?
| FromOmelas wrote:
| presumably it has a semantic model of sorts, defining
| intrinsic relationships between entities (parent-child,
| composed-of, sibling-of, and so on)
|
| A bit similar to how certain joins in SQL can be very
| straightforward with the "USING" clause, or when it can
| rely on extra information such as analytic views to derive
| materialized views (vendor specific)
| fatliverfreddy wrote:
| I wish I could use Cypher for everything
| f4c39012 wrote:
| Purple!
| computronus wrote:
| Green!
| JaimeThompson wrote:
| Just my luck. I get stuck with a race that speaks only in
| macros.
| imvetri wrote:
| What does it process from, and what does it process to?
|
| Is it programmable, or do you have a concrete concept theorised?
|
| What is it useful for? How does it help a business save costs or
| increase profit? Is it a hobby project?
| otterley wrote:
| Looks very Azure-centric. Both installation guides
| (https://drasi.io/how-to-guides/install-sample-applications/b...
| and https://drasi.io/how-to-guides/install-sample-
| applications/c...) require Azure to work.
|
| And then there's this:
|
| > Installing Drasi in an EKS cluster can be significantly more
| complex than a standard installation on other platforms. Instead
| of downloading a CLI binary using the provided installation
| scripts, this approach requires modifying the source code of the
| Drasi CLI and building a local version of the CLI.
|
| Is this an actual requirement or just the current easy path?
| dtquad wrote:
| That is usual for new Microsoft open source projects. It takes
| 1-2 months for the Azure dependencies to go away.
| 3abiton wrote:
| I'm curious about the other examples? I get it though, as
| many of these projects are built fulfilling a specific need
| within MS infrastructure.
| pjmlp wrote:
| Azure is the new Windows, as timesharing OS, thus yeah that is
| to be expected.
| jameslevy wrote:
| Does it require Azure to work? Or could the Azure steps be
| relatively easily swapped out for AWS/GCP/etc?
| stackskipton wrote:
| Azure SRE here, it doesn't appear to have any Azure
| dependencies. CLI rebuild seems to be that "drasi init" assumes
| Azure Kubernetes Service built in StorageClasses for Kubernetes
| PVC for Redis and Mongo and thus fails when running against
| EKS. I assume same thing would be required on GKE. Yes, it
| should be more modular but MVP.
|
| As for other stuff, it's using Gremlin Query Language or
| Postgres which are both open. In fact, it's going out of its way
| not to use Azure authentication, as loading a connection
| string as a Kubernetes secret is 100% AGAINST Azure Kubernetes
| Best Practice. Best Practice would be Workload Identity.
| bob1029 wrote:
| > CLI rebuild seems to be that "drasi init" assumes Azure
| Kubernetes Service built in StorageClasses for Kubernetes PVC
| for Redis and Mongo and thus fails when running against EKS.
| I assume same thing would be required on GKE. Yes, it should
| be more modular but MVP.
|
| None of these words are in the Bible.
| devjab wrote:
| Every bit of Microsoft open source is created at least partly
| as a sales strategy for Azure. They usually start within the
| Azure infrastructure because, well, why wouldn't they? Then
| eventually they tend to make it to where you can use them outside
| of Azure but they never quite leave the part where they are
| "better" if you're an Azure customer.
|
| Time will tell if Drasi is going to go the path where it
| becomes more easily useable outside of Azure (and in this case
| AWS) or it'll go more of a Bicep route.
| sitkack wrote:
| But at what cost?
| stefanos82 wrote:
| Drasi...React...well played Microsoft, well played :D
|
| Assuming they chose this name from the Greek drase which means
| action, React of course is the exact opposite to action, thus the
| React-ion; an action expects a reaction, somewhere somehow!
| benbristow wrote:
| Not like Microsoft to name things well...
| j-a-a-p wrote:
| VMS++ = Windows NT?
| TeMPOraL wrote:
| Here I thought they were accidentally or intentionally
| referring to:
|
| https://babylon5.fandom.com/wiki/Drazi
|
| But now I noticed the spelling difference :/.
| resters wrote:
| This is a very solid pattern. Many systems that are built using
| traditional relational database systems would lend themselves to
| far simpler designs using this paradigm. It is not necessarily
| immediately obvious but nonetheless quite true.
| unit149 wrote:
| Beginning with Boolean operators: and / or - this relational
| service model can distribute queries. Curious why Cypher [0]
| abandons this syntax.
| SiddanthEmani wrote:
| Cypher is so cool. I included a graph database in my RAG patient
| chatbot
|
| https://github.com/SiddanthEmani/patient_chatbot
| akmittal wrote:
| Go seems to be a good choice for data processing systems.
| dxxvi wrote:
| Is this what can be done with Apache Kafka Connect (to get data
| from another source to a Kafka cluster), Kafka (including Kafka
| Streams)? This image (https://github.com/drasi-
| project/community/raw/main/images/d...) is like Kafka Streams
| with a single topic. This image (https://github.com/drasi-
| project/community/raw/main/images/c...) is like joining 2 streams
| in Kafka Streams.
| ultrafez wrote:
| It also seems reminiscent of KSQL - consuming multiple input
| topics, and producing output to a topic defined using a query
| written in a SQL-like language that defines how the inputs are
| combined and filtered.
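
The KSQL comparison can be sketched as a toy stream-table join in Python: events from two input "topics" are consumed in order, joined on a key, filtered, and appended to an output list that stands in for the output topic. The topic names, event shape, threshold, and `join_and_filter` helper are assumptions for illustration only; real KSQL or Kafka Streams joins also handle windows, partitions, and out-of-order data, all of which are ignored here.

```python
def join_and_filter(events, min_amount=100):
    """Join 'orders' events against the latest 'customers' record per key.

    Each event is a (topic, key, value) tuple. Orders with no known
    customer, or below min_amount, are dropped in this sketch.
    """
    customers = {}  # latest customer name per customer id
    output = []     # stands in for the output topic
    for topic, key, value in events:
        if topic == "customers":
            customers[key] = value
        elif topic == "orders":
            name = customers.get(key)
            if name is not None and value >= min_amount:
                output.append({"customer": name, "amount": value})
    return output


events = [
    ("customers", "c1", "Ada"),
    ("orders", "c1", 250),
    ("orders", "c2", 500),        # no customer record yet: dropped here
    ("customers", "c2", "Grace"),
    ("orders", "c2", 40),         # below the filter threshold
    ("orders", "c2", 120),
]

joined = join_and_filter(events)
for row in joined:
    print(row)
```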
| fatliverfreddy wrote:
| I see more Cypher fans out here - check out https://cyphernet.es
| if you work with Kubernetes!
| jeremycarter wrote:
| Brilliant
| mnsc wrote:
| I finished reading Kleppmann's Designing Data-Intensive
| Applications last night and this looks like it's straight out of
| the last chapter, which talks about the future. They don't use the
| term "dataflow" though.
|
| https://www.oreilly.com/library/view/designing-data-intensiv...
| 9dev wrote:
| That one's also on my reading list. Was it worth the read?
| yas_hmaheshwari wrote:
| This book is definitely worth the read. Or maybe worth 10
| reads. It's really that awesome!
| xnorswap wrote:
| I read it over the summer and I'd say it's essential reading
| for any developer who deals with data.
|
| Perhaps most importantly, the book empowered me to talk
| confidently about the trade-offs involved with different
| choices of handling data, and gave me a language framework to
| talk accurately about those choices.
|
| Previously even the parts I did understand were from
| experience, and not an academic background, so my
| explanations were hand-wavy or sloppy, but now I can state my
| case for different solutions much more clearly.
| iamstan23 wrote:
| Weird thing about this project is that neither the website
| (https://drasi.io) nor the repo
| (https://github.com/drasi-project/drasi-platform) mentions that
| it's a Microsoft project.
|
| Also the only cloud provider it has installation instructions for
| is AWS's EKS platform. Yet it has integration instructions for
| Azure CosmosDb Gremlin API.
|
| That one customer out there using EKS and Gremlin on CosmosDb is
| probably over the moon right now.
| vladsanchez wrote:
| https://azure.microsoft.com/en-us/blog/drasi-microsofts-newe...
|
| > "The Microsoft Azure Incubations team is excited to announce
| that Drasi is now available as an open-source project."
| unixhero wrote:
| I would really enjoy using it. But as a novice data intensive
| application developer, why would I not query the table every 30
| seconds and look for changes with a Python program (or another
| regular programming language)?
| bobnamob wrote:
| One of the best resources to understand "why would I not ... ?"
| in a data intensive context is Kleppmann's Designing Data-
| Intensive Applications[1] (mentioned elsewhere in the
| comments). There's a lot of nuance to why event streaming wins
| out over periodically "polling" a database, mostly about
| maintaining correctness while being able to scale horizontally.
|
| Taking a look at the Kafka docs [2] is also enlightening.
|
| [1] https://www.amazon.com/Designing-Data-Intensive-
| Applications...
|
| [2] https://kafka.apache.org/documentation/#gettingStarted
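
One of the correctness issues with polling can be shown in a few lines of Python: if several changes land between two polls, a snapshot comparison only sees the net effect, while a change feed sees every event. This is a simplified sketch under assumed names (`apply_change`, `snapshot_diff`, and the sample change log are all hypothetical) and it covers only the "missed intermediate states" problem, not scaling or ordering.

```python
def apply_change(table, change):
    """Apply one (op, key, value) change to a dict standing in for a table."""
    op, key, value = change
    if op == "delete":
        table.pop(key, None)
    else:  # "insert" or "update"
        table[key] = value


def snapshot_diff(before, after):
    """What a poller can infer by comparing two snapshots of the table."""
    diff = []
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            diff.append(("insert", key, after[key]))
        elif key not in after:
            diff.append(("delete", key, None))
        elif before[key] != after[key]:
            diff.append(("update", key, after[key]))
    return diff


# Four changes that all happen between two polls.
change_log = [
    ("insert", 1, "pending"),
    ("update", 1, "shipped"),
    ("insert", 2, "pending"),
    ("delete", 2, None),
]

table = {}
before = dict(table)            # snapshot taken at the first poll
for change in change_log:
    apply_change(table, change)
after = dict(table)             # snapshot taken at the next poll

seen_by_poller = snapshot_diff(before, after)
seen_by_stream = list(change_log)

print("poller:", seen_by_poller)
print("stream:", seen_by_stream)
```

In this run the poller never learns that row 1 was ever "pending" or that row 2 existed at all, which matters if downstream logic needs to react to those states.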
| lasermike026 wrote:
| I suppose students need to prepare to defend what they are
| writing. Also, teachers may need a bit of a demotion when making
| accusations of plagiarism or generated papers. Teachers at the
| very least should be able to reasonably prove their accusations.
| There is a greater problem with tutors writing students' papers.
| If teachers and students worked more closely this wouldn't be an
| issue.
| dijksterhuis wrote:
| i feel like this comment was maybe supposed to be posted under
| this article?
|
| https://news.ycombinator.com/item?id=41896973
| emmanueloga_ wrote:
| They don't mention "CDC" (Change Data Capture) directly anywhere,
| but I think that's what Drasi is? (they call it "Data Change
| Processing platform").
|
| "Debezium", an alternative CDC system, is mentioned in the
| documentation and sources [1]. I'm not sure if Drasi uses
| Debezium, or aims to be compatible with it. Maybe someone here
| can shine more light on the relationship between these two?
|
| --
|
| 1: https://github.com/drasi-project/drasi-
| platform/tree/main/re...
| purpleidea wrote:
| This feels like a specialized version of
| https://github.com/purpleidea/mgmt/ but Microsoft only.
___________________________________________________________________
(page generated 2024-10-21 23:02 UTC)