hngopher.com

       [HN Gopher] Drasi: Microsoft's open source data processing platf...
       ___________________________________________________________________
        
       Drasi: Microsoft's open source data processing platform for event-
       driven systems
        
       Author : benocodes
       Score  : 317 points
       Date   : 2024-10-20 16:07 UTC (1 days ago)
        
 (HTM) web link (github.com)
        
       | gigatexal wrote:
       | Oh this very much reminds me of [feldera](https://feldera.com) --
       | they do incremental loads and computations using some novel
       | approaches (most of which i am too dumb to follow). Really nice
       | folks too.
        
         | woozyolliew wrote:
         | Or the related Materialize stuff https://materialize.com/
        
           | hobofan wrote:
           | I took a brief look into Drasi and it looks like it doesn't
           | do any of the differential/timely dataflow stuff (like
           | Materialize does), or any other sophisticated incremental
           | view maintenance methods that are rooted in Microsoft
           | Research.
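           | 
           | For context, a minimal sketch of the simplest flavor of
           | incremental view maintenance: keeping a grouped count up to
           | date by applying +1/-1 deltas rather than rescanning. This is
           | an illustration only, not differential/timely dataflow and
           | not how Drasi, Materialize, or Feldera are implemented. The
           | names `IncrementalCount` and `recompute` are hypothetical.

```python
from collections import defaultdict


class IncrementalCount:
    """Maintains a per-key row count by applying deltas instead of rescanning."""

    def __init__(self):
        self.counts = defaultdict(int)

    def apply(self, key, delta):
        # delta is +1 for an inserted row and -1 for a deleted row.
        self.counts[key] += delta
        if self.counts[key] == 0:
            del self.counts[key]  # drop keys that no longer have any rows

    def view(self):
        return dict(self.counts)


def recompute(rows):
    """Baseline: rebuild the same view from scratch by scanning every row."""
    counts = defaultdict(int)
    for key in rows:
        counts[key] += 1
    return dict(counts)


view = IncrementalCount()
rows = []  # the hypothetical base table, one grouping key per row
changes = [("eu", +1), ("us", +1), ("eu", +1), ("us", -1)]
for key, delta in changes:
    view.apply(key, delta)
    if delta > 0:
        rows.append(key)
    else:
        rows.remove(key)
    # The incrementally maintained view always matches a full recompute.
    assert view.view() == recompute(rows)
```

           | Each change costs one dictionary update regardless of table
           | size, whereas `recompute` touches every row each time.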
        
       | smarx007 wrote:
       | https://azure.microsoft.com/en-us/blog/drasi-microsofts-newe...
        
       | CharlieDigital wrote:
       | Very interesting choice of using Cypher[0]
       | 
       | In 2014, we built a similar type of event-driven system, but
       | specifically for document distribution: a document can be
       | distributed to a target set of entities, and if a new entity is
       | added, we need to resolve which distributions match. We also
       | ended up using Cypher via Neo4j because of the complex
       | taxonomical structure of how we mapped entities.
       | 
       | It is a super underrated query language and while most of the
       | queries could also be translated to relational SQL, Cypher's
       | linear construction using WITH clauses is far, far easier to
       | reason about, IMO.
       | 
       | EDIT: feel like the devs went overboard with the mix of
       | languages. Shoehorned in C# Blazor? Using JS and Jest for e2e
       | testing?
       | 
       | [0] https://drasi.io/reference/query-language/
        
         | JanSt wrote:
         | I too have great memories of cypher. Such an elegant way to
         | write queries.
        
           | CharlieDigital wrote:
           | If you haven't been following it, I recently found out that
           | it is now supported in a limited capacity by Google
           | Spanner[0]. The openCypher initiative started a few years
           | back and it looks like it's evolved into the (unfortunate
           | moniker) GQL[1].
           | 
           | So it may be the case that we'll see more Cypher out in the
           | wild.
           | 
           | [0] https://cloud.google.com/spanner/docs/graph/opencypher-
           | refer...
           | 
           | [1] https://neo4j.com/blog/cypher-gql-world/
        
         | leeoniya wrote:
         | > while most of the queries could also be translated to
         | relational SQL, Cypher's linear construction using WITH clauses
         | is far, far easier to reason about, IMO.
         | 
         | https://prql-lang.org/
        
           | CharlieDigital wrote:
           | Didn't look too deeply, but one of the keys with Cypher (at
           | least in the context of graph databases) is that it has a
           | nice way of representing `JOIN` operations as graph
           | traversals.
           | 
           |     MATCH (p:Person)-[r]-(c:Company) RETURN p.Name, c.Name
           | 
           | Where `r` can represent any relationship (AKA `JOIN`) between
           | the two collections `Person` and `Company` such as
           | `WORKS_AT`, `EMPLOYED_BY`, `CONTRACTOR_FOR`, etc.
           | 
           | So I'd say that linear queries are _one_ of the things I like
           | about Cypher, but the clean abstraction of complex `JOIN`
           | operations is another huge one.
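           | 
           | A rough sketch of what that undirected, untyped pattern
           | matches, written as a plain scan over a toy in-memory edge
           | list. The data and the `match_person_company` helper are made
           | up for illustration; this is not how Neo4j executes the
           | query.

```python
# Toy in-memory graph; all names and data here are made up for illustration.
nodes = {
    1: {"label": "Person", "Name": "Ada"},
    2: {"label": "Person", "Name": "Bob"},
    3: {"label": "Company", "Name": "Acme"},
    4: {"label": "Company", "Name": "Initech"},
}

# Each relationship is (start_node_id, type, end_node_id).
relationships = [
    (1, "WORKS_AT", 3),
    (2, "CONTRACTOR_FOR", 3),
    (4, "EMPLOYS", 2),
]


def match_person_company(nodes, relationships):
    """Rough equivalent of:
    MATCH (p:Person)-[r]-(c:Company) RETURN p.Name, type(r), c.Name
    """
    results = []
    for start, rel_type, end in relationships:
        # The pattern has no arrow, so try the relationship in both directions.
        for a, b in ((start, end), (end, start)):
            if nodes[a]["label"] == "Person" and nodes[b]["label"] == "Company":
                results.append((nodes[a]["Name"], rel_type, nodes[b]["Name"]))
    return results


matches = match_person_company(nodes, relationships)
for person, rel_type, company in matches:
    print(person, rel_type, company)
```

           | Note that the relationship stored as Company-to-Person
           | (`EMPLOYS`) is matched too, and no relationship type has to
           | be named up front. In SQL this would typically need one join
           | table (and one JOIN) per relationship type.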
        
             | UltraSane wrote:
             | The neat thing about Neo4j is that the [r] isn't a join, it
             | is an actual relationship stored on disk.
        
               | refset wrote:
               | Like a many-to-many join table?
        
             | inkyoto wrote:
             | > [...] Where `r` can represent any relationship [...]
             | 
             | ... and <<-[r]->> can represent any relationship direction,
             | which obviates the need for constructing separate queries
             | for inverse traversing relationships. Kinda like running a
             | compiler forward and backward.
        
         | robertlagrant wrote:
         | We made a health backend partly using Cypher and the only thing
         | I found was the simple queries looked amazing, but as soon as
         | you need to join non-linearly it started looking a lot like SQL
         | again. And when you're using an ORM it stops mattering. And
         | when you need migrations it gets painful!
        
           | CharlieDigital wrote:
           | > but as soon as you need to join non-linearly
           | 
           | At least in our use case, even with some very gnarly 20+ line
           | Cypher queries, it never got to the point where it felt like
           | SQL and certainly, those same queries would be even gnarlier
           | as nested sub-selects, CTEs, or recursive selects, IMO.
           | 
           | Perhaps a characteristic of our model (a taxonomy of Region,
           | Country, Sponsor, Program, Trial, Site, Staff for global
           | clinical trials and documents required by
           | Region/Country/Program/Trial).
        
             | UltraSane wrote:
             | Cypher works really well with a well defined taxonomy.
        
           | UltraSane wrote:
           | "you need to join non-linearly "
           | 
           | What does this mean?
        
             | FromOmelas wrote:
             | presumably it has a semantic model of sorts, defining
             | intrinsic relationships between entities (parent-child,
             | composed-of, sibling-of, and so on)
             | 
             | A bit similar to how certain joins in SQL can be very
             | straightforward with the "USING" clause, or when it can
             | rely on extra information such as analytic views to derive
             | materialized views (vendor specific)
        
       | fatliverfreddy wrote:
       | I wish I could use Cypher for everything
        
       | f4c39012 wrote:
       | Purple!
        
         | computronus wrote:
         | Green!
        
           | JaimeThompson wrote:
           | Just my luck. I get stuck with a race that speaks only in
           | macros.
        
       | imvetri wrote:
       | What does it process it from and what does it process it to?
       | 
       | Is it programmable, or do you have a concrete concept theorised?
       | 
       | What is it useful for? How does it help a business save costs or
       | increase profit? Is it a hobby project?
        
       | otterley wrote:
       | Looks very Azure-centric. Both installation guides
       | (https://drasi.io/how-to-guides/install-sample-applications/b...
       | and https://drasi.io/how-to-guides/install-sample-
       | applications/c...) require Azure to work.
       | 
       | And then there's this:
       | 
       | > Installing Drasi in an EKS cluster can be significantly more
       | complex than a standard installation on other platforms. Instead
       | of downloading a CLI binary using the provided installation
       | scripts, this approach requires modifying the source code of the
       | Drasi CLI and building a local version of the CLI.
       | 
       | Is this an actual requirement or just the current easy path?
        
         | dtquad wrote:
         | That is usual for new Microsoft open source projects. It takes
         | 1-2 months for the Azure dependencies to go away.
        
           | 3abiton wrote:
           | I'm curious about the other examples? I get it though, as
           | many of these projects are built fulfilling a specific need
           | within MS infrastructure.
        
         | pjmlp wrote:
         | Azure is the new Windows, as timesharing OS, thus yeah that is
         | to be expected.
        
         | jameslevy wrote:
         | Does it require Azure to work? Or could the Azure steps
         | relatively easily be swapped out for AWS/GCP/etc?
        
         | stackskipton wrote:
         | Azure SRE here, it doesn't appear to have any Azure
         | dependencies. CLI rebuild seems to be that "drasi init" assumes
         | Azure Kubernetes Service built in StorageClasses for Kubernetes
         | PVC for Redis and Mongo and thus fails when running against
         | EKS. I assume same thing would be required on GKE. Yes, it
         | should be more modular but MVP.
         | 
         | As for other stuff, it's using Gremlin Query Language or
         | Postgres, which are both open. In fact, it's going out of its
         | way not to use Azure authentication, as loading a connection
         | string as a Kubernetes secret is 100% AGAINST Azure Kubernetes
         | Best Practice. Best Practice would be Workload Identity.
        
           | bob1029 wrote:
           | > CLI rebuild seems to be that "drasi init" assumes Azure
           | Kubernetes Service built in StorageClasses for Kubernetes PVC
           | for Redis and Mongo and thus fails when running against EKS.
           | I assume same thing would be required on GKE. Yes, it should
           | be more modular but MVP.
           | 
           | None of these words are in the Bible.
        
         | devjab wrote:
         | Every bit of Microsoft open source is created at least partly
         | as a sales strategy for Azure. They usually start within the
         | Azure infrastructure because, well, why wouldn't they? Then
         | eventually they tend to make it to where you can use them outside
         | of Azure but they never quite leave the part where they are
         | "better" if you're an Azure customer.
         | 
         | Time will tell if Drasi is going to go the path where it
         | becomes more easily useable outside of Azure (and in this case
         | AWS) or it'll go more of a Bicep route.
        
       | sitkack wrote:
       | But at what cost?
        
       | stefanos82 wrote:
       | Drasi...React...well played Microsoft, well played :D
       | 
       | Assuming they chose this name from the Greek drase which means
       | action, React of course is the exact opposite to action, thus the
       | React-ion; an action expects a reaction, somewhere somehow!
        
         | benbristow wrote:
         | Not like Microsoft to name things well...
        
           | j-a-a-p wrote:
           | VMS++ = Windows NT?
        
         | TeMPOraL wrote:
         | Here I thought they were accidentally or intentionally
         | referring to:
         | 
         | https://babylon5.fandom.com/wiki/Drazi
         | 
         | But now I noticed the spelling difference :/.
        
       | resters wrote:
       | This is a very solid pattern. Many systems that are built using
       | traditional relational database systems would lend themselves to
       | far simpler designs using this paradigm. It is not necessarily
       | immediately obvious but nonetheless quite true.
        
         | unit149 wrote:
         | Beginning with Boolean operators: and / or - this relational
         | service model can distribute queries. Curious why Cypher [0]
         | abandons this syntax.
        
       | SiddanthEmani wrote:
       | Cypher is so cool. I included a graph database in my RAG patient
       | chatbot
       | 
       | https://github.com/SiddanthEmani/patient_chatbot
        
       | akmittal wrote:
       | Go seems to be a good choice for data processing systems.
        
       | dxxvi wrote:
       | Is this what can be done with Apache Kafka Connect (to get data
       | from another source to a Kafka cluster), Kafka (including Kafka
       | Streams)? This image (https://github.com/drasi-
       | project/community/raw/main/images/d...) is like Kafka Streams
       | with a single topic. This image (https://github.com/drasi-
       | project/community/raw/main/images/c...) is like joining 2 streams
       | in Kafka Streams.
        
         | ultrafez wrote:
         | It also seems reminiscent of KSQL - consuming multiple input
         | topics, and producing output to a topic defined using a query
         | written in a SQL-like language that defines how the inputs are
         | combined and filtered.
        
       | fatliverfreddy wrote:
       | I see more Cypher fans out here - check out https://cyphernet.es
       | if you work with Kubernetes!
        
         | jeremycarter wrote:
         | Brilliant
        
       | mnsc wrote:
       | I finished reading Kleppmann's Designing Data-Intensive
       | Applications last night and this looks like it's straight out of
       | the last chapter that talks about the future. They don't use the
       | term "dataflow" though.
       | 
       | https://www.oreilly.com/library/view/designing-data-intensiv...
        
         | 9dev wrote:
         | That one's also on my reading list. Was it worth the read?
        
           | yas_hmaheshwari wrote:
           | This book is definitely worth the read. Or maybe worth 10
           | reads. It's really that awesome!
        
           | xnorswap wrote:
           | I read it over the summer and I'd say it's essential reading
           | for any developer who deals with data.
           | 
           | Perhaps most importantly, the book empowered me to talk
           | confidently about the trade-offs involved with different
           | choices of handling data, and gave me a language framework to
           | talk accurately about those choices.
           | 
           | Previously even the parts I did understand were from
           | experience, and not an academic background, so my
           | explanations were hand-wavy or sloppy, but now I can state my
           | case for different solutions much more clearly.
        
       | iamstan23 wrote:
       | Weird thing about this project is that neither the website
       | (https://drasi.io) nor the repo (https://github.com/drasi-
       | project/drasi-platform) mentions that it's a Microsoft project.
       | 
       | Also the only cloud provider it has installation instructions for
       | is AWS's EKS platform. Yet it has integration instructions for
       | Azure CosmosDb Gremlin API.
       | 
       | That one customer out there using EKS and Gremlin on CosmosDb is
       | probably over the moon right now.
        
         | vladsanchez wrote:
         | https://azure.microsoft.com/en-us/blog/drasi-microsofts-newe...
         | 
         | > "The Microsoft Azure Incubations team is excited to announce
         | that Drasi is now available as an open-source project."
        
       | unixhero wrote:
       | I would really enjoy using it. But as a novice data intensive
       | application developer, why would I not query the table every 30
       | seconds and look for changes with a Python program (or another
       | regular programming language)?
        
         | bobnamob wrote:
         | One of the best resources to understand "why would I not ... ?"
         | in a data intensive context is Kleppmann's Designing Data-
         | Intensive Applications[1] (mentioned elsewhere in the
         | comments). There's a lot of nuance to why event streaming wins
         | out over periodically "polling" a database, mostly about
         | maintaining correctness while being able to scale horizontally.
         | 
         | Taking a look at the Kafka docs [2] is also enlightening.
         | 
         | [1] https://www.amazon.com/Designing-Data-Intensive-
         | Applications...
         | 
         | [2] https://kafka.apache.org/documentation/#gettingStarted
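         | 
         | To make one part of that concrete, here is a small sketch of
         | what a snapshot-diffing poller can miss compared with a change
         | log. It only illustrates lost intermediate changes, not the
         | scaling arguments from the book. The table contents and the
         | `poll_diff` and `replay` helpers are hypothetical.

```python
def poll_diff(previous, current):
    """Compare two snapshots of a table (dicts of id -> value), as a poller would."""
    changes = []
    for key in current.keys() - previous.keys():
        changes.append(("insert", key, current[key]))
    for key in previous.keys() - current.keys():
        changes.append(("delete", key, previous[key]))
    for key in previous.keys() & current.keys():
        if previous[key] != current[key]:
            changes.append(("update", key, current[key]))
    return sorted(changes)


def replay(snapshot, log):
    """Apply a change log to a snapshot and return the resulting table."""
    table = dict(snapshot)
    for op, key, value in log:
        if op == "delete":
            table.pop(key, None)
        else:
            table[key] = value
    return table


# Hypothetical change log: everything that happened between two polls.
change_log = [
    ("insert", "order-1", "pending"),
    ("update", "order-1", "shipped"),
    ("insert", "order-2", "pending"),
    ("delete", "order-2", "pending"),
]

before = {}
after = replay(before, change_log)
seen_by_poller = poll_diff(before, after)

print("events in the change log:", len(change_log))
print("changes the poller sees: ", seen_by_poller)
```

         | The poller only learns that order-1 now exists as "shipped".
         | It never sees the "pending" state, and order-2 is invisible
         | because it was created and deleted between polls. A consumer
         | of the change log sees all four events in order.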
        
       | lasermike026 wrote:
       | I suppose students need to prepare to defend what they are
       | writing. Also, teachers may need a bit of a demotion when making
       | accusations of plagiarism or generated papers. Teachers at the
       | very least should be able to reasonably prove their accusations.
       | There is a greater problem with tutors writing students' papers.
       | If teachers and students worked more closely this wouldn't be an
       | issue.
        
         | dijksterhuis wrote:
         | i feel like this comment was maybe supposed to be posted under
         | this article?
         | 
         | https://news.ycombinator.com/item?id=41896973
        
       | emmanueloga_ wrote:
       | They don't mention "CDC" (Change Data Capture) directly anywhere,
       | but I think that's what Drasi is? (they call it "Data Change
       | Processing platform").
       | 
       | "Debezium", an alternative CDC system, is mentioned in the
       | documentation and sources [1]. I'm not sure if Drasi uses
       | Debezium, or aims to be compatible with it. Maybe someone here
       | can shine more light on the relationship between these two?
       | 
       | --
       | 
       | 1: https://github.com/drasi-project/drasi-
       | platform/tree/main/re...
        
       | purpleidea wrote:
       | This feels like a specialized version of
       | https://github.com/purpleidea/mgmt/ but Microsoft only.
        
       ___________________________________________________________________
       (page generated 2024-10-21 23:02 UTC)