[HN Gopher] Aggregate streaming data in real-time with WebAssembly
       ___________________________________________________________________
        
       Aggregate streaming data in real-time with WebAssembly
        
       Author : ahunyady
       Score  : 55 points
       Date   : 2021-08-24 17:36 UTC (5 hours ago)
        
 (HTM) web link (infinyon.com)
 (TXT) w3m dump (infinyon.com)
        
       | rad_gruchalski wrote:
       | This very much sounds like storm with bolts in wasm. I am not
       | really sure why there is so much focus on the technology used for
       | this product rather than what it actually does.
        
         | ahunyady wrote:
         | This blog focuses on a small piece of technology. The product
         | has 3 core components: immutable stores, data streaming, and
         | programmability. The goal of the product is to make data
         | streaming easily accessible to all engineers. The cluster is
         | easy to roll out, has a powerful CLI, and covers multiple use
         | cases from log aggregation to data cleansing.
        
       | kumarski wrote:
       | wasmer.io anyone?
        
       | alexchamberlain wrote:
       | Why use WASM here? Security? Apologies if I missed that in the
       | post.
        
         | nicholastmosher wrote:
         | No worries, it wasn't mentioned in this post in particular :)
         | 
         | Security is certainly one of the reasons to use WASM. Because
         | it runs in a sandbox, untrusted user code can be uploaded to
         | Fluvio's Streaming Processing Units and do the processing
         | inline, rather than on the client side. This can save a lot of
         | network bandwidth, especially for datasets where filtering
         | cuts the volume substantially.
         | 
         | Other reasons include that WASM is a fast and portable bytecode
         | format and that there is very good tooling and support for
         | compiling Rust to WASM as well as embedding WASM runtimes in
         | Rust, which works well for Fluvio as it's written in Rust.
         | 
         | Here's another post with a bit more detail about some of the
         | design and motivational factors if you're interested:
         | https://www.infinyon.com/blog/2021/06/introducing-fluvio/#fl...
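
To make the bandwidth point concrete, here is a minimal sketch in plain Rust. It is not Fluvio's API: `keep_record` and `bytes_sent` are hypothetical names made up for illustration, and the "stream" is just an in-memory array. It compares how many bytes would cross the network if every record were shipped to the client versus only the records that pass an inline filter.

```rust
/// Hypothetical filter predicate: keep only records that mention "error".
/// In a real deployment this logic would be compiled to WASM and uploaded;
/// here it is an ordinary function so the sketch runs anywhere.
fn keep_record(record: &str) -> bool {
    record.contains("error")
}

/// Total payload size, in bytes, of the records that would be sent.
fn bytes_sent<'a>(records: impl Iterator<Item = &'a str>) -> usize {
    records.map(str::len).sum()
}

fn main() {
    let stream = vec![
        "info: started",
        "error: disk full",
        "info: heartbeat",
        "error: timeout",
    ];

    // Client-side filtering: every record is shipped, then filtered locally.
    let unfiltered = bytes_sent(stream.iter().copied());

    // Inline filtering: only matching records are shipped.
    let filtered = bytes_sent(stream.iter().copied().filter(|r| keep_record(r)));

    println!("client-side filter ships {} bytes", unfiltered);
    println!("inline filter ships {} bytes", filtered);
}
```

The more selective the predicate, the bigger the gap between the two numbers, which is the saving described above.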
        
       | nielsbot wrote:
       | This feature ("Aggregations for Smart Streams") sounds like what
       | I'd normally call "reduce":
       | 
       | "Aggregates let you define functions that combine each record in
       | a stream with some long-running state, or 'accumulator'."
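
As a rough illustration of "combine each record with an accumulator", here is a sketch using the standard library's `Iterator::scan`. The function name `running_totals` is made up, and emitting the updated accumulator after every record is an assumption about how such an aggregate behaves, not something confirmed in this thread.

```rust
/// Hypothetical aggregate: combine each incoming number with a running sum
/// and emit the updated accumulator after every record.
fn running_totals(records: &[i64]) -> Vec<i64> {
    records
        .iter()
        .scan(0i64, |acc, &record| {
            *acc += record; // combine the record with the long-running state
            Some(*acc) // emit the new accumulator value
        })
        .collect()
}

fn main() {
    let totals = running_totals(&[1, 2, 3, 4]);
    println!("{:?}", totals);
}
```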
        
         | nicholastmosher wrote:
         | Yes, this is very similar to "reduce" from the functional
         | programming world; in fact, it is equivalent to the "fold"
         | pattern. The main difference is that fold is slightly more
         | flexible since your accumulator may have a different type than
         | your stream elements.
         | 
         | In Rusty pseudocode, reduce requires a function with two inputs
         | and an output of the same type:
         | 
         | fn reduce<T>(f: Fn(T, T) -> T)
         | 
         | Whereas fold may use one type for the accumulator and another
         | type for the elements, but requires an initial accumulator
         | value to be given explicitly:
         | 
         | fn fold<A, T>(init: A, f: Fn(A, T) -> A)
         | 
         | The Aggregate SmartStreams discussed in the blog follow this
         | fold pattern, applied to a distributed persistent log as the
         | stream and using WebAssembly modules as the functions.
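
The reduce/fold distinction above can be tried directly with the standard library's `Iterator::reduce` and `Iterator::fold`. This is only a sketch over an in-memory slice, not the SmartStreams API; `max_reading` and `join_readings` are hypothetical names chosen for the example.

```rust
/// `reduce`: the accumulator and the elements share one type (i32 here).
/// It returns None for an empty input because there is no initial value.
fn max_reading(readings: &[i32]) -> Option<i32> {
    readings.iter().copied().reduce(|a, b| a.max(b))
}

/// `fold`: the accumulator (String) differs from the element type (i32),
/// so an initial accumulator must be supplied explicitly.
fn join_readings(readings: &[i32]) -> String {
    readings.iter().fold(String::new(), |mut acc, r| {
        if !acc.is_empty() {
            acc.push(',');
        }
        acc.push_str(&r.to_string());
        acc
    })
}

fn main() {
    let readings = [3, 9, 4];
    println!("{:?}", max_reading(&readings));
    println!("{}", join_readings(&readings));
}
```

Note how `reduce` has to return an `Option`, while `fold` always has a value to return thanks to the explicit initial accumulator.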
        
           | nielsbot wrote:
           | I didn't know about "fold"... In Swift it's still called
           | `reduce` even if your accumulator type is arbitrary:
           | 
           | https://developer.apple.com/documentation/swift/array/229868...
        
       | BenoitP wrote:
       | Interesting! I'd love to see the SQL version of that though.
       | 
       | CQRS reactive patterns with Flink or Spark computing a result to
       | be sent to the client could benefit from it: you could decide to
       | move some aggregation client-side in the same business language.
        
         | sehz wrote:
         | Yes, we are thinking of supporting SQL
        
           | tomrod wrote:
           | Seconded, would love to see it support SQL. A subset of
           | PostgreSQL would go super far.
        
       | ram_rar wrote:
       | Finally, we are beginning to see some real back-end applications
       | of wasm apart from Envoy proxy. This seems very similar to Apache
       | Storm [1], where users can define UDFs (user-defined functions)
       | on their streams.
       | 
       | Although, I don't understand what the value add of wasm is
       | (apart from security) if the user still has to write code in
       | Rust -> wasm. Why not just execute in Rust alone?
       | 
       | [1] https://storm.apache.org/
        
         | sehz wrote:
         | Until now, containers were the only way to provide an
         | isolation boundary, and that boundary is a process. With WASM,
         | we can provide very fine-grained isolation and execution
         | control.
         | 
         | You can compile almost any language to WASM, not just Rust. For
         | example, Python, Go, JavaScript:
         | https://github.com/appcypher/awesome-wasm-langs.
        
         | ahunyady wrote:
         | As mentioned in a prior reply, the product has 3 components,
         | where WebAssembly is the programmability part. In short,
         | WebAssembly gives us the ability to work on the data streams in
         | real-time as the data hits the cluster (we call this data
         | gravity). That allows us to process records within
         | milliseconds. That being said, we are happy to work with the
         | Storm community on a connector if there is such demand.
        
       ___________________________________________________________________
       (page generated 2021-08-24 23:01 UTC)