[HN Gopher] Aggregate streaming data in real-time with WebAssembly
___________________________________________________________________
Aggregate streaming data in real-time with WebAssembly
Author : ahunyady
Score : 55 points
Date : 2021-08-24 17:36 UTC (5 hours ago)
(HTM) web link (infinyon.com)
(TXT) w3m dump (infinyon.com)
| rad_gruchalski wrote:
| This very much sounds like Storm with bolts in WASM. I am not
| really sure why there is so much focus on the technology used for
| this product rather than what it actually does.
| ahunyady wrote:
| This blog focuses on a small piece of technology. The product
| has 3 core components: immutable stores, data streaming, and
| programmability. The goal of the product is to make data
| streaming easily accessible to all engineers. The cluster is
| easy to roll out, has a powerful CLI, and covers multiple use
| cases from log aggregation to data cleansing.
| kumarski wrote:
| wasmer.io anyone?
| alexchamberlain wrote:
| Why use WASM here? Security? Apologies if I missed that in the
| post.
| nicholastmosher wrote:
| No worries, it wasn't mentioned in this post in particular :)
|
| Security is certainly one of the reasons to use WASM: because it
| runs in a sandbox, untrusted user code can be uploaded to Fluvio's
| Streaming Processing Units and do the processing inline, rather
| than on the client side. This can save a lot of network
| bandwidth, especially for a dataset where filtering cuts the
| volume substantially.
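|
| A minimal sketch of the bandwidth argument in plain Rust. The
| names `keep`, `payload_bytes`, and `filter_inline` are
| hypothetical and are not part of Fluvio's API; the sketch only
| assumes that records are strings and that the filter keeps the
| ones containing "ERROR".

```rust
/// Hypothetical filter predicate: keep only records containing "ERROR".
fn keep(record: &str) -> bool {
    record.contains("ERROR")
}

/// Total payload size, in bytes, of a batch of records.
fn payload_bytes(records: &[&str]) -> usize {
    records.iter().map(|r| r.len()).sum()
}

/// Apply the predicate before the batch leaves the server, so only
/// the surviving records would have to cross the network.
fn filter_inline<'a>(records: &[&'a str]) -> Vec<&'a str> {
    records.iter().copied().filter(|r| keep(r)).collect()
}
```

| Comparing `payload_bytes` on a batch before and after
| `filter_inline` gives a rough idea of how much less data a
| consumer would have to receive when the filter runs server-side.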
|
| Other reasons include that WASM is a fast and portable bytecode
| format and that there is very good tooling and support for
| compiling Rust to WASM as well as embedding WASM runtimes in
| Rust, which works well for Fluvio as it's written in Rust.
|
| Here's another post with a bit more detail about some of the
| design and motivational factors if you're interested:
| https://www.infinyon.com/blog/2021/06/introducing-fluvio/#fl...
| nielsbot wrote:
| This feature ("Aggregations for Smart Streams") sounds like what
| I'd normally call "reduce":
|
| "Aggregates let you define functions that combine each record in
| a stream with some long-running state, or 'accumulator'."
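|
| The quoted definition can be sketched in plain Rust as a step
| function applied to each record in turn. This is only an
| illustration: `sum_step` and `running_aggregate` are hypothetical
| names, not Fluvio's API, and the sketch assumes records are UTF-8
| decimal integers and that the updated state is emitted after
| every record.

```rust
/// Hypothetical aggregate step: combine the long-running accumulator
/// with one record and return the updated accumulator. Records that
/// are not valid UTF-8 decimal integers count as 0.
fn sum_step(accumulator: i64, record: &[u8]) -> i64 {
    let value = std::str::from_utf8(record)
        .ok()
        .and_then(|s| s.trim().parse::<i64>().ok())
        .unwrap_or(0);
    accumulator + value
}

/// Apply the step to every record in order, collecting the
/// accumulator after each record (a running aggregate).
fn running_aggregate(init: i64, records: &[&[u8]]) -> Vec<i64> {
    records
        .iter()
        .scan(init, |acc, record| {
            *acc = sum_step(*acc, record);
            Some(*acc)
        })
        .collect()
}
```

| Here `Iterator::scan` carries the state between records, which
| is the in-memory analogue of a long-running accumulator.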
| nicholastmosher wrote:
| Yes, this is very similar to "reduce" from the functional
| programming world; in fact, it is equivalent to the "fold"
| pattern. The main difference is that fold is slightly more
| flexible, since your accumulator may have a different type than
| your stream elements.
|
| In Rusty pseudocode, reduce requires a function with two inputs
| and an output of the same type:
|
| fn reduce<T>(f: Fn(T, T) -> T)
|
| Whereas fold may use one type for the accumulator and another
| type for the elements, but requires an initial accumulator
| value to be given explicitly:
|
| fn fold<A, T>(init: A, f: Fn(A, T) -> A)
|
| The Aggregate SmartStreams discussed in the blog follow this
| fold pattern, applied to a distributed persistent log as the
| stream and using WebAssembly modules as the functions.
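|
| For a runnable version of the distinction, here is a small
| sketch using the standard library's `Iterator::reduce` and
| `Iterator::fold`. The helper names `longest_word` and `total_len`
| are made up for illustration.

```rust
/// reduce: the accumulator has the same type as the elements and is
/// seeded by the first element, so an empty input yields None.
fn longest_word<'a>(words: &[&'a str]) -> Option<&'a str> {
    words
        .iter()
        .copied()
        .reduce(|a, b| if b.len() > a.len() { b } else { a })
}

/// fold: the accumulator (usize) has a different type than the
/// elements (&str), so an initial value is supplied explicitly.
fn total_len(words: &[&str]) -> usize {
    words.iter().fold(0usize, |acc, w| acc + w.len())
}
```

| Note that `reduce` returns an `Option` because there is no
| accumulator to return for an empty input, whereas `fold` simply
| returns the initial value.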
| nielsbot wrote:
| I didn't know about "fold"... In Swift it's still called
| `reduce` even if your accumulator type is arbitrary:
|
| https://developer.apple.com/documentation/swift/array/229868...
| BenoitP wrote:
| Interesting! I'd love to see the SQL version of that though.
|
| CQRS reactive patterns with Flink or Spark computing a result to
| be sent to the client could benefit from it: you could decide to
| move some aggregation client-side in the same business language.
| sehz wrote:
| Yes, we are thinking of supporting SQL.
| tomrod wrote:
| Seconded, would love to see it support SQL. A subset of
| PostgreSQL would go super far.
| ram_rar wrote:
| Finally, we are beginning to see some real back-end applications
| of WASM apart from Envoy proxy. This seems very similar to Apache
| Storm [1], where users can define UDFs (user-defined functions)
| on their streams.
|
| However, I don't understand what the value-add of WASM is (apart
| from security) if the user still has to write code in Rust and
| compile it to WASM. Why not just execute in Rust alone?
|
| [1] https://storm.apache.org/
| sehz wrote:
| Until now, containers were the only way to provide an isolation
| boundary, and that boundary sits at the process level. With
| WASM, we can provide very fine-grained isolation and execution
| control.
|
| You can compile almost any language to WASM, not just Rust; for
| example, Python, Go, and JavaScript:
| https://github.com/appcypher/awesome-wasm-langs.
| ahunyady wrote:
| As mentioned in a prior reply, the product has 3 components,
| where WebAssembly is the programmability part. In short,
| WebAssembly gives us the ability to work on the data streams in
| real-time as the data hits the cluster (we call this data
| gravity). That allows us to process records within
| milliseconds. That being said, we are happy to work with the
| Storm community on a connector if there is such demand.
___________________________________________________________________
(page generated 2021-08-24 23:01 UTC)