[HN Gopher] Data Parallel, Task Parallel, and Agent Actor Archit...
       ___________________________________________________________________
        
       Data Parallel, Task Parallel, and Agent Actor Architectures
        
       Author : skadamat
       Score  : 47 points
       Date   : 2023-07-18 18:46 UTC (4 hours ago)
        
 (HTM) web link (bytewax.io)
        
       | amath wrote:
       | Author of the post here. I would love to hear about folks'
       | experience using the different architectures, and more pros and
       | cons of each one. Surprisingly, I could not find much information
       | comparing these architectures, and I would love to be able to
       | update this post with more details.
        
         | Sardtok wrote:
         | There's the Seven Concurrency Models in Seven Weeks book. It's
         | a bit old, and not so much about frameworks, but more about
         | programming languages and their available threading models or
         | alternatives to pure threads. It still covers a lot of the same
         | theory, but not distributed, and a bit lower level.
         | 
         | https://pragprog.com/titles/pb7con/seven-concurrency-models-...
        
         | skadamat wrote:
         | Besides Ray, are there other tools or platforms that embrace
         | the agent actor architecture model?
        
           | Sardtok wrote:
           | Akka uses the actor model.
           | https://doc.akka.io/docs/akka/current/typed/actors.html
           | 
           | IIRC, Akka based their implementation on Erlang (possibly
           | OTP), but it's been a long time since I did anything with
           | either Erlang or Scala, and I never used Akka in Java.
        
           | felixgallo wrote:
           | https://www.erlang.org/ https://elixir-lang.org/
        
             | photonthug wrote:
             | Elixir is amazing, see also stuff like
             | https://github.com/bitwalker/libcluster . I'm always
             | surprised there's not more excitement generated by the
             | possibility of having _language-level features_ that would
             | normally require all the weight of something like k8s. I
             | don't get the impression that many people are seriously
             | using it for data-engineering work though, which seems like
             | a shame.
        
             | Nezteb wrote:
             | Elixir definitely!
             | 
             | If you need to use Go, there is also ergo:
             | https://github.com/ergo-services/ergo
             | 
             | Yet another cool tool is lunatic:
             | https://github.com/lunatic-solutions/lunatic
        
       | photonthug wrote:
       | Ok, I'll bite. I've always thought that agent-oriented / reactor
       | / proactor type patterns are elegant and yet under-utilized and
       | under-appreciated. If I had to guess about why: maybe these
       | systems scale well in terms of traffic but do not tend to scale
       | well in terms of implementation at most orgs?
       | 
       | When the average dev team finally gets the simple queue they
       | asked ops for, the thing is probably late and only present in 2/3
       | of dev/qa/prod (or some similar SNAFU). Flink works with AWS EMR,
       | so we can't explain everything by just considering whether hosted
       | services are available, but admin/setup/general conceptual
       | overhead is on a different level. Partly because they've been
       | burned in the past with stuff like this, and partly because they
       | are lazy, devs mostly want simple infrastructure mirroring
       | simple data-structures where they can test/develop discrete code
       | locally without thinking about system-level stuff.
       | 
       | Consider the problem of a) developing a new agent in an actor
       | system vs b) developing a new "step" in a simpler batch-oriented
       | pipeline. For (a), to actually run my code experimentally locally
       | I need to simulate the appropriate message(s) locally and maybe
       | other aspects of system-runtime. Whereas for (b) one digs up
       | appropriate input and starts hacking on a docker-container that
       | one trusts can be jammed into an airflow DAG later. Both
       | approaches need a bit of dev/test harness kinda setup, but (a) is
       | potentially more involved and more importantly it usually crosses
       | team-boundaries (requiring both devops and devs). This will
       | probably create pain unless the org has a talented "platform
       | team" already in place.
       | 
       | Another aspect involving team boundaries is that besides dev
       | teams disliking ops/infra, they also don't trust each other! So
       | after effort is sunk into things like the "smarter-queue" that
       | might support messaging and actors making actors, it turns out
       | every team wants their own queues, routing, codebases, underlying
       | storage, etc. For better or worse, attempts to provide features
       | along the lines of flexibility/interoperability are thus
       | undermined. Out of necessity, teams often want to provide some
       | limited access to _data_ which other teams can consume. But they
       | resist providing any kind of access to code APIs / runtime.
       | 
       | Typical scenario: Ever work at an org that's supposed to have a
       | data-lake, but every single new app or new feature inside an
       | existing app generates requests for new buckets that literally
       | nothing in the existing system can access? Some manager or
       | "senior" dev is having a knee-jerk reaction that they want to
       | build a kingdom. In the end they won't actually _enjoy_ answering
       | access-requests to their walled-garden, but they think they can
       | push those over to support requests. With only data as a
       | deliverable, they don't need to think about any interop and,
       | bonus, no one will even know if their messy scripts are in
       | version control.
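A minimal sketch of the (a) vs (b) comparison in the comment above, assuming a toy in-process actor rather than any real framework. `CounterActor`, `simulate`, and `batch_step` are hypothetical names invented for illustration. The point is only that exercising an actor locally means feeding it simulated messages and waiting on a reply, while a batch step is a plain function over input.

```python
import queue
import threading


class CounterActor:
    """Hypothetical actor: owns its state and handles one message at a time."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self.total = 0  # private state, only touched by the actor's own thread

    def handle(self, message):
        # Message handling is kept separate from the run loop so it can
        # also be unit tested directly, without starting a thread.
        kind, payload = message
        if kind == "add":
            self.total += payload
        elif kind == "get":
            payload.put(self.total)  # payload is a reply queue from the sender

    def run(self):
        while True:
            message = self.mailbox.get()
            if message is None:  # sentinel used here to stop the actor
                break
            self.handle(message)


def simulate(messages):
    """Option (a): start the actor, send simulated messages, ask for the result."""
    actor = CounterActor()
    thread = threading.Thread(target=actor.run)
    thread.start()
    for message in messages:
        actor.mailbox.put(message)
    reply = queue.Queue()
    actor.mailbox.put(("get", reply))
    result = reply.get(timeout=5)
    actor.mailbox.put(None)
    thread.join()
    return result


def batch_step(rows):
    """Option (b): a pipeline step is just a function over some input."""
    return sum(rows)
```

Even in this toy form, (a) needs a harness (thread, mailbox, reply channel, shutdown) that (b) does not, which is roughly the overhead the comment describes.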
        
         | amath wrote:
         | Describing so many nightmares here :). The overhead of
         | testing/running locally is so important and yet so frequently
         | not optimized for.
        
       | danielovichdk wrote:
       | Tasks and Parallel are available as first-class citizens in the
       | .NET framework. Large APIs. Easy to cold-start a good headache.
       | It's not easy stuff.
       | 
       | In my experience these patterns are difficult to optimize if any
       | IO or network boundaries are in play. Even then, it sometimes
       | seems you need more knowledge of how threading works on one
       | particular CPU architecture.
       | 
       | So for me at least, on a high level these patterns seem easy
       | enough and seem to play well into the pop-cultural small-service
       | fad and large data processing. But be aware of the underlying CPU
       | architecture and threading and how thread pools work on your
       | particular OS.
       | 
       | Oh yes, and then comes the debugging and reading the code part,
       | which we all know is where the real time and effort go.
       | 
       | Use these when absolutely no other options are available. Just
       | like multithreading.
        
         | neonsunset wrote:
         |     var task1 = service.DispatchFirst(param1);
         |     var task2 = service.DispatchSecond(param2);
         |     var final = service.DispatchThird(await task1, await task2);
         | 
         | or
         | 
         |     var queries = users.Select(user => FetchPurchases(user.Id));
         |     var results = await Task.WhenAll(queries);
         | 
         | Very easy.
         | 
         | As long as you don't touch System.Threading.Tasks.Dataflow
         | namespace, everything will be good.
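For readers outside .NET, here is a rough sketch of the same two shapes from the comment above using Python's asyncio. `dispatch` and `fetch_purchases` are hypothetical stand-ins for IO-bound calls, not real service APIs. One difference worth noting: a Python coroutine does not start running until it is awaited or wrapped in a task, so `asyncio.create_task` is needed for the first shape to overlap the two calls.

```python
import asyncio


async def dispatch(name, delay):
    # Hypothetical stand-in for an IO-bound service call.
    await asyncio.sleep(delay)
    return name


async def fetch_purchases(user_id):
    # Hypothetical stand-in for a per-user query.
    await asyncio.sleep(0.01)
    return [f"order-{user_id}"]


async def two_then_one():
    # Shape 1: schedule two calls right away, then combine their results.
    task1 = asyncio.create_task(dispatch("first", 0.01))
    task2 = asyncio.create_task(dispatch("second", 0.01))
    return (await task1, await task2)


async def fan_out(user_ids):
    # Shape 2: one call per item, awaited together.
    # gather returns results in the same order as its arguments.
    return await asyncio.gather(*(fetch_purchases(u) for u in user_ids))
```

Each function can be driven from synchronous code with `asyncio.run(...)`, for example `asyncio.run(fan_out([1, 2]))`.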
        
         | victor106 wrote:
         | Anyone know if these are available in the Java world?
        
         | Fellshard wrote:
         | When I used those libraries in .NET, they simply did not
         | function, for unspecified reasons. I boiled it down to the
         | smallest possible example, based on their docs, and data simply
         | never made it through the flow at all. It was baffling to me,
         | but not an uncommon experience with various components of the
         | core of .NET (not to be confused with .NET Core).
        
       | CyberDildonics wrote:
       | "data parallel" and "task parallel" are the same thing. You work
       | out data dependencies ahead of time and a thread does something
       | specific by reading the data it needs. Whether you do that by
       | giving a bunch of threads the same data and then giving them a
       | range they need to work on or give them different chunks of data,
       | it boils down to the same mechanic of having data ahead of time
       | and executing something specific, dictated by a different thread
       | and using data they don't own.
        
         | wmf wrote:
         | At a high enough level I guess everything is just computation,
         | but using GPGPU as an example, there are programs where control
         | flow diverges and ones where it doesn't.
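One way to picture the distinction debated above, as a small sketch using only Python's standard library. The function names are hypothetical, and threads are used purely for brevity (CPU-bound work in CPython would more likely use processes). In the data-parallel case every worker runs the same function on a different slice; in the task-parallel case each worker runs a different function, so control flow differs per worker.

```python
from concurrent.futures import ThreadPoolExecutor


def square_chunk(chunk):
    return [x * x for x in chunk]


def data_parallel(values, n_chunks=4):
    # Same function applied to different slices of the data.
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(square_chunk, chunks))  # map preserves order
    return [y for part in parts for y in part]


def task_parallel(values):
    # Different functions submitted as independent tasks over the same data.
    with ThreadPoolExecutor() as pool:
        futures = {
            "min": pool.submit(min, values),
            "max": pool.submit(max, values),
            "sum": pool.submit(sum, values),
        }
        return {name: future.result() for name, future in futures.items()}
```

Both versions share the mechanic the parent comment points out (data prepared ahead of time, work handed to threads), which is why the two labels can blur at small scale.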
        
       ___________________________________________________________________
       (page generated 2023-07-18 23:00 UTC)