[HN Gopher] Data Parallel, Task Parallel, and Agent Actor Archit...
___________________________________________________________________
Data Parallel, Task Parallel, and Agent Actor Architectures
Author : skadamat
Score : 47 points
Date : 2023-07-18 18:46 UTC (4 hours ago)
(HTM) web link (bytewax.io)
(TXT) w3m dump (bytewax.io)
| amath wrote:
| Author of the post here. I would love to hear about folks'
| experiences using the different architectures, and more pros and
| cons of each one. Surprisingly, I could not find much information
| comparing these architectures, and I would love to be able to
| update this post with more details.
| Sardtok wrote:
| There's the Seven Concurrency Models in Seven Weeks book. It's
| a bit old, and not so much about frameworks, but more about
| programming languages and their available threading models or
| alternatives to pure threads. It still covers a lot of the same
| theory, but not distributed, and a bit lower level.
|
| https://pragprog.com/titles/pb7con/seven-concurrency-models-...
| skadamat wrote:
| Besides Ray, are there other tools or platforms that embrace
| the agent actor architecture model?
| Sardtok wrote:
| Akka uses the actor model.
| https://doc.akka.io/docs/akka/current/typed/actors.html
|
| IIRC, Akka based their implementation on Erlang (possibly
| OTP), but it's been a long time since I did anything with
| either Erlang or Scala, and I never used Akka in Java.
| felixgallo wrote:
| https://www.erlang.org/ https://elixir-lang.org/
| photonthug wrote:
| Elixir is amazing, see also stuff like
| https://github.com/bitwalker/libcluster . I'm always
| surprised there's not more excitement generated by the
| possibility of having _language-level features_ that would
| normally require all the weight of something like k8s. I
| don't get the impression that many people are seriously
| using it for data-engineering work though, which seems like
| a shame.
| Nezteb wrote:
| Elixir definitely!
|
| If you need to use Go, there is also ergo:
| https://github.com/ergo-services/ergo
|
| Yet another cool tool is lunatic:
| https://github.com/lunatic-solutions/lunatic
| photonthug wrote:
| Ok, I'll bite. I've always thought that agent-oriented / reactor
| / proactor type patterns are elegant and yet under-utilized and
| under-appreciated. If I had to guess why: maybe these
| systems scale well in terms of traffic but do not tend to scale
| well in terms of implementation at most orgs?
|
| When the average dev team finally gets the simple queue they
| asked ops for, the thing is probably late and only present in 2/3
| of dev/qa/prod (or some similar SNAFU). Flink works with AWS EMR,
| so we can't explain everything by just considering whether hosted
| services are available, but admin/setup/general conceptual
| overhead is on a different level. Partly because they've been
| burned in the past with stuff like this, and partly because they
| are lazy, devs mostly want simple infrastructure mirroring
| simple data-structures where they can test/develop discrete code
| locally without thinking about system-level stuff.
|
| Consider the problem of a) developing a new agent in an actor
| system vs b) developing a new "step" in a simpler batch-oriented
| pipeline. For (a), to actually run my code experimentally locally
| I need to simulate the appropriate message(s) locally and maybe
| other aspects of system-runtime. Whereas for (b) one digs up
| appropriate input and starts hacking on a docker-container that
| one trusts can be jammed into an airflow DAG later. Both
| approaches need a bit of dev/test harness kinda setup, but (a) is
| potentially more involved and more importantly it usually crosses
| team-boundaries (requiring both devops and devs). This will
| probably create pain unless the org has a talented "platform
| team" already in place.
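|
| As a rough illustration of the difference in local test setup, here is a
| minimal Python sketch, not taken from the article. `CounterActor`,
| `batch_step`, and the message shape are hypothetical names chosen for
| this example. The actor has to be started, fed simulated messages, and
| shut down, while the batch step is a plain function called on sample input.
```python
import queue
import threading


class CounterActor:
    """Hypothetical actor: owns its state and reacts to mailbox messages."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self.counts = {}
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def send(self, message):
        self.mailbox.put(message)

    def stop(self):
        # None is used as a shutdown sentinel (an assumption of this sketch).
        self.mailbox.put(None)
        self._thread.join()

    def _run(self):
        # Process messages one at a time until the sentinel arrives.
        while True:
            message = self.mailbox.get()
            if message is None:
                break
            self.handle(message)

    def handle(self, message):
        key = message["key"]
        self.counts[key] = self.counts.get(key, 0) + message["value"]


def batch_step(records):
    """Equivalent batch step: a plain function from input records to output."""
    counts = {}
    for record in records:
        counts[record["key"]] = counts.get(record["key"], 0) + record["value"]
    return counts


sample = [
    {"key": "a", "value": 1},
    {"key": "b", "value": 2},
    {"key": "a", "value": 3},
]

# (a) Actor: simulate the runtime locally by starting it and sending messages.
actor = CounterActor()
actor.start()
for msg in sample:
    actor.send(msg)
actor.stop()
actor_result = actor.counts

# (b) Batch step: call the function directly on sample input.
step_result = batch_step(sample)
```
| Both paths compute the same result here; the actor version simply needs
| more harness (a mailbox, a thread, and a stop signal) before any logic runs.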
|
| Another aspect involving team boundaries is that besides dev
| teams disliking ops/infra, they also don't trust each other! So
| after effort is sunk into things like the "smarter-queue" that
| might support messaging and actors making actors, it turns out
| every team wants their own queues, routing, codebases, underlying
| storage, etc. For better or worse, attempts to provide features
| along the lines of flexibility/interoperability are thus
| undermined. Out of necessity, teams often want to provide some
| limited access to _data_ which other teams can consume. But they
| resist providing any kind of access to code APIs / runtime.
|
| Typical scenario: Ever work at an org that's supposed to have a
| data-lake, but every single new app or new feature inside an
| existing app generates requests for new buckets that literally
| nothing in the existing system can access? Some manager or
| "senior" dev is having a knee-jerk reaction that they want to
| build a kingdom. In the end they won't actually _enjoy_ answering
| access-requests to their walled-garden, but they think they can
| push those over to support requests. With only data as a
| deliverable, they don't need to think about any interop and,
| bonus, no one will even know if their messy scripts are in
| version control.
| amath wrote:
| You're describing so many nightmares here :). The overhead of
| testing/running locally is so important and yet so frequently
| not optimized for.
| danielovichdk wrote:
| Tasks and Parallel are available as first-class citizens in the
| .NET framework. The APIs are large, and it is easy to give
| yourself a good headache. It's not easy stuff.
|
| In my experience these patterns are difficult to optimize if any
| IO or network boundaries are in play. Even then, it sometimes
| seems you need more knowledge of how threading works on one
| particular CPU architecture.
|
| So for me at least, at a high level these patterns seem easy
| enough and seem to play well into the pop-cultural small-service
| fad and large data processing. But be aware of the underlying CPU
| architecture and threading and how thread pools work on your
| particular OS.
|
| Oh yes, and then comes the debugging and reading-the-code part,
| which we all know is where the real time and effort go.
|
| Use these when absolutely no other options are available. Just
| like multithreading.
| neonsunset wrote:
| var task1 = service.DispatchFirst(param1);
| var task2 = service.DispatchSecond(param2);
| var final = service.DispatchThird(await task1, await task2);
|
| or
|
| var queries = users.Select(user => FetchPurchases(user.Id));
| var results = await Task.WhenAll(queries);
|
| Very easy.
|
| As long as you don't touch the System.Threading.Tasks.Dataflow
| namespace, everything will be good.
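|
| For readers coming from the Python side of the article, a roughly
| equivalent sketch with `asyncio`. `dispatch` and `fetch_purchases` are
| hypothetical stand-ins for the service calls above, simulated here with
| short sleeps.
```python
import asyncio


async def dispatch(name, delay):
    # Hypothetical stand-in for an IO-bound service call.
    await asyncio.sleep(delay)
    return name


async def fetch_purchases(user_id):
    # Hypothetical stand-in for a per-user query.
    await asyncio.sleep(0.01)
    return [f"purchase-{user_id}-1", f"purchase-{user_id}-2"]


async def main():
    # Start two independent operations, then await both results.
    task1 = asyncio.create_task(dispatch("first", 0.02))
    task2 = asyncio.create_task(dispatch("second", 0.01))
    combined = (await task1, await task2)

    # Fan out one call per user and wait for all of them.
    user_ids = [1, 2, 3]
    results = await asyncio.gather(*(fetch_purchases(uid) for uid in user_ids))
    return combined, results


combined, results = asyncio.run(main())
```
| `asyncio.gather` returns results in the order the awaitables were passed,
| so `results` lines up with `user_ids` regardless of completion order.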
| victor106 wrote:
| Anyone know if these are available in the Java world?
| Fellshard wrote:
| When I used those libraries in .NET, they simply did not
| function, for unspecified reasons. I boiled it down to the
| smallest possible example, based on their docs, and data simply
| never made it through the flow at all. It was baffling to me,
| but not an uncommon experience with various components of the
| core of .NET (not to be confused with .NET Core).
| CyberDildonics wrote:
| "data parallel" and "task parallel" are the same thing. You work
| out data dependencies ahead of time and a thread does something
| specific by reading the data it needs. Whether you do that by
| giving a bunch of threads the same data and then giving them a
| range they need to work on or by giving them different chunks of data,
| it boils down to the same mechanic of having data ahead of time
| and executing something specific, dictated by a different thread
| and using data they don't own.
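|
| For comparison, a minimal Python sketch of the two styles using only the
| standard library. The function names and the two-way chunking are
| assumptions made for illustration. Data parallel runs one function over
| different chunks; task parallel runs different functions at the same time.
```python
from concurrent.futures import ThreadPoolExecutor


def square_sum(chunk):
    return sum(x * x for x in chunk)


def count_evens(values):
    return sum(1 for x in values if x % 2 == 0)


def largest(values):
    return max(values)


data = list(range(10))

with ThreadPoolExecutor(max_workers=2) as pool:
    # Data parallel: the same function applied to different chunks of data.
    chunks = [data[:5], data[5:]]
    data_parallel_result = sum(pool.map(square_sum, chunks))

    # Task parallel: different functions submitted to run concurrently.
    evens_future = pool.submit(count_evens, data)
    largest_future = pool.submit(largest, data)
    task_parallel_result = (evens_future.result(), largest_future.result())
```
| Both halves go through the same thread pool, which is the shared mechanic
| the comment points at. What differs is whether the workers run the same
| code or different code.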
| wmf wrote:
| At a high enough level I guess everything is just computation,
| but using GPGPU as an example, there are programs where control
| flow diverges and ones where it doesn't.
___________________________________________________________________
(page generated 2023-07-18 23:00 UTC)