(DIR) Post #AwlSfhbFcfgQzxzUvI by foone@digipres.club
2025-08-02T16:16:52Z
0 likes, 0 repeats
So I need to run some software across several machines: I basically have to process thousands of items, in no particular order, and each item takes 5-10 seconds to process. Is there any existing software that'd be a good fit for orchestrating this task across a variable number of computers? Previously I've done this kind of thing with CI systems like Jenkins, but each "build" here is so short that the overhead of a CI system could really add up.
(DIR) Post #AwlSszUKQRkiF75bX6 by jenesuispasgoth@pouet.chapril.org
2025-08-02T16:18:58Z
0 likes, 0 repeats
@foone would a batch scheduler be a good fit? You could add as many jobs as you want, as long as the scheduler knows all the available machines on your network.
(DIR) Post #AwlSu3KLdaz8RGvZ8i by foone@digipres.club
2025-08-02T16:19:10Z
0 likes, 0 repeats
basically I want to be able to define 10,000 processing setups, have them partitioned across all available workers, and then have the results of those runs collected centrally. The tasks are going to involve running an external program (probably a docker container, for cross-OS support?), so solutions where I have to define my processing in some DSL aren't really good fits.
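A minimal sketch of what one worker's inner loop could look like under those constraints; the image name, results directory, and JSON output layout are all placeholders, not anything specified in the thread:

    import json
    import subprocess
    from pathlib import Path

    IMAGE = "example/processor:latest"   # placeholder image name
    RESULTS = Path("results")            # assumed location for centrally collected output
    RESULTS.mkdir(exist_ok=True)

    def process_item(item_id: str) -> None:
        # Each item is handled by an external program inside a container,
        # so a worker only needs docker plus this script.
        proc = subprocess.run(
            ["docker", "run", "--rm", IMAGE, item_id],
            capture_output=True, text=True,
        )
        out = {"item": item_id, "rc": proc.returncode, "stdout": proc.stdout}
        (RESULTS / f"{item_id}.json").write_text(json.dumps(out))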
(DIR) Post #AwlT7FcNFhRgK3eVns by metafnord@chaos.social
2025-08-02T16:21:48Z
0 likes, 0 repeats
@foone sounds a lot like a job for MPI. Years ago I wrote software in Python that distributed simulation jobs to a cluster of workers: https://github.com/juliusf/Neurogenesis/blob/master/neurogenesis/cluster.py
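For illustration, a minimal mpi4py sketch of that pattern: rank 0 partitions the item list, every rank processes its share, and the results are gathered back centrally (process_item is a stand-in for the real 5-10 second job):

    from mpi4py import MPI

    def process_item(item):
        # placeholder for the real work
        return f"done: {item}"

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    if rank == 0:
        items = [f"item-{i:05d}" for i in range(10_000)]
        chunks = [items[i::size] for i in range(size)]  # one chunk per rank
    else:
        chunks = None

    my_items = comm.scatter(chunks, root=0)
    my_results = [process_item(it) for it in my_items]

    all_results = comm.gather(my_results, root=0)
    if rank == 0:
        flat = [r for chunk in all_results for r in chunk]
        print(len(flat), "results collected")

With Open MPI this would be launched with something like mpirun --hostfile hosts python script.py, where the hostfile lists the available machines.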
(DIR) Post #AwlTJ4lbQFMa8wbjSC by tychotithonus@infosec.exchange
2025-08-02T16:23:54Z
0 likes, 0 repeats
@foone GNU Parallel, hands down. Its lightweight distributed-work syntax is pretty absorbable.
(DIR) Post #AwlTM6ro4S23WcxLE0 by tezoatlipoca@mas.to
2025-08-02T16:24:15Z
0 likes, 0 repeats
@foone ok, as reluctant as I am to push my own software, actually a project of mine might serve well here: https://synk.tezoatlipoca.com/about
Run this and have each build machine update a semaphore blob in synk. Write a master dispatch/controller script that monitors or updates the semaphore blobs for each remote. Use the remote task machine names as the blob keys. All the remotes & master have to do is put/get from the semaphore, like "run 12 success" or "error: foo". https://github.com/tezoatlipoca/synk
(DIR) Post #AwlTPbPywFiF9AbgTw by ghalfacree@mastodon.social
2025-08-02T16:24:45Z
0 likes, 0 repeats
@foone GNU Parallel can handle this - it SSHes into each machine and runs, by default, as many workers as there are logical processors. It'll also copy files to the remote system and copy the processed files back, if necessary. It's pretty neat.
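To make that concrete, a hedged sketch of the kind of invocation being described, built as an argument list so each option can be annotated; nodes.txt, the items/ directory, and process-item are placeholders:

    import subprocess

    # One input file per item; the per-item command is whatever really does the work.
    items = "\n".join(f"items/item-{i:05d}" for i in range(10_000))

    subprocess.run(
        [
            "parallel",
            "--sshloginfile", "nodes.txt",   # one user@host per line; passwordless SSH assumed
            "--trc", "{}.out",               # transfer the input, return {}.out, clean up remotely
            "--joblog", "joblog.tsv",        # exit codes and timings, handy for retries
            "process-item {} > {}.out",      # placeholder for the real per-item command
        ],
        input=items, text=True, check=True,
    )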
(DIR) Post #AwlUD3sLvGAbb1SdN2 by momo@social.linux.pizza
2025-08-02T16:34:02Z
0 likes, 0 repeats
@foone Thank you for asking this, because today I learned about GNU Parallel and this is awesome!
(DIR) Post #AwlUKIHKlsLYIEauVU by mcgrew@dice.camp
2025-08-02T16:35:15Z
0 likes, 0 repeats
@foone My first thought is using a queue system (Rabbit, Celery, etc.): put the jobs there and have each worker pull a job and send the results back to the queue. But I guess that's not really "existing" software.
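A minimal Celery sketch of that pattern, assuming a reachable RabbitMQ broker; the broker URL and the task body are placeholders:

    # tasks.py -- each worker machine runs: celery -A tasks worker
    from celery import Celery

    app = Celery(
        "tasks",
        broker="amqp://guest@broker-host//",  # assumed RabbitMQ location
        backend="rpc://",                     # lets the dispatcher collect results
    )

    @app.task
    def process_item(item):
        # placeholder for the real 5-10 second job
        return f"done: {item}"

A dispatcher script would then queue everything with process_item.delay(item) and collect the results by calling .get() on the returned result objects.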
(DIR) Post #AwlUNc5PaDDAaubIWW by h0m54r@mastodon.social
2025-08-02T16:35:41Z
0 likes, 0 repeats
@foone I believe GNU Parallel can do this quite effectively if you have passwordless SSH set up to each of the workers, so long as each worker has a compatible software environment.
(DIR) Post #AwlVCeV0N7g2qWuPgm by philpem@digipres.club
2025-08-02T16:45:12Z
0 likes, 0 repeats
@foone For this I'd either use GNU Parallel or, if I couldn't get it to connect out, maybe something like Parallel on each system with a server to orchestrate them. Maybe MPI, dispy, or RabbitMQ, but a database and a "gimme a work unit" web API would also work. Effectively you're recreating distributed.net. The tricky bit will be sizing the WUs.
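A minimal standard-library sketch of that "gimme a work unit" shape; the port, payload format, and in-memory queue are assumptions:

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from queue import Queue, Empty

    work = Queue()
    for i in range(10_000):
        work.put(f"item-{i:05d}")
    results = {}

    class Dispatcher(BaseHTTPRequestHandler):
        def do_GET(self):
            # any GET hands out the next work unit, or 204 once the queue is drained
            try:
                item = work.get_nowait()
            except Empty:
                self.send_response(204)
                self.end_headers()
                return
            body = json.dumps({"item": item}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_POST(self):
            # any POST with a JSON body records one finished result
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            results[payload["item"]] = payload
            self.send_response(200)
            self.end_headers()

    HTTPServer(("", 8000), Dispatcher).serve_forever()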
(DIR) Post #AwlVb2JbTYxdf1Ceps by spinglass@c.im
2025-08-02T16:30:49Z
0 likes, 0 repeats
@foone I recently did something like this: converting 100,000s of DNGs into JPEGs by invoking darktable across Linux and macOS hosts. I did it using https://dispy.org/. Bit of a learning curve to using the library, but once done I was able to bring a new node online by (a) installing darktable, (b) installing Python + dispy, and (c) starting the compute node.
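A rough sketch of the dispy usage pattern being described, with a stand-in compute function (the real one would invoke darktable on each file):

    import dispy

    def convert(path):
        # runs on a dispynode; imports happen inside the function
        import subprocess
        proc = subprocess.run(["echo", "converted", path],
                              capture_output=True, text=True)
        return proc.stdout.strip()

    # JobCluster discovers machines running dispynode on the network by default;
    # an explicit nodes=[...] list can be passed instead.
    cluster = dispy.JobCluster(convert)
    jobs = [cluster.submit(f"photo-{i:06d}.dng") for i in range(1000)]
    for job in jobs:
        print(job())          # waits for this job and returns its result
    cluster.print_status()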
(DIR) Post #AwlVq5scVKLL3nlIRs by deegeese@noc.social
2025-08-02T16:52:12Z
0 likes, 0 repeats
@foone Hadoop MapReduce is probably overkill for this, but would allow you to abstract the job from the compute infrastructure, offering retries and job status tracking.
(DIR) Post #AwlWYEShxQtSQ1gkYC by epithumia@mstdn.social
2025-08-02T17:00:06Z
0 likes, 0 repeats
@foone Torque or Slurm (https://slurm.schedmd.com/quickstart.html) maybe?
(DIR) Post #AwlXCZLPjk6qleVjuK by krono@toot.berlin
2025-08-02T17:07:33Z
0 likes, 0 repeats
@foone For benchmarks I have used ReBench (https://github.com/smarr/ReBench/). But a tiiiiiny Slurm cluster would probably also work (a bit overkill, but it can deal with heterogeneous "member" machines).
(DIR) Post #AwlY7rhY9sLuCuBZYG by suetanvil@freeradical.zone
2025-08-02T17:17:51Z
0 likes, 0 repeats
@foone Would it be possible to break the items into batches to process sequentially on a node? That would increase processing time a bit but amortize the overhead across more items.
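For example, chunking the item list so each dispatch carries a few minutes of work rather than 5-10 seconds (batch size is an assumption):

    def batches(items, size):
        # yield fixed-size batches so each dispatch amortizes its overhead
        for i in range(0, len(items), size):
            yield items[i:i + size]

    # 10,000 items in batches of 60 -> roughly 170 dispatches of 5-10 minutes each
    for batch in batches([f"item-{i:05d}" for i in range(10_000)], 60):
        pass  # hand the whole batch to one worker / CI job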
(DIR) Post #Awlf0rRdPi0d3yLgG0 by StompyRobot@mastodon.gamedev.place
2025-08-02T18:35:09Z
0 likes, 0 repeats
@foone Depending on scale: shell for loop with fork and ssh; Clush; Ray.
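A minimal Ray sketch of the last option, assuming a Ray cluster has already been started on the machines (e.g. ray start --head on one, ray start --address=... on the rest); process_item is a placeholder:

    import ray

    ray.init(address="auto")   # connect to the already-running cluster

    @ray.remote
    def process_item(item):
        # placeholder for the real 5-10 second job
        return f"done: {item}"

    futures = [process_item.remote(f"item-{i:05d}") for i in range(10_000)]
    results = ray.get(futures)   # collected centrally on the driver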