Post 2779311 by alcinnz@floss.social
(DIR) Post #2775969 by alcinnz@floss.social
2019-01-08T18:11:46Z
0 likes, 1 repeats
Please humor me as I theorize about an alternative CPU architecture, one that hopes to be simpler than what we've got whilst doing a better job at extracting parallelism and protecting sandboxes. I don't think imperative programming was the correct paradigm for machine code. Functional would've been better.
(DIR) Post #2776092 by uranther@cybre.space
2019-01-08T18:15:47Z
0 likes, 1 repeats
@alcinnz Seems like we should reverse engineer the Mill CPU architecture either without violating their patents or by somehow protecting it from patent litigation. :blobshrug:
(DIR) Post #2776507 by alcinnz@floss.social
2019-01-08T18:35:18Z
0 likes, 0 repeats
Let's start with a data model. I'd store all data as consisting of:
* a callback function
* a refcount
* a fixed number of callback-specific fields.
The only data and operations that can be processed would be stored in those fields, or compiled into the callback.
If more data is needed, more "thunks" can be used. And if less is needed, you don't have to use all those fields.
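A rough Python sketch of that data model, just to make it concrete. The names (`Thunk`, `force`, `constant`, `add`) and the field count of 4 are assumptions for illustration; the post only says "a fixed number".

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

FIELD_COUNT = 4  # assumption: the post only says "a fixed number"


@dataclass
class Thunk:
    callback: Callable[["Thunk"], Any]               # computes this thunk's value
    fields: List[Any] = field(default_factory=list)  # callback-specific data
    refcount: int = 1                                # the creator holds one reference

    def __post_init__(self):
        if len(self.fields) > FIELD_COUNT:
            raise ValueError("too many fields; chain another thunk instead")
        # Unused slots are simply left empty.
        self.fields = list(self.fields) + [None] * (FIELD_COUNT - len(self.fields))


def force(thunk: Thunk) -> Any:
    """Run the callback; it may only look at its own fields."""
    return thunk.callback(thunk)


def constant(value: Any) -> Thunk:
    return Thunk(callback=lambda t: t.fields[0], fields=[value])


def add(left: Thunk, right: Thunk) -> Thunk:
    # The new thunk now refers to both operands, so bump their refcounts.
    left.refcount += 1
    right.refcount += 1
    return Thunk(
        callback=lambda t: force(t.fields[0]) + force(t.fields[1]),
        fields=[left, right],
    )
```

Summing more values than fit in one thunk means nesting, e.g. `add(add(a, b), add(c, d))`, which is the "more thunks" case.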
(DIR) Post #2776896 by vertigo@mastodon.social
2019-01-08T18:50:48Z
0 likes, 1 repeats
@uranther @alcinnz Problem is, at the gate level, everything is either combinational logic or state. And state is what enables us to multiplex many computations on limited hardware. I think @uranther has it right -- the best opportunity for new computing architectures will come from something approximating (or equalling) a Mill architecture processor.
(DIR) Post #2777514 by alcinnz@floss.social
2019-01-08T19:16:01Z
0 likes, 0 repeats
I'd further split the memory up so each CPU core only has access to its own smallish section (thereby decreasing pointer sizes). Though they'd also be able to address a few thunks from their "neighboring" cores and the RAM in order to store data there when they fill up, and request the computed data later. The RAM might require the data to be precomputed, and might compact the data for storage.
(DIR) Post #2778168 by alcinnz@floss.social
2019-01-08T19:41:47Z
0 likes, 0 repeats
So each core would track two thunks it's allowed to directly access data from: one for variables, another for arguments.
And they'd have operations for:
* allocating a new thunk, filling it with data, and storing it in a variable slot.
* tail-recursing to one of the accessible thunks or a compiled-in function.
* pushing a new context with a branch table
* popping the current context whilst executing one of those branches
It might be easiest to lower these to smaller micro ops.
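One possible reading of those four operations, as a toy Python model. Everything here is an assumption for illustration: the `Core` class, the slot count, the method names, and representing a thunk as a `(callback, fields)` pair. Python has no real tail calls either, so `tail_call` is an ordinary call.

```python
SLOTS = 4  # assumption: number of variable slots per core


class Core:
    def __init__(self, arguments):
        self.variables = [None] * SLOTS  # fields of the "variables" thunk
        self.arguments = arguments       # fields of the "arguments" thunk
        self.contexts = []               # stack of branch tables

    def alloc(self, slot, callback, *fields):
        """Allocate a thunk, fill it with data, store it in a variable slot."""
        self.variables[slot] = (callback, list(fields))

    def tail_call(self, slot):
        """Jump to an accessible thunk: its fields become the new arguments."""
        callback, fields = self.variables[slot]
        self.arguments = fields
        self.variables = [None] * SLOTS
        return callback(self)

    def push_context(self, branch_table):
        """Push a context: a table of named continuations."""
        self.contexts.append(branch_table)

    def pop_context(self, branch, value):
        """Pop the current context whilst running one of its branches."""
        table = self.contexts.pop()
        return table[branch](self, value)


def is_zero(core):
    # A "compiled-in" function: it returns by picking a branch of its caller.
    n = core.arguments[0]
    return core.pop_context("zero" if n == 0 else "nonzero", n)


def demo(n):
    core = Core(arguments=[])
    core.push_context({
        "zero": lambda c, v: "got zero",
        "nonzero": lambda c, v: f"got {v}",
    })
    core.alloc(0, is_zero, n)
    return core.tail_call(0)
```

Here `demo(7)` takes the "nonzero" branch and `demo(0)` the "zero" branch; the branch table plays the role of a return address plus a pattern match.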
(DIR) Post #2778546 by enkiv2@eldritch.cafe
2019-01-08T19:48:10Z
0 likes, 1 repeats
@vertigo @uranther @alcinnz I think we might be able to do something interesting with hardware that does planner/constraint solver type stuff. That implies functional style. QC seems to be oriented toward this already, just because superposition collapse is a solid metaphor for unification & vice versa.
(DIR) Post #2778793 by alcinnz@floss.social
2019-01-08T20:08:18Z
0 likes, 0 repeats
Between threads, I'd allow tasks chosen by the software or a JIT (or upon stack overflow) to be offloaded onto another core, and for a core to wait for that computation to complete. Maybe it'd queue up work (represented as always as thunks) to do in the meantime. It could even invent busywork by computing whatever data it has lying around. But the biggest opportunity I see to drive parallelism is from the output circuitry; what I've described so far wouldn't have computed enough for it.
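A small sketch of the offload-and-wait idea, using a worker thread to stand in for the neighbouring core. This is only an analogy under stated assumptions: `run_with_offload` is a made-up name, and thunks are modelled as zero-argument callables.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def run_with_offload(offloaded, local_queue):
    """Offload one thunk, do queued local work meanwhile, then wait.

    `offloaded` is a zero-argument callable; `local_queue` is a deque of
    zero-argument callables. Returns (remote_result, local_results).
    Any work still in `local_queue` afterwards was not needed to fill the wait.
    """
    with ThreadPoolExecutor(max_workers=1) as neighbour:
        pending = neighbour.submit(offloaded)
        local_results = []
        # Fill the waiting time with queued thunks instead of idling.
        while local_queue and not pending.done():
            local_results.append(local_queue.popleft()())
        remote = pending.result()  # block until the other "core" finishes
    return remote, local_results
```

How many local thunks get run depends on timing, which is the point: the queue is there to soak up whatever time the wait leaves over.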
(DIR) Post #2778826 by kragen@nerdculture.de
2019-01-08T20:09:59Z
0 likes, 0 repeats
@alcinnz You should check out the late-80s/early-90s work on dataflow machines, by Arvind and others. Also maybe TRIPS/EDGE more recently
(DIR) Post #2778846 by alcinnz@floss.social
2019-01-08T20:11:04Z
0 likes, 0 repeats
@kragen Thanks!
(DIR) Post #2779063 by alcinnz@floss.social
2019-01-08T20:20:07Z
0 likes, 0 repeats
To handle output it'd need to have circuitry that asks for each field to be computed, before serializing it into an acceptable format. Though if one of those fields represents *when* this output needs to be delivered, this circuit needs to be able to handle that. Maybe the machine code would be compiled in this way. And for input I'd need a circuit that deserializes and timestamps external data, whilst having a special response for not yet received data.
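Sketched in Python, the output and input circuits might look something like this. JSON as the "acceptable format", the `when` field name, the `NOT_YET` marker and the `InputPort` class are all assumptions for illustration.

```python
import json

NOT_YET = object()  # hypothetical "not yet received" response


def emit(fields):
    """Output side: force every lazy field, then serialize.

    `fields` maps names to zero-argument callables. A field named "when"
    (an assumed convention) is pulled out as the delivery time.
    """
    computed = {name: compute() for name, compute in fields.items()}
    deliver_at = computed.pop("when", None)
    return deliver_at, json.dumps(computed, sort_keys=True)


class InputPort:
    """Input side: deserialize and timestamp whatever arrives."""

    def __init__(self):
        self.events = []

    def receive(self, raw, now):
        self.events.append((now, json.loads(raw)))

    def read(self, index):
        # Data that has not arrived yet gets the special response.
        if index < len(self.events):
            return self.events[index]
        return NOT_YET
```

The output side is what drives evaluation here: nothing in `fields` runs until `emit` asks for it.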
(DIR) Post #2779311 by alcinnz@floss.social
2019-01-08T20:29:59Z
0 likes, 0 repeats
Finally, math would benefit from a different form of parallelism. So I'd give software a pseudo-function that sends math operators to a special circuit. And if multiple formulas are sent before the previous one finishes, I'd have it send them back to the main processor for optimization/merging, because this maths processor would essentially be a large SIMD circuit mostly yielding comparison results to the main CPU. Fin.
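One way to picture the merging step, in Python. This is a guess at what "optimization/merging" could mean: formulas that share an operator get concatenated into one wider request. The `(op, left, right)` tuple layout, the `OPS` table and the function names are assumptions.

```python
import operator

# Assumed operator set; comparisons are the main results sent back to the CPU.
OPS = {"lt": operator.lt, "eq": operator.eq, "add": operator.add}


def simd(op, left, right):
    """Apply one operator across every lane (a plain loop stands in for hardware)."""
    return [OPS[op](a, b) for a, b in zip(left, right)]


def merge(formulas):
    """Combine pending formulas that share an operator into one wider request."""
    merged = {}
    for op, left, right in formulas:
        lanes_left, lanes_right = merged.setdefault(op, ([], []))
        lanes_left.extend(left)
        lanes_right.extend(right)
    return [(op, l, r) for op, (l, r) in merged.items()]
```

So two pending `lt` formulas of one lane each would go back to the maths circuit as a single two-lane `lt`.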