Post ADPxDEwWjsdJU2jv3g by urusan@fosstodon.org
(DIR) Post #ADPptsEHvIas5UObYm by alexandra@mk.nixnet.social
2021-11-15T04:57:56.535Z
0 likes, 0 repeats
Spend some time and learn the assembly for your architecture; it's usually not as bad as its reputation suggests. It's handy, not really because you'll be programming in it, but more to understand the machine better and for debugging.
(DIR) Post #ADPptslFwixtjjovwm by savanni@anarchism.space
2021-11-15T05:25:20Z
0 likes, 0 repeats
@alexandra I'd actually like some good instructions for writing assembly for embedding in C code. I wrote something a while back and had a very rough time dealing with all the register replacement syntax. Then when I asked for help, folks just said "don't write assembly, the compiler will do it better".
(DIR) Post #ADPpttIDy9KvNzFGKm by urusan@fosstodon.org
2021-11-15T05:49:19Z
0 likes, 0 repeats
@savanni @alexandra It's true, the compiler knows every ridiculous little trick to make things more efficient. Though I can see why that response would be aggravating for someone who wants to learn assembly to deepen their knowledge of computers.
(DIR) Post #ADPpzgw0pVTUnLcbh2 by xarvos@nixnet.social
2021-11-15T05:13:37.599549Z
1 likes, 0 repeats
@alexandra
> AMD64 Architecture Programmer’s Manual
> 3 volumes
> ca. 1700 pages
:akko_scream:
(DIR) Post #ADPq460jXUSlj2vjSC by alexandra@mk.nixnet.social
2021-11-15T05:28:53.681Z
0 likes, 0 repeats
@savanni@anarchism.space ah yes, the stack overflow approach to answering questions, "you are wrong to want to do this!!!!" Every time I do inline assembly I have to reference the manual and also guess until it works so I'm no help. :blobcatgooglyshrug:
(DIR) Post #ADPq46T5q39F90CNeq by savanni@anarchism.space
2021-11-15T05:30:17Z
1 likes, 0 repeats
@alexandra the GCC manual for inline assembly was total rubbish.
(DIR) Post #ADPqC6QlrU0NLQvWts by icedquinn@blob.cat
2021-11-15T05:52:42.095888Z
0 likes, 0 repeats
@urusan @savanni @alexandra risc platforms seriously aren't designed to be written for by mortals though.
(DIR) Post #ADPqdIM7dI4u9FicjI by alexandra@mk.nixnet.social
2021-11-15T05:56:13.093Z
0 likes, 0 repeats
@icedquinn@blob.cat @urusan@fosstodon.org @savanni@anarchism.space They're designed as good compilation targets (as they should be), so yeah it gets at best extremely tedious and weird and at worst utterly incomprehensible if you try to write anything substantial in them. I don't mean to suggest anybody should write large programs in assembly nowadays.
(DIR) Post #ADPqdIwzQDZJzay4C8 by icedquinn@blob.cat
2021-11-15T05:57:36.290669Z
0 likes, 0 repeats
@alexandra @urusan @savanni i've been poking around this space a little because people still have to interact with this stuff for SIMD operations. :blobcatsweats2:
(DIR) Post #ADPqrbRnRoenPiCBDk by alexandra@mk.nixnet.social
2021-11-15T05:58:23.824Z
0 likes, 0 repeats
@icedquinn@blob.cat @urusan@fosstodon.org @savanni@anarchism.space somebody please add decent SIMD abstractions to C
(DIR) Post #ADPqrcRpjAHSW7Diuu by icedquinn@blob.cat
2021-11-15T06:00:11.453174Z
0 likes, 0 repeats
@alexandra @urusan @savanni there are gcc intrinsics but those are still pants. tbh i'm just gonna write the formulas in rhombus and try to figure out how to compile math expressions to simd :blobcatsweats:
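For comparison with intrinsics, here is a minimal sketch of the GCC/Clang vector extensions, which let ordinary C operators work on whole vectors. The names v4f and add4 are made up for this illustration, and the attribute is a compiler extension rather than standard C.

```c
#include <string.h>

/* GCC/Clang vector extension: four floats handled as one value.
   v4f is a hypothetical name chosen for this sketch. */
typedef float v4f __attribute__((vector_size(16)));

/* Add two 4-element float arrays lane by lane.
   memcpy is used for the loads and stores so the arrays do not need
   to be 16-byte aligned. */
void add4(const float a[4], const float b[4], float out[4])
{
    v4f va, vb, vsum;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vsum = va + vb;   /* a single vector add where the target supports it */
    memcpy(out, &vsum, sizeof vsum);
}
```

Calling add4 with {1, 2, 3, 4} and {10, 20, 30, 40} fills out with the four lane-wise sums. On targets without SIMD hardware the compiler falls back to scalar code, so the source stays portable across GCC and Clang.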
(DIR) Post #ADPr4T9dakHzSTpwhs by urusan@fosstodon.org
2021-11-15T06:02:29Z
0 likes, 0 repeats
@alexandra @icedquinn @savanni While thinking about the learning angle, I was wondering if writing in LLVM IR might be a good compromise. You can also have LLVM spit out the (optimized) native assembly to see what the compiler would do. In Julia you can use the @code_native and @code_llvm macros to see the assembly code Julia produces for a given line of Julia code.
julia> @code_native 1 + 3
        .text
; ┌ @ int.jl:87 within `+'
        leaq    (%rdi,%rsi), %rax
        retq
        nopw    %cs:(%rax,%rax)
        nop
; └
(DIR) Post #ADPrB9DA8d6HdIfbg8 by urusan@fosstodon.org
2021-11-15T06:03:41Z
0 likes, 0 repeats
@alexandra @icedquinn @savanni Here's the LLVM for 1+3:
julia> @code_llvm 1 + 3
;  @ int.jl:87 within `+'
define i64 @"julia_+_219"(i64 signext %0, i64 signext %1) {
top:
  %2 = add i64 %1, %0
  ret i64 %2
}
(DIR) Post #ADPrC7Tu3W8sYaxp6O by icedquinn@blob.cat
2021-11-15T06:03:54.091215Z
0 likes, 0 repeats
@urusan @alexandra @savanni
> using lea to add two numbers
oh ye gods
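The same effect can be seen from C. This is a minimal sketch; add_i64 is a hypothetical name, and the assembly mentioned in the comment is what GCC typically emits on x86-64 Linux, not a guarantee.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the Julia 1 + 3 example. */
int64_t add_i64(int64_t a, int64_t b)
{
    /* With gcc -O2 -S on x86-64 Linux this typically becomes
       leaq (%rdi,%rsi), %rax followed by ret: lea writes the sum to a
       third register without clobbering either input or the flags. */
    return a + b;
}
```

Compiling with gcc -O2 -S writes the assembly to a .s file, and clang -O2 -S -emit-llvm writes the LLVM IR instead, so both views from the Julia macros have a C counterpart.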
(DIR) Post #ADPw9Rf7PK2NoqDv7I by alexandra@mk.nixnet.social
2021-11-15T05:50:20.925Z
0 likes, 0 repeats
@urusan@fosstodon.org @savanni@anarchism.space Or somebody who needs a machine instruction that isn't available as an intrinsic, or somebody who needs extremely specific control over instruction timings, or...
(DIR) Post #ADPw9S9FbI8lKIJz5E by savanni@anarchism.space
2021-11-15T06:03:14Z
0 likes, 0 repeats
@alexandra @urusan extremely specific control over instruction timing FOR THE WIN. If your most precise C-level sleep function might give you microsecond precision, and I need 100ns precision for a hardware protocol, assembly is what I have to use. But it would have been nice to be able to program a counter with the correct number of NOPs so that function could easily adapt to different clock speeds.
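A rough sketch of that idea: compute the NOP count from the clock speed, then execute that many NOPs. The names nops_for_delay and delay_nops are hypothetical, and the arithmetic assumes one NOP costs exactly one clock cycle, which only holds on simple in-order microcontrollers.

```c
#include <stdint.h>

/* Number of NOPs needed to cover delay_ns at clock_hz, rounded up.
   Assumption: one NOP takes exactly one clock cycle. */
uint32_t nops_for_delay(uint64_t delay_ns, uint64_t clock_hz)
{
    const uint64_t ns_per_s = 1000000000ULL;
    return (uint32_t)((delay_ns * clock_hz + ns_per_s - 1) / ns_per_s);
}

/* Execute `count` NOPs. volatile keeps the compiler from deleting the
   instruction; the loop itself adds a few cycles per iteration, so
   the real delay is longer than the NOP count alone suggests. */
void delay_nops(uint32_t count)
{
    while (count--) {
        __asm__ volatile ("nop");
    }
}
```

For example, delay_nops(nops_for_delay(100, 16000000)) runs 2 NOPs, since 100 ns at 16 MHz is 1.6 cycles rounded up. For very short delays the loop overhead dominates, so a real driver would likely unroll the NOPs or calibrate against a hardware timer.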
(DIR) Post #ADPw9Se5kcoIrwkc9g by urusan@fosstodon.org
2021-11-15T06:59:24Z
0 likes, 0 repeats
@savanni @alexandra By the way, something like this would apparently work to produce exactly 5 NOPs in GCC:
#pragma GCC push_options
#pragma GCC optimize ("O0")
register int i = 0;
i = i;
i = i;
i = i;
i = i;
i = i;
#pragma GCC pop_options
With a bit more thought, I'm sure this could be turned into a loop, but I'd be concerned about the loop adding overhead. The initial assignment might too. The use of i = i is because x86 encodes NOP as a register exchanged with itself (xchg).
(DIR) Post #ADPwNm3cNLGj7yW1aa by urusan@fosstodon.org
2021-11-15T07:02:01Z
0 likes, 0 repeats
@savanni @alexandra The pure C version of this is quite ugly though (and compiler-specific). It would be nice to have a good guide for mixing assembly and C like you were suggesting.
(DIR) Post #ADPwUeqFbLIAmAXsXI by icedquinn@blob.cat
2021-11-15T07:03:16.174468Z
0 likes, 0 repeats
@urusan @savanni @alexandra fancy compilers have heuristics for looking at when unrolling a loop is worth it or not. i'm not strictly sure what those heuristics are though.
(DIR) Post #ADPxDDuMaRJAH2ig2y by alexandra@mk.nixnet.social
2021-11-15T07:03:42.325Z
0 likes, 0 repeats
@urusan@fosstodon.org @savanni@anarchism.space NOP itself isn't really that hard:
__asm__("nop");
It gets hard when you need to get data into and out of the assembly code, often for intrinsics.
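As an illustration of getting data in and out, here is a small sketch using GCC extended asm operands, the "register replacement syntax" mentioned earlier in the thread. asm_add is a hypothetical name, the x86 branch uses AT&T syntax, and other targets fall back to plain C so the sketch still compiles there.

```c
#include <stdint.h>

/* Hypothetical example: add two 32-bit values with inline assembly. */
uint32_t asm_add(uint32_t a, uint32_t b)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t result = a;
    __asm__ ("addl %1, %0"
             : "+r" (result)   /* %0: read-write, any general register */
             : "r" (b)         /* %1: input, any general register */
             : "cc");          /* the add modifies the flags */
    return result;
#else
    return a + b;              /* plain C fallback on other targets */
#endif
}
```

The compiler picks the registers and substitutes them for %0 and %1, so the assembly never names a register directly. The three colon-separated sections are outputs, inputs, and clobbers, in that order.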
(DIR) Post #ADPxDEQydBObuByisi by alexandra@mk.nixnet.social
2021-11-15T07:06:02.486Z
0 likes, 0 repeats
@urusan@fosstodon.org @savanni@anarchism.space Or rather the exact opposite, things that aren't intrinsics.
(DIR) Post #ADPxDEwWjsdJU2jv3g by urusan@fosstodon.org
2021-11-15T07:11:17Z
0 likes, 0 repeats
@alexandra @savanni Well, __asm__ is incredibly useful, good to know about. Also, I found some interesting discussion here: https://stackoverflow.com/questions/26456510/what-does-asm-volatile-do-in-c
volatile came up as another recommendation for preventing unwanted compiler optimization in a specific area.
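A minimal sketch of what volatile buys you on an asm statement, assuming GCC or Clang. spin_iterations is a hypothetical name; the point is only that the optimizer may not drop or merge a volatile asm, so the loop around it survives at any optimization level.

```c
/* Hypothetical busy-wait: spin for n iterations and return the count. */
unsigned spin_iterations(unsigned n)
{
    unsigned done = 0;
    for (unsigned i = 0; i < n; i++) {
        /* Empty asm statement: emits no instruction, but because it is
           volatile the optimizer must keep one execution per iteration,
           so the loop cannot be deleted or collapsed. */
        __asm__ volatile ("");
        done++;
    }
    return done;
}
```

Without the asm statement, an optimizing build would likely reduce the function to returning n directly. Adding a "memory" clobber to the statement would additionally stop the compiler from reordering memory accesses across it.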