Post AIjWSuKNJrNgPga3hg by mburakov@mastodon.social
 (DIR) Post #AIiatSdGbSL8zlNGAy by wolf480pl@mstdn.io
       2022-04-22T20:21:14Z
       
       0 likes, 0 repeats
       
       ok who the hell invented ASCII?
       So letters are 0x41-0x5A upper case, and OR-ing in 0x20 gets you lower case 0x61-0x7A. So upper-lower case conversion is just one bit-flip, neat, right?
       And then you have 0x5B-0x5E [\]^ and their lower case is {|}~
       And then there's 0x5F _, and the lower case of that is... 0x7F DEL, which is not printable ffs...
       Like, I know it has to be there because it's easier to add holes on a punch card than to remove them, but it's still annoying.
       1/
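The one-bit case flip from the post above can be sketched in C as follows. This is a minimal illustration, not code from the thread; the helper names `set_case_bit` and `ascii_lower` are hypothetical. It shows why the trick needs a range check: setting bit 0x20 on `[` or `_` lands on `{` and DEL.

```c
#include <assert.h>

/* Bit 0x20 is the only difference between 'A'-'Z' and 'a'-'z' in ASCII. */
#define ASCII_CASE_BIT 0x20u

/* Hypothetical helper: blindly set bit 0x20, with no letter check.
 * Only meaningful for 7-bit ASCII input. */
static unsigned char set_case_bit(unsigned char c) {
  assert(c < 0x80);
  return (unsigned char)(c | ASCII_CASE_BIT);
}

/* Hypothetical helper: lower-case real letters only, leave the rest alone. */
static unsigned char ascii_lower(unsigned char c) {
  return (c >= 'A' && c <= 'Z') ? set_case_bit(c) : c;
}
```

Calling `set_case_bit('_')` yields 0x7F (DEL), while `ascii_lower('_')` returns `_` unchanged, which is the extra comparison the bit trick alone does not give you.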
       
 (DIR) Post #AIibTcZv6LouGd3FE8 by wolf480pl@mstdn.io
       2022-04-22T20:27:22Z
       
       0 likes, 0 repeats
       
       But ok this part is understandable, and the square brackets line up with curly braces and stuff.
       Ok, let's look at curly braces with bit 0x40 cleared:
       0x7B-0x7E {|}~
       0x3B-0x3E ;<=>
       off by one
       why?
       2/
       
 (DIR) Post #AIibbNz3FQizJrVbg8 by wolf480pl@mstdn.io
       2022-04-22T20:29:11Z
       
       0 likes, 0 repeats
       
       Ok, let's look at the 0x20 area, below digits.
       0x20-0x2F: !"#$%&'()*+,-./
       I'm not even complaining about parens not being a single bitflip from <=>.
       But look at characters typically used in pairs, like " ' () vs those which usually appear alone, like !#$%&*+,-./
       It would be nice to be able to quickly test if it's one or the other, like with a bitmask or a range; [!-/] would make a nice regex, for example.
       No, they're just intermixed randomly.
       And yeah, different braces are all over, so no (-} either.
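Since no single bit or contiguous range separates the two groups, one workaround is a small bitmap indexed by the character's offset from 0x20. The sketch below is an assumption about how one might do it, not something proposed in the thread; `PAIRED_MASK`, `is_paired_punct` and `is_lone_punct` are hypothetical names, and the "paired" set is just the four characters named in the post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per character in 0x20-0x2F, indexed by (c - 0x20). */
#define BIT_FOR(ch) ((uint16_t)(1u << ((ch) - 0x20)))

/* Characters the post calls "typically used in pairs": " ' ( ) */
#define PAIRED_MASK (BIT_FOR('"') | BIT_FOR('\'') | BIT_FOR('(') | BIT_FOR(')'))

/* The set bits (2, 7, 8, 9) are not contiguous, so no range test works. */
static_assert(PAIRED_MASK == 0x0384, "paired characters sit at bits 2, 7, 8, 9");

static bool is_paired_punct(unsigned char c) {
  if (c < 0x20 || c > 0x2F) return false;
  return (PAIRED_MASK >> (c - 0x20)) & 1u;
}

/* Everything else in 0x21-0x2F (space at 0x20 excluded). */
static bool is_lone_punct(unsigned char c) {
  return c >= 0x21 && c <= 0x2F && !is_paired_punct(c);
}
```

The mask value 0x0384 makes the complaint concrete: the paired characters occupy scattered bit positions, so the test costs a shift and a mask rather than a single comparison.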
       
 (DIR) Post #AIibhrnbJAQVWlF8ds by michal@toot.kottman.xyz
       2022-04-22T20:30:17Z
       
       0 likes, 2 repeats
       
       @wolf480pl "why?" - History. Veeery long history.
       https://en.m.wikipedia.org/wiki/ASCII#History
       
 (DIR) Post #AIibjymkIwa3m2qnzs by wolf480pl@mstdn.io
       2022-04-22T20:30:45Z
       
       1 likes, 0 repeats
       
       inb4 someone comes and says EBCDIC is much more ordered and I should totally check it out.
       Anyway, we're stuck with ASCII and replacing it with a different encoding now would be insane so yeah... thanks for coming to my TED talk
       
 (DIR) Post #AIidhTLERNY5a8uhSC by Roujo@toepi.moe
       2022-04-22T20:47:24.223980Z
       
       0 likes, 0 repeats
       
       > 0x20-0x2F: !"#$%&'()*+,-./
       @wolf480pl huh, that kinda matches with the shifted numbers row on the keyboard layouts I know about, but it's different enough that I don't know if that's a coincidence or not :o
       
 (DIR) Post #AIidhUBLJdEUBlIJg8 by wolf480pl@mstdn.io
       2022-04-22T20:52:28Z
       
       0 likes, 0 repeats
       
       @Roujo it was designed to match the shifted numbers on some typewriters, except you couldn't define ) as shift+0, because 0 is 0x30, and shifted 0x30 is 0x20 which is already a space, so they removed shift+6 = _ and instead moved all the specials by 1 digit to the left ugh....
       
 (DIR) Post #AIidhUpktNYiD6CafY by wolf480pl@mstdn.io
       2022-04-22T20:52:36Z
       
       0 likes, 0 repeats
       
       @Roujo *half of the specials
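The arithmetic behind "shifted 0x30 is 0x20" can be sketched as a single XOR: in the ASCII table each digit sits exactly 0x10 above a symbol. This is only an illustration of the table layout, not a model of any particular typewriter or keyboard; `shifted_digit` is a hypothetical name.

```c
#include <assert.h>

/* Map an ASCII digit to the symbol in the same column of the 0x20 row
 * by flipping bit 0x10. Illustrates the table layout only. */
static char shifted_digit(char digit) {
  assert(digit >= '0' && digit <= '9');
  return (char)(digit ^ 0x10);
}
```

With this mapping '1' through '5' give `!"#$%`, '6' through '9' give `&'()`, and '0' gives a space, which is why `)` could not be placed on the 0 column.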
       
 (DIR) Post #AIidkntOUNHEBkTtIm by wolf480pl@mstdn.io
       2022-04-22T20:53:18Z
       
       0 likes, 0 repeats
       
       @Roujo it's like, they're trying to be regular, but when you get used to the patterns they stab you in the back
       
 (DIR) Post #AIjWSuKNJrNgPga3hg by mburakov@mastodon.social
       2022-04-23T07:06:18Z
       
       0 likes, 0 repeats
       
       @wolf480pl May I ask, what are you trying to achieve? Thankfully there are already things like isalnum, isdigit and others. AFAIR these are implemented as macros and/or lookup tables at least in glibc, so it’s unlikely you will have any noticeable performance gain from a custom implementation. Some basic manipulations like tolower and toupper are also there.
       
 (DIR) Post #AIjWgq4dvoAdH5rh8S by wolf480pl@mstdn.io
       2022-04-23T07:08:51Z
       
       0 likes, 0 repeats
       
       @mburakov I'm trying to decide which characters should be allowed in an address in a chat protocol I'm designing.
       Also, I'm not in C, and even if I was, relying on libc for this would probably be a bad idea, since libc uses locale and locale operates on characters, not bytes.
       
 (DIR) Post #AIjXR90frNtTKB5Z2m by mburakov@mastodon.social
       2022-04-23T07:17:12Z
       
       0 likes, 0 repeats
       
       @wolf480pl AFAIU you want these as visible chars? Just a random suggestion without knowing any details - base64?
       
 (DIR) Post #AIjXiPyw0ns6svVnWa by wolf480pl@mstdn.io
       2022-04-23T07:20:17Z
       
       0 likes, 0 repeats
       
       @mburakov no no it's meant to be user-friendly.
       Like an email address or XMPP JID.
       So definitely a-zA-Z0-9, and then some special chars, but not all, since some of them are already reserved for other stuff.
       So I was trying to figure out if there is some natural grouping in ASCII I can exploit, that'd let me exclude all special characters I don't want and possibly a few more with a single easy-to-implement rule. Apparently not.
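Without a natural grouping, the fallback is an explicit allowlist checked byte by byte. The sketch below assumes a set of extra characters (`-._`) purely for illustration; the thread never says which specials the protocol allows, and the names `EXTRA_ALLOWED`, `is_address_byte` and `is_valid_address` are hypothetical. It works on bytes and never consults the locale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the extra specials allowed in an address. Not from the thread. */
static const char EXTRA_ALLOWED[] = "-._";

/* Byte-level check: a-z, A-Z, 0-9, or one of the extras. No locale involved. */
static bool is_address_byte(unsigned char c) {
  if (c >= 'a' && c <= 'z') return true;
  if (c >= 'A' && c <= 'Z') return true;
  if (c >= '0' && c <= '9') return true;
  /* strchr would match the terminating NUL, so reject it explicitly. */
  return c != '\0' && strchr(EXTRA_ALLOWED, c) != NULL;
}

/* A non-empty string made only of allowed bytes. */
static bool is_valid_address(const char *s) {
  assert(s != NULL);
  if (*s == '\0') return false;
  for (; *s != '\0'; s++) {
    if (!is_address_byte((unsigned char)*s)) return false;
  }
  return true;
}
```

Any byte outside the list, including every non-ASCII byte, is rejected, so the rule stays easy to reimplement in another language.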
       
 (DIR) Post #AIjYD73KRL7F9dCg4m by lanodan@queer.hacktivis.me
       2022-04-23T07:25:53.931692Z
       
       1 likes, 0 repeats
       
       @mburakov @wolf480pl You absolutely do not need lookup tables for isalnum/isdigit/…
       https://drewdevault.com/2020/09/25/A-story-of-two-libcs.html
       
 (DIR) Post #AIjZkjVIbHnrv1siga by mburakov@mastodon.social
       2022-04-23T07:33:37Z
       
       0 likes, 0 repeats
       
       @lanodan @wolf480pl Yes, yes. I know Drew does not like glibc :) But let's be honest, glibc is way more performant than musl. And it's exactly because of things like that. Note that I also prefer musl, and always make sure that my code builds fine for it.
       
 (DIR) Post #AIjZkk6sLZrRnZSjFw by lanodan@queer.hacktivis.me
       2022-04-23T07:43:10.214313Z
       
       1 likes, 0 repeats
       
       @mburakov @wolf480pl Performance into a wall is a terrible design. That said, there are good implementations of the lookup table: AT&T Unix and Plan9.
       GNU is plagiarism with buggy obfuscation and additions layered on top.
       
 (DIR) Post #AIjZsirkr6t0G1C8uW by wolf480pl@mstdn.io
       2022-04-23T07:44:34Z
       
       2 likes, 0 repeats
       
       @mburakov @lanodan glibc on the top
       copy-pasted musl implementation on the bottom
       https://godbolt.org/z/od6TfscGr
       tell me how glibc's call and two loads are supposed to be faster than a couple of subs and cmps, not to mention the stuff that's happening inside __ctype_b_loc
       
 (DIR) Post #AIjayfrl1RBAe2b8KG by wolf480pl@mstdn.io
       2022-04-23T07:56:54Z
       
       0 likes, 0 repeats
       
       @mburakov @lanodan well ok, __ctype_b_loc is just
       mov rax, qword [0x001f9d58]
       add rax, qword fs:[0]
       ret
       which is not terrible but it's another 2 loads and an add.
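For readers who have not seen the "couple of subs and cmps" approach, here is a minimal sketch of table-free classification using unsigned subtraction. It is written in that style but is not a copy of musl's or glibc's source; the `ascii_*` names are hypothetical, and the self-check simply compares the trick against plain range comparisons.

```c
#include <assert.h>

/* Unsigned wrap-around turns "lo <= c && c <= hi" into one sub and one cmp. */
static int ascii_isdigit(int c) { return (unsigned)c - '0' < 10u; }

/* OR-ing in 0x20 folds upper case onto lower case before the range test. */
static int ascii_isalpha(int c) { return ((unsigned)c | 0x20u) - 'a' < 26u; }

static int ascii_isalnum(int c) { return ascii_isalpha(c) || ascii_isdigit(c); }

/* Compare against straightforward range checks for EOF (-1) and 0..255. */
static void ascii_ctype_selfcheck(void) {
  for (int c = -1; c < 256; c++) {
    int alpha = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    int digit = (c >= '0' && c <= '9');
    assert(ascii_isalpha(c) == alpha);
    assert(ascii_isdigit(c) == digit);
    assert(ascii_isalnum(c) == (alpha || digit));
  }
}
```

These functions ignore the locale entirely, which is the trade-off against the table-based `<ctype.h>` path discussed above.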
       
 (DIR) Post #AIjdPWUmAbBXJEmLWi by lanodan@queer.hacktivis.me
       2022-04-23T08:23:59.869040Z
       
       0 likes, 0 repeats
       
       @wolf480pl @mburakov An actual comparison would be with loading the libc; with actual musl you get at least the overhead of a few function jumps+returns compared to your example, since <ctype.h> on musl only contains function prototypes.
       For glibc I'm not entirely sure how I could do it, since static compiling isn't there, so a simple objdump isn't going to help. I guess I would have to look in lldb/gdb.
       
 (DIR) Post #AIjdq7OlaVBXI7eKbA by wolf480pl@mstdn.io
       2022-04-23T08:28:58Z
       
       0 likes, 0 repeats
       
       @lanodan @mburakov I've been static-compiling with modern glibcs successfully...
       It still spits out an ET_DYN for the sake of ASLR, but doesn't depend on any dynamic libraries.
       Also, I wouldn't measure performance in cold code, since if you're calling isalnum only once, it doesn't matter anyway.
       And in hot code the library would be already loaded, the dynamic link would be bound, and ld.so would stay out of your way.
       
 (DIR) Post #AIjnGKjNXWYbl2rjxg by mburakov@mastodon.social
       2022-04-23T10:14:29Z
       
       0 likes, 0 repeats
       
       @wolf480pl @lanodan
       [mburakov@de-c-299 test]$ time dd if=/dev/random bs=4096 count=1M | ./musl
       1048576+0 records in
       1048576+0 records out
       4294967296 bytes (4,3 GB, 4,0 GiB) copied, 13,7753 s, 312 MB/s
       real    0m13,782s
       user    0m12,872s
       sys     0m11,645s
       [mburakov@de-c-299 test]$ time dd if=/dev/random bs=4096 count=1M | ./glibc
       1048576+0 records in
       1048576+0 records out
       4294967296 bytes (4,3 GB, 4,0 GiB) copied, 10,5202 s, 408 MB/s
       real    0m10,527s
       user    0m4,016s
       sys     0m11,984s
       
 (DIR) Post #AIjo4ky0GlfbbdkYkq by mburakov@mastodon.social
       2022-04-23T10:16:20Z
       
       0 likes, 0 repeats
       
       @wolf480pl @lanodan
       #include <ctype.h>
       #include <sys/types.h>
       #include <unistd.h>
       static unsigned char buffer[4096];
       int main(int argc, char* argv[]) {
         for (int counter = 0;;) {
           ssize_t result = read(STDIN_FILENO, buffer, sizeof(buffer));
           switch (result) {
             case -1:
               return -1;
             case 0:
               return counter;
             default:
               break;
           }
           for (size_t i = 0; i < (size_t)result; i++) {
             if (isalnum(buffer[i])) counter++;
           }
         }
       }
       
 (DIR) Post #AIjo4ldplF89hNJxxI by mburakov@mastodon.social
       2022-04-23T10:16:54Z
       
       0 likes, 0 repeats
       
       @wolf480pl @lanodan Also, musl - static linkage, glibc - dynamic linkage.
       
 (DIR) Post #AIjo4mKjBlRRqPODoW by wolf480pl@mstdn.io
       2022-04-23T10:23:32Z
       
       0 likes, 0 repeats
       
       @mburakov @lanodan /dev/random? that's gonna be the bottleneck here, check if you're not running out of entropy
       
 (DIR) Post #AIjolflxSQ9dwra0v2 by mburakov@mastodon.social
       2022-04-23T10:31:02Z
       
       0 likes, 0 repeats
       
       @wolf480pl @lanodan This is to make sure nothing is cached. Sure, this is a bottleneck, but it does not matter since the code is the same in both cases. The delta of ~3 seconds is persistent across multiple runs.
       
 (DIR) Post #AIjomIJy8PwHVLmiCu by wolf480pl@mstdn.io
       2022-04-23T10:31:33Z
       
       0 likes, 0 repeats
       
       @mburakov @lanodan interesting
       
 (DIR) Post #AIkIrNFaETiyri4V28 by mburakov@mastodon.social
       2022-04-23T10:52:53Z
       
       0 likes, 0 repeats
       
       @lanodan @wolf480pl I kind of agree with that. In general I would also prefer clean code over lightning-fast but ugly-hacky code. But there are basic things in a system that just have to work fast. String comparison, memory copy, etc. If I pay for performant hardware, I want it to perform, not crawl.
       
 (DIR) Post #AIkIrNpk42eEfqzNOS by lanodan@queer.hacktivis.me
       2022-04-23T16:08:36.086484Z
       
       0 likes, 0 repeats
       
       @mburakov @wolf480pl Well, it could be interesting to try to get more performance out of musl, and bits of Unix/Plan9 could be copied, since it was released under the MIT license a few years ago, the same license as most of musl.
       That said, in terms of execution performance, I would be surprised if glibc vs. musl changed that much in real life. Usually there are so many layers on top of it that the libc isn't used as much as we would think, especially as you easily end up having to replace bits of it.
       At least for compiling performance, using musl is noticeably faster than glibc, and in terms of reliability musl is much better; glibc tends to sweep errors under the rug.
       glibc usually doesn't pass on write errors. Things like puts/fputc/… might be faster, but you risk losing data because they always report success.
       
 (DIR) Post #AIkJ1Q0Cr6bGIQfW3k by wolf480pl@mstdn.io
       2022-04-23T16:10:24Z
       
       0 likes, 0 repeats
       
       @lanodan @mburakov who uses libc IO anyway?
       
 (DIR) Post #AIkJGimBKyGErkPDSC by lanodan@queer.hacktivis.me
       2022-04-23T16:13:11.332426Z
       
       0 likes, 0 repeats
       
       @wolf480pl @mburakov Apparently a lot of things: http://codesearch.debian.net/search?q=stdio.h&literal=1
       
 (DIR) Post #AIkJMRSGeMGYsmopCS by wolf480pl@mstdn.io
       2022-04-23T16:14:13Z
       
       0 likes, 0 repeats
       
       @lanodan @mburakov stderr doesn't count
       
 (DIR) Post #AIkJfvigLnAWFVmzUu by mburakov@mastodon.social
       2022-04-23T16:17:44Z
       
       0 likes, 0 repeats
       
       @wolf480pl @lanodan Hello world samples I guess :) Jokes aside I find it exceptionally convenient for synchronous processing of poorly structured (as in free text) data streams. It is kind of expected, because this is what it was designed for. And also it is pretty useful for… Ehh, no, that’s actually it :)
       
 (DIR) Post #AIkK2ekVc2piHCTqSm by lanodan@queer.hacktivis.me
       2022-04-23T16:21:51.213829Z
       
       0 likes, 0 repeats
       
       @wolf480pl @mburakov I picked stdio.h because there is a huge pile of functions.
       See http://codesearch.debian.net/search?q=%5Cb%5Bf%7Cs%7Cd%7Cv%5D%3Fn%3F%28put%7Cscan%7Cprint%7Cget%29%5Bscw%5D%28_unlocked%29%3F%5C%28&literal=0 I guess.