[HN Gopher] Tony Hawk's Pro Strcpy
       ___________________________________________________________________
        
       Tony Hawk's Pro Strcpy
        
       Author : ndiddy
       Score  : 672 points
       Date   : 2024-08-07 16:48 UTC (1 days ago)
        
 (HTM) web link (icode4.coffee)
 (TXT) w3m dump (icode4.coffee)
        
       | jonhohle wrote:
       | This is awesome!
       | 
       | I've been doing some PSX decompiling and there are lots of
       | similar things there as well. Interestingly, something like
       | `memmove` is linked in using an SDK library[0], but `strcpy` is a
       | function provided by the BIOS. Later versions of the SDK could
       | patch that out for a library version, but as late as 1997 it
       | hadn't been.
       | 
       | 0 - https://github.com/Xeeynamo/sotn-decomp/blob/master/src/main...
        
         | anthk wrote:
         | I'd love a reimplementation in C+SDL2 (and OpenGL 2.1) of the
         | former console games.
         | 
         | Now that N64 games are being ported to PC with decompilers, I
         | can only hope. Inb4 "there are native PC versions of these, you
         | know"... most recompiled N64 games with the FX's being
         | 'deshaderized' to pure textures (or simpler FX's) can be run on
         | toasters such as cheap netbooks from 2009 and nearly anywhere.
         | 
         | They even ported Super Mario 64 to the 3DFX API. I know, the
         | most complex games accessing the N64 framebuffer with complex FX
         | will require OpenGL 3.3 to mimic that microcode; but, as I said
         | before, when the engines run uber-fast on anything post Pentium
         | III, it's not difficult to 'mimic' these in software while the
         | rest is running GL 2.1 accelerated.
        
           | bitwize wrote:
           | > They even ported Super Mario 64 to the 3DFX API.
           | 
           | That's... not surprising. UltraHLE ran SM64 like a dream, and
           | the HLE bit referred to the fact that the emulator translated
           | 3D calls to the Glide API rather than attempting to emulate
           | the 3D hardware directly.
        
             | anthk wrote:
             | Yeah, I knew about that, so this is just transcribing
             | instead of translating. But I'd guess SGI machines (IRIX)
             | being OpenGL bound (they invented it) the N64 would map the
             | microcode to GL funcs much better.
        
               | bitwize wrote:
               | Glide was modelled after OpenGL, so I'm guessing the
               | mapping was not that much of a stretch anyway.
        
               | rvnx wrote:
               | and then you had to use an extra DLL that was essentially
               | translating 3dfx calls to DirectX
        
               | anthk wrote:
               | That was later. NGlide for sure.
        
           | astrange wrote:
           | The N64 used a different texture interpolation method than
           | anything else ever has (IIRC three-point instead of four-
           | point) so if you do the equivalent of HLE like that, it'll
           | look bad and blurry. Of course, official rereleases of N64
           | games haven't emulated it properly either.
        
             | anthk wrote:
             | Not an issue at 640x480 and higher resolutions.
             | 
             | Once you run the games at 1024x768 and up that doesn't
             | matter at all:
             | 
             | sm64ex, perfectdark, zeldaoot for PC...
        
       | perihelions wrote:
       | - _" If I was lucky it would be strcpy (opposed to something like
       | strncpy)"_
       | 
       | it really ought to have been strncpy, I'm sure Tony Hawk who's
       | lauded for his advocacy of safety gear would prefer to be
       | associated with safer string copying
        
         | kragen wrote:
         | strncpy is definitely not safer; it produces unterminated
         | strings when it hits _n_
         | 
         | basically you should almost never use strncpy; it's
         | specifically for fixed-size fields like this:
         | struct dirent { unsigned short inode; char name[14]; };
         | 
         | and in those cases more often than not the pad byte should be a
         | space rather than a nul
         | 
         | strncpy should never have been added to the standard library
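
A minimal sketch of the behaviour described above, assuming a 14-byte fixed-size field like the `name` member in the comment. The helper names `is_terminated` and `strncpy_terminates` are made up for illustration; only `strncpy`, `memset`, and `memchr` are standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if buf[0..size) contains a terminating nul. */
static bool is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/* Copies src into a 14-byte field with strncpy and reports whether the
 * field ended up nul-terminated. */
static bool strncpy_terminates(const char *src)
{
    char name[14];

    assert(src != NULL);
    memset(name, 'x', sizeof name);    /* poison the field first */
    strncpy(name, src, sizeof name);   /* pads with nuls only if src is shorter */
    return is_terminated(name, sizeof name);
}
```

A source shorter than the field comes out terminated (and nul-padded); a source of 14 or more characters fills the field with no terminator at all.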
        
           | sidewndr46 wrote:
           | What is the preferred solution here? I usually just use
           | "memset" to zeroize the whole destination string, then tell
           | "strncpy" that my destination is 1 byte shorter than what it
           | really is.
           | 
           | The real issue I've run into is that "strncpy" assumes the
           | source is null-terminated.
        
             | connicpu wrote:
             | C11 adds `strcpy_s` which takes (dest, destsz, src) and
             | returns an errno_t which will report an error if the src
             | string is longer than destsz, as silent truncation is often
             | not a desirable behavior. It also assigns dest[0]='\0' on
             | error so you don't get an unterminated garbage string.
        
               | david2ndaccount wrote:
               | Only msvc provides strcpy_s and they don't conform to the
               | standard. Other libcs don't provide it. Ignore everything
               | from Annex K and write your own wrappers around memcpy.
               | You should always know the size of your buffers.
        
               | connicpu wrote:
               | Ah that sucks. Guess C is just stuck like this for the
               | long term. Writing your own functions is still the best
               | advice :'(
        
               | kragen wrote:
               | on the plus side, c is good at writing your own functions
        
             | david2ndaccount wrote:
             | Use memcpy and do the size check yourself beforehand
             | (taking the appropriate action if it doesn't fit). Avoid
             | any function starting with str except for strlen. Prefer
             | pointer+length instead of relying on nul-terminated
             | strings.
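
One way the memcpy-plus-size-check advice could look, as a sketch rather than anyone's actual code. `copy_counted` is a hypothetical name, and refusing the copy (rather than truncating) is just one possible "appropriate action".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical wrapper: copy a counted string (src, len) into dst, which has
 * room for dst_size bytes, and nul-terminate it. Returns false and leaves
 * dst untouched if len bytes plus the nul do not fit. */
static bool copy_counted(char *dst, size_t dst_size, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (len >= dst_size)    /* no room for len bytes plus the terminator */
        return false;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return true;
}
```

The caller passes the length it already knows, so the source does not need to be nul-terminated.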
        
               | nrclark wrote:
               | You mean strnlen.
        
             | paulryanrogers wrote:
             | strlcpy?
        
             | saagarjha wrote:
             | memccpy, then use the return value to terminate.
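
A sketch of the memccpy approach, assuming a POSIX (or C23) libc that provides `memccpy`. The wrapper name `copy_memccpy` is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* memccpy is POSIX; it only entered ISO C in C23 */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Copy src into dst (dst_size bytes). Returns true if all of src fit and
 * false if the copy was truncated. dst is nul-terminated either way. */
static bool copy_memccpy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    /* memccpy stops after copying the first '\0' and returns a pointer just
     * past it, or NULL if no '\0' was seen within dst_size bytes. */
    if (memccpy(dst, src, '\0', dst_size) != NULL)
        return true;
    dst[dst_size - 1] = '\0';   /* truncated: terminate by hand */
    return false;
}
```

Unlike strncpy, this does not pad the rest of the buffer, and the NULL return doubles as the truncation signal.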
        
             | pjmlp wrote:
             | Use a proper C string library like SDS.
             | 
             | Or move up from the 1970's Bell Labs, adopt C++ with the
             | respective compiler switches to have bounds checking
             | enabled for _operator[]()_.
             | 
             | Better yet, use something else instead of one of those two,
             | pick whatever is your fancy.
        
             | Sesse__ wrote:
             | The sanest solution is, surprisingly, snprintf(dst,
             | sizeof(dst), "%s", src).
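
A sketch of the snprintf idiom with the truncation check made explicit. `copy_snprintf` is a hypothetical wrapper name; the explicit size parameter also sidesteps the problem of `sizeof(dst)` on a pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Copy src into dst (dst_size bytes) using snprintf. Returns true if src fit
 * completely, false if it was truncated. dst is always nul-terminated. */
static bool copy_snprintf(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    int n = snprintf(dst, dst_size, "%s", src);
    /* n is the length the full output needs, excluding the nul, so
     * n >= dst_size means truncation. A negative n signals an error. */
    return n >= 0 && (size_t)n < dst_size;
}
```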
        
               | kragen wrote:
               | please don't fill your program with fifty zillion string
               | buffers of arbitrarily chosen sizes and then try to
               | separately pass the right size in seventy zillion string-
               | processing function calls. your code will be hard to
               | read, buggy, and probably insecure
        
               | Sesse__ wrote:
               | I agree with that statement, but it has nothing to do
               | with snprintf() versus e.g. strcpy_s(), where you have
               | exactly the same requirement to pass the right size.
               | 
               | (Separately, there's a discussion about how many bytes
               | you are allowed to read from the _source_, but to fix
               | that, you need something like the Linux kernel's
               | strscpy(), which isn't really widely supported in
               | userspace.)
        
               | kragen wrote:
               | i agree
        
             | 1over137 wrote:
             | strlcpy() is my favourite, alas the GNU folks stubbornly
             | refuse to embrace it, last I checked.
        
               | jandrese wrote:
               | strlcpy is still braindamaged. The need to return the
               | length of the source string for compatibility with old
               | code means it suffers from some of the same issues
               | strncpy did.
        
               | 1over137 wrote:
               | Sure, but strlcpy is better than strcpy and strncpy (for
               | strings). I almost never see code that uses the return
               | value of any of them.
               | 
               | It is a simple refactoring to change strcpy/strncpy to
               | strlcpy and, though it doesn't solve truncation issues,
               | it's a solid improvement by eliminating memory overruns
               | and lack of null termination.
               | 
               | It was added to OpenBSD in 1998 and then in FreeBSD, Mac
               | OS X, Solaris, IRIX, etc. but its adoption was hampered
               | by glibc stubbornly refusing to add it (until 2023
               | apparently).
        
               | jandrese wrote:
               | It is frustrating because while it is better, it is still
               | flawed in an easily fixed way.
               | 
               | What I wish the standard library had:
               |     ssize_t strscpy(char* dst, const char* src, ssize_t dsize);
               | 
               | Copies src into dst, stopping when it either reaches a \0
               | byte in src or on copying dsize - 1 bytes, whichever
               | happens first. dst is then null terminated.
               | 
               | If the copy is not truncated the strscpy returns the
               | number of bytes copied. If the copy is truncated dsize is
               | returned.
               | 
               | Returns a negative value and sets errno if either dst or
               | src is NULL or dsize is < 1.
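
A sketch of the function specified in the comment above, written only from that description. It is named `my_strscpy` here to avoid any clash with the Linux kernel's `strscpy`, and the choice of `EINVAL` and `-1` for the error case is an assumption; the comment only asks for a negative value with errno set.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

/* Copies at most dsize - 1 bytes of src into dst and nul-terminates dst.
 * Returns the number of bytes copied if src fit, dsize if it was truncated,
 * and -1 with errno set if an argument is invalid. */
static ssize_t my_strscpy(char *dst, const char *src, ssize_t dsize)
{
    if (dst == NULL || src == NULL || dsize < 1) {
        errno = EINVAL;   /* assumed error code */
        return -1;
    }
    ssize_t i = 0;
    while (i < dsize - 1 && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    /* Truncated if the loop stopped at the size limit, not at src's nul. */
    return src[i] == '\0' ? i : dsize;
}
```

With a 13-byte destination, a 12-character source returns 12 and a 13-character source returns 13, so comparing the result against the size parameter detects truncation.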
        
               | kragen wrote:
               | the _strlcpy_ paper explains why _strlcpy_ isn't designed
               | the way you suggest:
               | https://www.usenix.org/legacy/event/usenix99/full_papers/mil...
               | 
               | they actually started out with your design and then fixed
               | it:
               | 
               | > _The return values started out as the number of
               | characters copied, since this was trivial to get as a
               | side effect of the copy or concatenation. We soon decided
               | that a return value with the same semantics as
               | snprintf()'s was a better choice since it gives the
               | programmer the most flexibility with respect to
               | truncation detection and recovery._
               | 
               | basically they wanted to treat string truncation due to
               | insufficient space as an error condition, so they
               | designed the interface to make it easy to check (code
               | from the paper, with syntax corrections):
               |     len = strlcpy(path, homedir, sizeof(path));
               |     if (len >= sizeof(path)) return (ENAMETOOLONG);
               |     len = strlcat(path, "/", sizeof(path));
               |     if (len >= sizeof(path)) return (ENAMETOOLONG);
               | 
               | your proposal does permit such simple error checking for
               | _strscpy_ , although it is marginally less efficient:
               |     len = strscpy(path, homedir, sizeof(path));
               |     if (len <= strlen(homedir)) return (ENAMETOOLONG);
               | 
               | but i can't think of anything your corresponding
               | _strscat_ could return to permit a similarly simple
               | check. is there anything?
        
               | jandrese wrote:
               | Basically, it seems like they wanted to facilitate the
               | use case of "if the string is truncated, then realloc()
               | the buffer to make it fit using the value conveniently
               | returned from strlcpy()".
               | 
               | (note: the following code is probably buggy with off by
               | one errors and doesn't check return value properly)
               |     copied_bytes = strlcpy(dst, src, size);
               |     if ( copied_bytes >= size )
               |     {
               |         realloc(dst, copied_bytes);
               |         size = strlcat(dst, src + size, copied_bytes);
               |     }
               | 
               | I can appreciate that sentiment, but I think it was a
               | mistake. That behavior is still possible where it makes
               | sense by using strlen() first, but I'd argue that in most
               | cases if this is possible then strdup() was the better
               | solution all along. Basically they made a tradeoff that
               | makes one relatively uncommon use case easier at the
               | expense of making the function explode in other cases.
               | 
               | You don't need strlen() to check for truncation:
               | if ( strscpy(path, homedir, sizeof(path) == sizeof(path)
               | ) return (ENAMETOOLONG);
               | 
               | strscat() would have a similar syntax. Return values
               | would be the same, including returning the size parameter
               | in the case of truncation, making it easy to check.
               | if ( strscat(path, "/", sizeof(path) == sizeof(path) )
               | return (ENAMETOOLONG);
               | 
               | On a side note I always cringe when I see people using
               | sizeof() on strings in C. I left them in here to make the
               | comparison easier, but I wouldn't normally do it this
               | way. That's a gun pointed directly at your foot when this
               | bit of code gets refactored out to a function and that
               | string degrades to a pointer.
        
               | kragen wrote:
               | i think they wanted to facilitate the use case of 'if the
               | string is truncated, then close the connection and log an
               | error message', or 'if the string is truncated, then
               | return an error code', as in the example code i quoted
               | from the paper
               | 
               | strdup() is not helpful in examples like the one i quoted
               | from the paper, where you are building up a string by
               | concatenating substrings, but something like stralloc is.
               | (see other subthread) the paper recommends the libmib
               | astring functions, which are something like stralloc:
               | http://www.mibsoftware.com/libmib/astring/. they
               | definitely were not recommending that people copy and
               | paste those six lines of code with slight changes every
               | time they wanted to copy a string
               | 
               | i don't agree that it makes the function explode in other
               | use cases. if you're okay with truncation then strlcpy()
               | will silently truncate your strings if you don't check
               | its return value
               | 
               | your strscpy() example has a parse error; i think you
               | meant
               |     if ( strscpy(path, homedir, sizeof(path)) == sizeof(path) )
               |         return (ENAMETOOLONG);
               | 
               | which leads me to think that you mean that if
               | strlen(homedir) is 12 and sizeof(path) is 13, strscpy
               | copies 12 characters (not counting the nul) and returns
               | 12, not 13, while if strlen(homedir) is 13 in that case,
               | it also copies 12 characters, but returns 13. i agree
               | that that would work; it is so similar to the flawed
               | design rejected in the strlcpy paper that i thought you
               | meant the same thing, but you evidently meant something
               | subtly different. i agree that that design would also
               | work for strscat
               | 
               | at that point, though, it might be better to return -1 or
               | INT_MAX rather than dsize on truncation; you can't use
               | the return value you've specified for anything before you
               | check whether it's equal to dsize or not. (this is also
               | true of strlcpy!) actually you also specified to return a
               | negative value on certain other errors, which means you
               | have to check the return value _twice_ before using it
               | for anything; possibly this was a mistake
               | 
               | i also agree that using sizeof on arrays is a footgun for
               | exactly the reason you say, although in this case the
               | most likely result would be that you'd notice the bug and
               | fix it, since pointers are too short for most strings
        
               | kevin_thibedeau wrote:
               | As discussed a few weeks ago, strlen() + memcpy() is
               | faster than strlcpy() on superscalar platforms with
               | branch prediction. Iterating over the string twice is not
               | a penalty if the alternative hobbles the hardware with
               | more complex code.
               | 
               | https://nrk.neocities.org/articles/cpu-vs-common-sense
        
               | kragen wrote:
               | agreed
        
               | jabl wrote:
               | It's in glibc as of 2.38:
               | https://sourceware.org/git/?p=glibc.git;a=commit;h=454a20c87...
        
               | 1over137 wrote:
               | Cool, didn't know, thanks for sharing. Well, literally 25
               | years after OpenBSD added strlcpy, but better late than
               | never I guess.
        
             | kragen wrote:
             | for general-purpose string handling in software where
             | failure is an option, i like the qmail stralloc approach
             |     if (!addrparse(arg)) { err_syntax(); return; }
             |     flagbarf = bmfcheck();
             |     seenmail = 1;
             |     if (!stralloc_copys(&rcptto,"")) die_nomem();
             |     if (!stralloc_copys(&mailfrom,addr.s)) die_nomem();
             |     if (!stralloc_0(&mailfrom)) die_nomem();
             |     out("250 ok\r\n");
             | 
             | basically you have a struct with buffer-pointer, length,
             | and capacity fields, like a golang slice, and you modify it
             | with a small number of functions which reallocate the
             | buffer if it isn't big enough. the ones you see here are
             | stralloc_copys, which sets the buffer contents to the
             | contents of a nul-terminated string, and stralloc_0, which
             | appends a nul to the buffer. there are also functions for
             | appending an arbitrary byte, for copying one stralloc to
             | another, for copying counted strings into strallocs, and
             | for concatenation, for determining whether one is a prefix
             | of another, etc., but depending on the application, you may
             | or may not need to implement these
             | 
             | the whole stralloc library is 97 lines of k&r c, so
             | reimplementing the part you need for a given program is
             | pretty trivial. it's in the public domain
             | 
             | for most programs, a disadvantage of the particular way
             | that stralloc is implemented in qmail is that you have to
             | check every single copy or concatenation operation for an
             | out-of-memory error, as you see above. this makes your code
             | a lot longer. many applications are better off just
             | aborting inside the memory allocation function if they run
             | out of memory; getting out-of-memory handling correct is
             | very difficult, especially if you don't devote a massive
             | amount of effort to testing out-of-memory conditions
             | (because they won't occur often enough just by chance to
             | test your error-handling path)
             | 
             | (another disadvantage of the way stralloc is implemented is
             | that you probably don't really want to use unsigned int for
             | the two length fields on lp64 platforms)
             | 
             | for some applications you might prefer just using strdup()
             | (or xstrdup()) or non-owning string-view types (a pointer
             | and a length, perhaps into an input file you've mapped into
             | memory), or lisp-style symbol interning (plus some kind of
             | buffer management probably). arena allocation, if you can
             | afford it, makes dynamic memory allocation for strings a
             | much more reasonable thing to do: no risk of a memory leak,
             | fast allocation, instant deallocation. but again some
             | applications do poorly with arena allocation
             | 
             | but please don't fill your program with fifty zillion
             | string buffers of arbitrarily chosen sizes and then try to
             | separately pass the right size in seventy zillion string-
             | processing function calls. your code will be hard to read,
             | buggy, and probably insecure. factor string buffer length
             | handling into a small part of your program so that most of
             | your code never has to think about string buffer lengths
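
A minimal sketch of the pointer/length/capacity idea described above. This is not qmail's stralloc code: the `sbuf` names, the doubling growth policy, and the use of `size_t` and `bool` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *s;     /* buffer, not necessarily nul-terminated */
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
} sbuf;

/* Make sure at least `need` bytes are allocated; false on out-of-memory. */
static bool sbuf_reserve(sbuf *b, size_t need)
{
    if (need <= b->cap)
        return true;
    size_t cap = b->cap ? b->cap : 16;
    while (cap < need) {
        if (cap > SIZE_MAX / 2)
            return false;
        cap *= 2;
    }
    char *p = realloc(b->s, cap);
    if (p == NULL)
        return false;   /* the old buffer is still valid */
    b->s = p;
    b->cap = cap;
    return true;
}

/* Append n bytes from src, growing the buffer if needed. */
static bool sbuf_catb(sbuf *b, const char *src, size_t n)
{
    if (n > SIZE_MAX - b->len || !sbuf_reserve(b, b->len + n))
        return false;
    if (n > 0)
        memcpy(b->s + b->len, src, n);
    b->len += n;
    return true;
}

/* Replace the contents with the nul-terminated string src. */
static bool sbuf_copys(sbuf *b, const char *src)
{
    b->len = 0;
    return sbuf_catb(b, src, strlen(src));
}

/* Append one nul byte so b->s can be handed to C string functions. */
static bool sbuf_0(sbuf *b)
{
    return sbuf_catb(b, "", 1);
}
```

A zero-initialized `sbuf` is a valid empty buffer, and every buffer-size decision lives in `sbuf_reserve`, so callers never pass a size of their own.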
        
               | lelanthran wrote:
               | > for most programs, a disadvantage of the particular way
               | that stralloc is implemented in qmail is that you have to
               | check every single copy or concatenation operation for an
               | out-of-memory error, as you see above. this makes your
               | code a lot longer. many applications are better off just
               | aborting inside the memory allocation function if they
               | run out of memory; getting out-of-memory handling correct
               | is very difficult, especially if you don't devote a
               | massive amount of effort to testing out-of-memory
               | conditions (because they won't occur often enough just by
               | chance to test your error-handling path)
               | 
               | You _could_ do that. Or you could put a field in the
               | struct that stores an error flag. If flag is set, all
               | `stralloc` functions return immediately. When they fail,
               | they set the flag and then return.
               | 
               | This lets you do:
               |     stralloc_copys(&rcptto,"");
               |     stralloc_copys(&mailfrom,addr.s);
               |     stralloc_0(&mailfrom);
               |     if (stralloc_error(rcptto) || stralloc_error(mailfrom)) {
               |         die_nomem();
               |     }
               | 
               | I'd go one further and make the error checker function
               | take variable arguments, so that the last line looks like
               | this:
               |     if (stralloc_error (rcptto, mailfrom, NULL)) {
               |         die_nomem();
               |     }
               | 
               | IME, forgetting to terminate the parameter list with a
               | NULL _almost always_ causes the program to blow up on the
               | very first execution.
               | 
               | > but please don't fill your program with fifty zillion
               | string buffers of arbitrarily chosen sizes and then try
               | to separately pass the right size in seventy zillion
               | string-processing function calls. your code will be hard
               | to read, buggy, and probably insecure. factor string
               | buffer length handling into a small part of your program
               | so that most of your code never has to think about string
               | buffer lengths
               | 
               | I agree, but after years and years of looking at and
               | writing idiomatic _safe_ C code, I am _now_ of the
               | opinion that a string library is, while a better approach
               | to slinging around raw strings, still very much the
               | _wrong_ approach.
               | 
               | Nothing stops the developer from doing _Parse, Don't
               | Validate!_ in C, and this means that seeing C strings
               | being used anywhere other than at the boundaries to the
               | system evokes my code-smell senses.
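
A sketch of the sticky error flag and the variadic checker suggested in this comment. The `ebuf` type and function names are hypothetical. One assumed difference from the comment's example is that the checker takes pointers, so the NULL sentinel has the same type as the other arguments.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *s;        /* nul-terminated contents, or NULL when empty */
    size_t len;      /* length excluding the nul */
    bool   failed;   /* sticky: once set, later operations are no-ops */
} ebuf;

/* Append the nul-terminated string src; set the flag if allocation fails. */
static void ebuf_cats(ebuf *b, const char *src)
{
    if (b->failed)
        return;
    size_t n = strlen(src);
    char *p = realloc(b->s, b->len + n + 1);
    if (p == NULL) {
        b->failed = true;
        return;
    }
    memcpy(p + b->len, src, n + 1);   /* copies the nul too */
    b->s = p;
    b->len += n;
}

/* Takes a NULL-terminated list of ebuf pointers; true if any has failed. */
static bool ebuf_any_failed(ebuf *first, ...)
{
    bool failed = false;
    va_list ap;

    va_start(ap, first);
    for (ebuf *b = first; b != NULL; b = va_arg(ap, ebuf *))
        failed = failed || b->failed;
    va_end(ap);
    return failed;
}
```

Callers run a whole sequence of operations and check once at the end, for example `ebuf_any_failed(&rcptto, &mailfrom, (ebuf *)NULL)`. The cast on the sentinel keeps the call well-defined on platforms where `NULL` is a plain integer constant.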
        
               | kragen wrote:
               | these are very good ideas; thank you! by coincidence
               | yesterday i was looking at some code i wrote in golang
               | six years ago which uses this same approach to error
               | handling for i/o errors. i wonder if you might be better
               | off putting the error flag in the allocator rather than
               | the individual string objects?
               | 
               | i do think _parse, don't validate_ is much more
               | difficult in c; c's type system is not strong enough to
               | give you the kinds of soundness guarantees you get from
               | ocaml or haskell. if you forget a type case in a switch,
               | there's not a whole lot you can do to get the compiler to
               | complain about it
               | 
               | the code i quoted above is from qmail-smtpd.c, which is
               | 373 lines of code like the above and contains all of
               | qmail's smtp input logic except for ip_scanbracket, which
               | parses strings like [127.0.0.1] and is shared with dns.c.
               | it's not clear to me that a _parse, don't validate_
               | approach would consist of much more than just the parser
               | 
               | maybe using a parser generator for all your input and
               | output handling would help? i'm still skeptical that
               | something like a text editor or a macro processor is
               | going to have a large body of code that is free of string
               | handling
        
               | lelanthran wrote:
               | > i do think parse, don't validate is much more difficult
               | in c; c's type system is not strong enough to give you
               | the kinds of soundness guarantees you get from ocaml or
               | haskell. if you forget a type case in a switch, there's
               | not a whole lot you can do to get the compiler to
               | complain about it
               | 
               | Sure, I agree, but we're talking about strings here.
               | Instead of a function taking or returning an email
               | address in a generic string type, it can take or return
               | an email address type.
               | 
               | For example, construction of a value of type `email_t`
               | can take a parameter of raw string. Then any function in
               | the rest of the code that receives an email would receive
               | an `email_t`, not a `char _` or some other generic string
               | type.
               | 
               | > it's not clear to me that a parse, don't validate
               | approach would consist of much more than just the parser
               | 
               | It might often be nothing _but* a parser, such as
               | `email_t`, but it means that no `str _()` function would
               | then be used by the caller - any operation on the
               | `email_t`, if `email_t` is an opaque pointer, would use
               | the `email_type__ ()` functions, because the users of any
               | `email_t` value cannot access, or even see, the fields
               | inside an `email_t` value.
               | 
               | This means that passing an email to a function expecting
               | a name would cause a compiler error.
               | 
               | For the IP address example you mention, that _definitely_
               | should be parsed only once into the quad-byte or quad-
               | quad integer fields.
               | 
               | I mean, I'm looking over my previous projects: every
               | single instance of a string I am using is actually not
               | just "generic string"; there's an associated type with it
               | (name, description, comment, whatever). Making those into
               | different types with their own operations means that the
               | compiler will generate errors if I try to use a
               | `description_t` where a `name_t` is expected.
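
A sketch of the distinct-types idea from this comment. The type layouts, `email_parse`, and `email_domain` are hypothetical, and the validation is deliberately simplistic, nowhere near a real address grammar. For brevity the structs are visible in one file; the comment's stronger version would hide them behind an opaque pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Distinct struct types: the compiler will not accept one for the other. */
typedef struct { char text[64];  } name_t;
typedef struct { char text[255]; } email_t;

/* Hypothetical parser: accepts local@domain with a non-empty local part,
 * exactly one '@', and a non-empty domain. Raw strings stop here. */
static bool email_parse(email_t *out, const char *raw)
{
    assert(out != NULL && raw != NULL);
    size_t len = strlen(raw);
    const char *at = strchr(raw, '@');
    if (len >= sizeof out->text || at == NULL || at == raw ||
        at[1] == '\0' || strchr(at + 1, '@') != NULL)
        return false;
    memcpy(out->text, raw, len + 1);
    return true;
}

/* Only accepts a parsed address; passing a name_t * or a char * here is a
 * type error that compilers diagnose. */
static const char *email_domain(const email_t *e)
{
    return strchr(e->text, '@') + 1;
}
```

Code past the parsing boundary works with `email_t` values, so it cannot receive an unvalidated string by accident.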
        
               | kragen wrote:
               | possibly you need to \ your *s
               | 
               | indeed it is the case in qmail that ip_scanbracket
               | populates a struct ip_address. but rcptto, where the
               | destination email address goes, is just a byte buffer in
               | a very simple ad-hoc format which, if i understand
               | correctly, gets written to a pipe; qmail's privilege
               | separation design, which its author to a significant
               | extent came to regret, adds some extra difficulties here
               | by requiring things to run in separate processes
               | 
               | what you read from or write to a pipe is, at that point,
               | necessarily just a generic string. you could write some
               | kind of generic serialization layer, but doing that in c
               | requires a preprocessor, and unmarshaling things in a
               | statically type-safe way really requires compiler support
               | for sum types, which c doesn't have
               | 
               | aside from that, i think it's pretty likely that trying
               | to parse the email address in the qmail-smtpd process
               | would have made the code more bug-prone rather than less
               | so
        
               | lelanthran wrote:
               | > is just a byte buffer in a very simple ad-hoc format
               | which, if i understand correctly, gets written to a pipe;
               | qmail's privilege separation design, which its author to
               | a significant extent came to regret, adds some extra
               | difficulties here by requiring things to run in separate
               | processes
               | 
               | > ...
               | 
               | > what you read from or write to a pipe is, at that
               | point, necessarily just a generic string.
               | 
               | Aren't the reader and writer of the pipe part of the same
               | software package?
               | 
               | If they are, then safe[1] functions for _that_ type make
               | sense:
               |     bool email_to_bytes(email_t *src, uint8_t **dst, size_t *len);
               |     bool email_from_bytes(email_t *dst, uint8_t *src, size_t len);
               | 
               | This still means that you're only ever passing around
               | `email_t` values, not `char *` values.
               | 
               | On the other hand, if the reader and writer of the pipe
               | are in different packages, then the pipe is the boundary
               | for each of them, and you wouldn't be passing language
               | native types without first serialising to a language
               | independent representation anyway.
               | 
               | [1] By "safe" I mean that they don't overflow and that
               | the actual binary format allows the `from_bytes` function
               | to determine when the input could be malicious.
        
               | kragen wrote:
               | yes, they are part of the same software package. i guess
               | qmail-smtpd does have to parse the email address somewhat
               | in order to match the domain against rcpthosts so it can
               | reject attempts to relay mail? and yeah, that's what the
               | addrparse() call does in the code i quoted--but it just
               | stores the email in _addr_ as a canonicalized string. so
               | it may end up rewriting things like lelanthran@[10.1.2.3]
               | as lelanthran@lelanthran.com, for example, and also has
               | code to strip out explicit smtp source routes
               | (@foo.com:lelanthran@lelanthran.com) which were still in
               | theory required when qmail came out
               | 
               | so when i implied that 'trying to parse the email address
               | in the qmail-smtpd process' was not a thing that was
               | already being done, i was wrong, so plausibly your
               | recommendation is in fact applicable; maybe it would have
               | been better to parse the email address into a struct with
               | user and host fields, then have _email_to_bytes_
               | represent the email <kragen@gentle.dyn.ml.org> as
               | T6:kragen,17:gentle.dyn.ml.org, instead of as
               | Tkragen@gentle.dyn.ml.org\0. i mean you could generate
               | _email_to_bytes_ with a code generator (an idl compiler)
               | instead of writing it
               | 
               | then you wouldn't have to worry about the possibility
               | that you'd accidentally left an @ in one part or the
               | other--unless you did relay the mail over smtp to some
               | other host, in which case you would have to worry about
               | it anyway
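The length-prefixed encoding kragen describes can be sketched in a few lines of C. This is only an illustration, not qmail code: the `email_t` struct, the function name `email_to_bytes_sketch`, and its simplified signature (a caller-supplied buffer instead of lelanthran's `uint8_t **dst`) are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type for illustration; qmail has no such struct. */
typedef struct {
    const char *user;
    const char *host;
} email_t;

/* Encode as "T<len>:<user>,<len>:<host>," (netstring-style fields).
 * Returns false if the result, including the terminator, does not fit
 * in the cap bytes available at dst. */
bool email_to_bytes_sketch(const email_t *src, char *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);
    int n = snprintf(dst, cap, "T%zu:%s,%zu:%s,",
                     strlen(src->user), src->user,
                     strlen(src->host), src->host);
    /* snprintf reports the length it wanted; n >= cap means truncation. */
    return n >= 0 && (size_t)n < cap;
}
```

Because each field carries its own length, a reader never has to search for a separator, so a stray `@` inside either part cannot shift the field boundary.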
        
           | cobbal wrote:
           | strncpy is fine as long as it's not used in isolation. My
           | preferred pattern (when I want the truncation) is to use it,
           | and then unconditionally set the last byte of the buffer to
           | null. This will always result in a valid C string.
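A minimal sketch of the pattern cobbal describes, assuming truncation is acceptable for the caller. The helper name `copy_truncating` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy src into dst, truncating
 * if needed, and always leave dst NUL-terminated. cap is the full size of
 * dst in bytes and must be at least 1. */
void copy_truncating(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    strncpy(dst, src, cap);   /* may leave dst unterminated if src is long */
    dst[cap - 1] = '\0';      /* unconditional terminator, as described */
}
```

Called as `copy_truncating(buf, sizeof buf, name)`, the result is always a valid C string, at the cost of silently dropping whatever did not fit.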
        
         | david2ndaccount wrote:
         | The correct thing to do is to use memcpy and to know the size
         | of both the destination buffer and the source buffer. If the
         | source buffer won't fit, then you need to take an application-
         | specific action (is truncation ok? do you have to abort the
         | whole operation? Do you re-alloc the destination buffer? etc.)
         | strncpy almost always does the wrong thing.
        
           | imron wrote:
           | Agree with the general principle of knowing your buffer
           | sizes, but the issue with memcpy (evidenced over many years
           | with various CVEs) is that someone invariably takes a string
           | length and forgets to add one, leading to strings that are
           | not null-terminated.
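The two comments above combine into one small routine: check both sizes first, account for the terminator explicitly, and let the caller decide what to do on failure. This is a sketch under those assumptions; `copy_exact` is a hypothetical name, not a standard function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src_len bytes plus a terminator into dst.
 * Returns false and leaves dst untouched when it will not fit, so the
 * caller picks the policy (truncate, abort, reallocate). */
bool copy_exact(char *dst, size_t dst_cap, const char *src, size_t src_len)
{
    assert(dst != NULL && src != NULL);
    if (dst_cap == 0 || src_len > dst_cap - 1) {
        return false;             /* no room for src_len bytes + '\0' */
    }
    memcpy(dst, src, src_len);
    dst[src_len] = '\0';          /* the "plus one" byte */
    return true;
}
```

Keeping the terminator inside the helper means the "plus one" is written in exactly one place rather than at every call site.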
        
         | thekevan wrote:
         | I read that he used to drive around and when he saw a
         | skateboarder, he'd yell "do an ollie" and then give them a new
         | helmet.
        
           | dfex wrote:
           | "Do a kick-flip"
        
             | lloeki wrote:
             | That's Eric Koston's thing, although Tony Hawk did
             | participate.
             | 
             | https://m.youtube.com/watch?v=ob0dI05Xz8s
             | 
             | Koston did not invent it but has been a major contributor
             | to its popularity.
             | 
             | https://www.surfertoday.com/skateboarding/why-do-skateboarde...
        
         | StressedDev wrote:
         | If you are doing Windows C/C++ development, you can use the
         | strsafe.h functions
         | (https://learn.microsoft.com/en-us/windows/win32/api/strsafe/).
         | When I wrote C/C++, I found
         | them easier to use than the standard C functions because they
         | handled all of the usual failure cases (buffer too small,
         | integer overflow, etc.). It was also easy to check if there was
         | a failure because all of the functions returned a failure code
         | if something went wrong.
         | 
         | In this case, StringCchCopyW() or StringCbCopyW() would be a
         | better choice than strcpy.
        
       | nj5rq wrote:
       | Very good article.
        
       | makin wrote:
       | A bit of a shame about the exploit applying to THUG PRO. The mod
       | is played to this day, since the more competitive side of the
       | Tony Hawk franchise has been dead for almost twenty years (with
       | the exception of the THPS1+2 remake, which was but a blip in the
       | scene).
       | 
       | The mod itself is over 10 years old now, and I think the original
       | developers are gone, explaining why no one was interested in
       | fixing it when Ryan reported it. But this means the mod is now
       | unusable: no one is going to want to risk a full privilege
       | exploit taking over their PC.
       | 
       | Hopefully this article reaches someone who's a bit more
       | interested in patching the mod.
        
         | rlabrecque wrote:
         | I wish I had the time, because it would be fun. Back when I DID
         | have time, I actually got that thug1 source code almost
         | playable on Windows. That source code was only for the console
         | versions, and the code assumed if it was compiling for windows
         | (and not Xbox windows..) it was only for tools, so a lot of
         | pieces worked completely differently.
        
       | auto wrote:
       | I've read so many flavors of this sort of exploit analysis over
       | the years, and if I get to read 100 more I'll be all the happier
       | for it.
       | 
       | Great article!
        
       | Retr0id wrote:
       | > The more interesting thing about the habibi key is that the
       | public key modulus only has a 4 byte difference compared to the
       | Microsoft RSA public key. For reference the MS key is a 2048 bit
       | RSA key. I've asked a few people how this might be possible and
       | the answer I got is "if you change the exponent to something
       | small like 3 you easily factor out a similar key". This should
       | require that the exponent of the public key is also patched to
       | "3". However, none of the shell code payloads that use the habibi
       | key ever change the exponent used by the RSA signature
       | verification routine. Presumably it's still performing the
       | validation using the exponent 65537 so I'm not entirely sure how
       | this works. Perhaps someone more knowledgeable could shed some
       | light on it.
       | 
       | A random 2048-bit integer has a moderate chance of being
       | trivially factorizable (I don't know the precise odds but we can
       | infer that it's roughly on the order of 2^-32 (for some
       | definition of trivial) without doing any real math). Presumably,
       | they wrote code that did something like this:
       | 
       |     while true:
       |         randomly tweak/increment 4 bytes of the public modulus
       |         spend 1 millisecond trying to factor it
       |         did it work? if yes, we're done here.
       |         else, try again.
       | 
       | The resulting public modulus likely has lots of smaller factors
       | (it should be possible to verify this, if anyone knows where I
       | can find the "habibi public key"?). Although an RSA modulus
       | normally has exactly 2 prime factors, the math still works out if
       | you have more (as long as e is coprime).
        
         | fxtentacle wrote:
         | Let me try to explain that. You start with a random 2048-bit
         | integer. You then change the lower bytes to make it divisible
         | by 3. This is easy because you're only working on the public
         | key. Now that the public key is divisible by 3, you use
         | Fermat's little theorem which tells you that the private key
         | must be divisible by 3 and have a sum of digits that is
         | divisible by 3. This lets you skip most possible private keys,
         | thereby reducing the compute needed to factorize it by a few
         | orders of magnitude. And maybe you get lucky and they use that
         | RSA implementation which uses exactly 2 prime factors, because
         | then you already know that one of them is 3 and you just divide
         | the public key by 3 to get the other prime factor.
         | 
         | EDIT: Wikipedia says "The structure of the RSA public key
         | requires that N be a large semiprime (i.e., a product of two
         | large prime numbers), that 2 < e < N, that e be coprime to
         | φ(N), and that 0 <= C < N." and later "the same algorithm
         | allows anyone who factors N to obtain the private key."
         | 
         | which in the context of the Xbox hack means that if you force N
         | to be divisible by the prime 3, then the other prime which is
         | used for generating the private key has to be N/3 => You have
         | successfully factored it.
         | 
         | EDIT2: Here's code for signing with the Habibi key:
         | https://github.com/XboxDev/xbedump/blob/b8cd5cd0f8b1cbc4e64f...
         | 
         | As you can see, it'll replace the last 4 bytes with 0x89, 0x9c,
         | 0x90, 0x6b and then start by dividing it by 3 and using that to
         | generate a suitable private key.
        
           | Retr0id wrote:
           | Ah, thanks for finding that code.
           | 
           | Here's the original public modulus as an integer:
           | http://factordb.com/index.php?query=207401193272587237602760...
           | (which can't be factored, at least not any time soon)
           | 
           | And here's the patched version:
           | http://factordb.com/index.php?query=173718524353649322341982...
           | 
           | And exactly as you say, it's divisible by 3, leaving behind a
           | single large prime (so I was wrong about there being more
           | factors)
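Why a modulus of the form 3 * q makes the private key easy to derive can be shown with toy numbers. The sketch below is not the xbedump code and does not use the real key: the prime `TOY_Q` is made up, the values are small enough for 64-bit arithmetic, and the only thing carried over from the thread is the public exponent 65537 and the "one factor is 3" structure.

```c
#include <assert.h>
#include <stdint.h>

/* Modular exponentiation; safe here because all values stay below 2^32. */
uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t mod)
{
    uint64_t result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

/* Modular inverse of a mod m via the extended Euclidean algorithm.
 * Requires gcd(a, m) == 1. */
uint64_t inv_mod(uint64_t a, uint64_t m)
{
    int64_t old_r = (int64_t)a, r = (int64_t)m;
    int64_t old_s = 1, s = 0;
    while (r != 0) {
        int64_t q = old_r / r;
        int64_t t = old_r - q * r; old_r = r; r = t;
        t = old_s - q * s; old_s = s; s = t;
    }
    assert(old_r == 1);
    return (uint64_t)((old_s % (int64_t)m + (int64_t)m) % (int64_t)m);
}

/* Toy modulus N = 3 * q with a made-up small prime q (an assumption). */
enum { TOY_Q = 1000003, TOY_E = 65537 };

/* Knowing both factors gives phi(N), and from that the private exponent. */
uint64_t toy_private_exponent(void)
{
    uint64_t phi = (3 - 1) * (uint64_t)(TOY_Q - 1);  /* phi(3q) = 2(q-1) */
    return inv_mod(TOY_E, phi);
}
```

With `d = toy_private_exponent()` and `n = 3 * TOY_Q`, a "signature" `pow_mod(msg, d, n)` verifies under the unchanged exponent, since `pow_mod(sig, TOY_E, n)` returns `msg`. That matches the observation that the verification routine never needed its exponent patched.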
        
         | ryan-c wrote:
         | chinese remainder theorem implementations fail if there are
         | duplicate factors
        
           | Retr0id wrote:
           | CRT can only be used for private key ops e.g. signing. The
           | verification side (i.e. the logic that runs on the console)
           | can't use CRT.
        
         | beng-nl wrote:
         | A paper I co-wrote deals with this problem: can we generate a
         | private key for a corrupted real public key (also 2048 bit as
         | it happens)? The application is corrupting a public key with
         | rowhammer and then using the factorization to generate a new
         | corresponding private key. This worked for ssh and gpg keys
         | (with some assumptions for practical purposes, eg knowing the
         | contents of the page containing the key). There is an
         | empirically derived success rate as a function of available
         | compute time in Figure 7, and an analytical treatment in
         | section 3. (Explanation of practical method in section 4.4.)
         | 
         | https://www.usenix.org/system/files/conference/usenixsecurit...
        
         | hifromwork wrote:
         | >I don't know the precise odds but we can infer that it's
         | roughly on the order of 2^-32 (for some definition of trivial)
         | 
         | The chance is way, way, WAY larger than 2^-32. Consider the
         | following code:
         | 
         |     primes = [2, 3, 5, 7, ..., 499]
         |     def miller_rabin(n, k): ...  # your fast primality test of choice
         |     def is_prime_trivial(n):
         |         for p in primes:
         |             while n % p == 0:
         |                 n //= p
         |             if n == 1:
         |                 return True
         |         return miller_rabin(n, 20)
         | 
         | It fully factors a random 2048-bit integer in around 100 tries,
         | for me.
        
       | brcmthrowaway wrote:
       | This gives me an opportunity to clarify a myth from my childhood.
       | Was Tony Hawk the first ever to hit a 720?
        
         | zimpenfish wrote:
         | https://en.wikipedia.org/wiki/Aerial_(skateboarding) says "The
         | 720, two full mid-air rotations, is one of the rarest tricks in
         | skateboarding. It was first done by Tony Hawk in 1985, and it
         | wasn't something he planned to do."
         | 
         | (Which is presumably "the first recorded" but I'm guessing if
         | someone had done it, they'd have been shouting about it and
         | -probably- the only kind of person who could pull it off would
         | be a pro skater anyway?)
        
           | detoured299 wrote:
           | At that time only a few pro vert skaters would have had the
           | ability to throw 720s, yeah. Nowadays a good number of ams
           | can too.
           | 
           | The rarity of seeing a 720 or above has as much to do with
           | the fact that most skaters don't skate vert - instead skating
           | street or smaller transition - as the trick's difficulty.
           | Outsiders tend to imagine large spins are the holy grail of
           | skate moves but almost all skaters aren't interested in them
           | for aesthetic reasons among others.
        
           | voytec wrote:
           | He worked much longer for the 900 but more importantly - he
           | repeated the 900 at the age of 48[0]!
           | 
           | [0] https://youtu.be/TnvPt_a7iOQ?t=93
        
             | imiric wrote:
             | Even more impressive: a 9-year-old did three consecutive
             | 900s in front of Tony Hawk[1].
             | 
             | Arguably, this feat is easier for a small child, but
             | still... insane talent at a young age.
             | 
             | [1]: https://old.reddit.com/1dh6p2h
        
       | ComputerGuru wrote:
       | FYI, what looks like a section header icon followed by the text
       | "So what's the habibi key?" is actually a clickable expanding
       | segment (html details). You should click it if you're interested!
       | 
       | A question I have is where/when/how the corresponding _private_
       | habibi key was released/leaked, if the story about it being used
       | exclusively by the linux console group to prevent pirated content
       | from being used is true. OP clearly was able to patch the four
       | byte difference between the MS key and the habibi key to then run
       | "unsigned" (but, actually, signed with the habibi private key)
       | executables, so they clearly got their hands on it.
        
         | bri3d wrote:
         | The Habibi key is generated by patching the Microsoft key to be
         | divisible by 3, making it quite easy to factor indeed. The
         | private key can be trivially recovered from the public key, and
         | there was nothing really to release or leak. It was basically a
         | little crypto CTF buried in the original 007: Agent Under Fire
         | savegame hack, which was basically a CTF in and of itself (it
         | was reasonably heavily obfuscated, I think both as a middle
         | finger to pirates and as a challenge to other reverse
         | engineers).
        
           | ComputerGuru wrote:
           | Thanks, that makes perfect sense.
        
       | jdlyga wrote:
       | Imagine a VSCode plugin that made up trick names and gave you a
       | combo points score at the bottom for your continuous keystrokes.
       | Tony Hawk's Pro-grammer
        
         | i_read_news wrote:
         | I think this would be more fun for VIM keybindings, where there
         | is a higher skill level (to get cooler combos of course).
        
         | high_priest wrote:
         | This describes https://codestats.net very well
        
       | Rebelgecko wrote:
       | Thanks for sharing, the other articles on this blog are equally
       | fascinating
        
         | Arrath wrote:
         | "Running Halo 2 in true HD" was a really, really good read.
        
       | Jerrrrrrry wrote:
       | It may not be possible for me to articulate how fucking insane of
       | an accomplishment this is.
       | 
       | Xbox 360...._softmod_.... via the park name on a Tony Hawk game.
       | 
       | 24 segment ROP chain :')
       | 
       | His rightful lamentation for the hypervisor, concise functional
       | write up, and immediate thoughts of an x360 botnet make this the
       | greatest xbox 360 nostalgia gut-punch of all time.
       | 
       | kudos++
        
       | Reason077 wrote:
       | In Tony Hawk's defence, he's a pro skater, not a security
       | analyst. Limited time behind the keyboard in the late 90s/early
       | 2000s grinding on his soon-to-be iconic game series would have
       | been spent making sure 900 McTwists felt really natural, not
       | auditing code for buffer overruns!
        
       | JoshTriplett wrote:
       | This seems like a great example of having the wrong security
       | mindset in console development. "We're the only thing that can
       | write this saved data, so we only have to parse what we wrote" is
       | a very common console mindset, and fundamentally wrong when
       | people can prepare artificially constructed saved data.
       | 
       | (Completely separate from that, consoles shouldn't be treating
       | users as the adversary, but given that they _do_, games are
       | failing to have a security mindset consistent with that stance.)
        
         | cortesoft wrote:
         | > consoles shouldn't be treating users as the adversary
         | 
         | I would 100% agree with this when talking about a normal
         | computer, but I kind of feel differently about consoles. How do
         | you prevent cheating in online games if you don't restrict what
         | users can do?
        
           | JoshTriplett wrote:
           | 1) Play with people you know, or
           | 
           | 2) Group players by apparent capability and observed
           | behavior, such that cheaters end up only playing with other
           | cheaters.
        
             | searealist wrote:
             | 1. makes matchmaking impossible, which is how 99% of people
             | want to play.
             | 
             | 2. is a research project that will probably never pan out.
        
               | JoshTriplett wrote:
               | Multiple games have done 2 in production with great
               | success.
        
               | searealist wrote:
               | No they haven't. Maybe they have caught cheaters with
               | _intrusive_ software. But that's not "observed behavior
               | and apparent capability".
        
               | JoshTriplett wrote:
               | Rootkits and other mechanisms are not the only way to
               | catch cheating players. You can also rely on player
               | reports of other players, as well as anomaly detection,
               | and based on those reports, observe player actions to
               | detect obvious cheating. You don't have to be _perfect_,
               | just catch enough people to make it not worth the risk.
               | Use the results to either pair suspected cheaters with
               | other suspected cheaters, or just ban people if you don't
               | mind risking that they'll hide themselves better and come
               | back.
               | 
               | This is not a hypothetical or a research project; some
               | games do exactly this.
        
               | searealist wrote:
               | This is just ideological gobbledygook. No concrete
               | examples of this working in practice.
        
               | JoshTriplett wrote:
               | https://en.wikipedia.org/wiki/Cheating_in_online_games#Anoma...
               | 
               | https://en.wikipedia.org/wiki/Cheating_in_online_games#Playe...
               | 
               | https://en.wikipedia.org/wiki/Cheating_in_online_games#Banni...
               | 
               | > Certain games are known to identify cheaters and
               | "shadow ban" them by placing them in matchmaking with
               | other cheaters only, so as not to let the cheaters know
               | that they have been identified.
        
               | searealist wrote:
               | The text for the anomaly link basically just says it's
               | infeasible for a number of reasons, but it would sure be
               | nice from a privacy standpoint.
               | 
               | The banning section mentions companies that employ very
               | invasive software to find and ban cheaters. Read up on
               | how the following software actually works:
               | 
               | > There are many facets of cheating in online games which
               | make the creation of a system to stop cheating very
               | difficult; however, game developers and third-party
               | software developers have created or are
               | developing[22][23] technologies that attempt to prevent
               | cheating. Such countermeasures are commonly used in video
               | games, with notable anti-cheat software being BattlEye,
               | GameGuard, PunkBuster, Valve Anti-Cheat (specifically
               | used on games on the Steam platform),[citation needed]
               | and EasyAntiCheat.
        
               | alt227 wrote:
               | The problem with what you are saying is that the
               | industry involved is shrouded in secrecy and full of
               | smoke and mirrors.
               | 
               | Because the only sources you have provided are anecdotal,
               | it's entirely possible you are falling for the illusion.
               | 
               | To use your phrases, 'certain games companies' are known
               | to totally lie about their anti-cheat techniques and
               | methods to throw people off the scent of the real
               | methods.
               | 
               | Therefore, without actually decompiling something to prove
               | what's going on, you have no real idea what techniques are
               | being used at all.
        
               | cortesoft wrote:
               | Those games that shadowban also use anti-cheat software
               | to identify the people they need to shadowban.
        
             | aseipp wrote:
             | By that point the player base can already be devastated,
             | and it can kill the game. Cycle: Frontiers was an
             | extraction shooter, and the immediate cheating was rampant;
             | the game's design meant dying to hackers was devastating --
             | imagine a hacker forcing you to lose not just this game,
             | but retroactively making you lose the prior 5 games too.
             | This absolutely destroys player morale, near instantly.
             | Even if you ban that hacker instantly, there's an extremely
             | high chance those players will never return. Because the
             | rampant cheating went on for so long, the game's reputation
             | never recovered. Within a year of release the servers were
             | shut down.
             | 
             | A single cheater can often ruin games for hundreds or even
             | thousands of players very easily. For experienced players,
             | seeing a single flying aimbot shithead in your lobby just
             | means there are 10 other cheaters in the lobby too -- ones
             | using subtle ESP/wallhacks that can be extremely difficult
             | to detect, by design. Shady websites like G2A or GMG that
             | sell keys (which are almost 100% hot keys, to really make
             | it all come full circle) mean that even if you get banned,
             | buying new keys for a new copy of the game is extremely
             | cheap, especially when many of these games have items that
             | can be sold for IRL cash in various ways, games like Rust.
             | For many parts of the world, selling/trading rare items to
             | players can net you plenty of actual income -- and getting
             | banned means nothing as a result. Instantly banning cheaters
             | the second they are confirmed leaks information to the
             | cheater and cheat creator, so today most games like Warzone
             | or Destiny have to play psyops and shroud their exact
             | detection techniques, in part by doing "ban waves" only when
             | they have accumulated a large number of confirmed cheaters. The
             | cheater that ruined your top score may, necessarily and by
             | design, be allowed to run free for a while.
             | 
             | The net result of all this is that designers and --
             | importantly, even though people on Hacker News don't want
             | to hear it -- PLAYERS tend to overwhelmingly prefer
             | _prevention_ instead of _reaction_. They are both needed.
             | Players are not morons who love installing rootkits. But on
             | the whole, preventative measures tend to be more valuable
             | to players and creators than reactionary ones, even if they
             | are all ultimately imperfect.
             | 
             | In a funny twist, games like Tarkov and Rust do have a
             | gameplay mechanic that reduces the long-term psychological
             | devastation of cheaters and is not invasive at all: they
             | reset all content in the game to "neutral" every once in a
             | while, so basically all your stuff gets deleted, and
             | everyone starts over again. (This non-permanence is
             | probably one of the reasons players stick with the game,
             | despite cheaters, which are incredibly infuriating.)
             | 
             | Can I ask if you seriously play any online competitive
             | games, at a high level or otherwise? Because I do, and I'll
             | be honest: I've been hearing it all for 20 years. These
             | types of approaches _have_ had success (CS's player review
             | system, certain shadowban systems, "trip wires" that
             | trigger on impossible game behavior), but there is _no_
             | single approach that has proven itself to be the ultimate
             | universal solution. There is no universal, wibbly wobbly
             | bullshit stats algorithm you run on your servers to
             | "solve" this. These problems are not solved. I don't like
             | it. I don't run certain games with certain forms of
             | anticheat. But it is what it is.
        
           | kevincox wrote:
           | I think public matchmaking and local play are different. I
           | can mostly get behind anti-cheat for public play. But for
           | local and optionally for friend games it should be possible
           | to extract, edit and play with your saves.
        
         | extraduder_ire wrote:
         | It is kind of coming true now, since all current consoles both
         | encrypt and sign savegames to lock them to your account, and
         | most (don't know about xbox) don't even let you copy your saves
         | anywhere but the console and paid cloud storage.
        
           | BoringTimesGang wrote:
           | Sad. Today's children will never learn how to generate a
           | valid CRC for a hex-edited save file.
        
       | culopatin wrote:
       | I was hoping that the exploit would only execute if he stuck the
       | landing across the gap
        
       | mclau156 wrote:
       | at that point I would rather re-make the game in Godot
        
       | megaloblasto wrote:
       | I have a dumb question. Once you soft mod a game console, what
       | type of stuff can you do?
        
         | forgotmyacc wrote:
         | Back in the day we mostly did it to cheat in Halo 2 online
         | multiplayer. I remember being 13 and sticking a butter knife
         | into my Xbox DVD drive so the "old maps" (on the disc) would
         | fail to load, forcing the game to load a "new map" from the
         | hard drive, which I had patched after soft modding the console
         | to allow FTP access to the maps on the hard drive.
        
       ___________________________________________________________________
       (page generated 2024-08-08 23:02 UTC)