[HN Gopher] Tony Hawk's Pro Strcpy
___________________________________________________________________
Tony Hawk's Pro Strcpy
Author : ndiddy
Score : 672 points
Date : 2024-08-07 16:48 UTC (1 day ago)
(HTM) web link (icode4.coffee)
(TXT) w3m dump (icode4.coffee)
| jonhohle wrote:
| This is awesome!
|
| I've been doing some PSX decompiling and there are lots of
| similar things there as well. Interestingly, something like
| `memmove` is linked in using an SDK library[0], but `strcpy` is a
| function provided by the BIOS. Later versions of the SDK could
| patch that out for a library version, but as late as 1997 it
| hadn't been.
|
| 0 - https://github.com/Xeeynamo/sotn-
| decomp/blob/master/src/main...
| anthk wrote:
| I'd love a reimplementation in C+SDL2 (and OpenGL 2.1) of the
| former console games.
|
| Now that N64 games are being ported to PC with
| decompilers, I can only hope. Inb4 "there are native PC
| versions of these, you know"... most recompiled N64 games with
| the FX 'deshaderized' to pure textures (or simpler
| FX) can run on toasters such as cheap netbooks from 2009
| and nearly anywhere else.
|
| They even ported Super Mario 64 to the 3DFX API. I know, the
| most complex games accessing the N64 framebuffer with complex FX
| will require OpenGL 3.3 to mimic that microcode; but, as I said
| before, when the engines run uber-fast on anything post Pentium
| III, it's not difficult to 'mimic' these in software
| while the rest runs GL 2.1 accelerated.
| bitwize wrote:
| > They even ported Super Mario 64 to the 3DFX API.
|
| That's... not surprising. UltraHLE ran SM64 like a dream, and
| the HLE bit referred to the fact that the emulator translated
| 3D calls to the Glide API rather than attempting to emulate
| the 3D hardware directly.
| anthk wrote:
| Yeah, I knew about that, so this is just transcribing
| instead of translating. But I'd guess that, with SGI machines
| (IRIX) being OpenGL-bound (they invented it), the N64
| microcode would map to GL funcs much better.
| bitwize wrote:
| Glide was modelled after OpenGL, so I'm guessing the
| mapping was not that much of a stretch anyway.
| rvnx wrote:
| and then you had to use an extra DLL that was essentially
| translating 3dfx calls to DirectX
| anthk wrote:
| That was later. NGlide for sure.
| astrange wrote:
| The N64 used a different texture interpolation method than
| anything else ever has (IIRC three-point instead of four-
| point) so if you do the equivalent of HLE like that, it'll
| look bad and blurry. Of course, official rereleases of N64
| games haven't emulated it properly either.
| anthk wrote:
| Not an issue at 640x480 and higher resolutions.
|
| Once you run the games at 1024x768 and up that doesn't
| matter at all:
|
| sm64ex, perfectdark, zeldaoot for PC...
| perihelions wrote:
| - _" If I was lucky it would be strcpy (opposed to something like
| strncpy)"_
|
| it really ought to have been strncpy, I'm sure Tony Hawk who's
| lauded for his advocacy of safety gear would prefer to be
| associated with safer string copying
| kragen wrote:
| strncpy is definitely not safer; it produces unterminated
| strings when it hits _n_
|
| basically you should almost never use strncpy; it's
| specifically for fixed-size fields like this:
| struct dirent { unsigned short inode; char name[14]; };
|
| and in those cases more often than not the pad byte should be a
| space rather than a nul
|
| strncpy should never have been added to the standard library
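A minimal sketch of the behaviour described above, assuming a fixed-size record like the dirent example. The names `old_dirent`, `set_name` and `name_is_terminated` are made up for illustration. It shows that strncpy pads short names with nul bytes and leaves the field unterminated once the name fills all 14 bytes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-size record, modelled on the dirent example above. */
struct old_dirent {
    unsigned short inode;
    char name[14];   /* a field, not a C string: may lack a nul */
};

/* Fill the fixed-size name field. strncpy pads short names with nul
 * bytes and writes no terminator when the name is 14 bytes or longer. */
void set_name(struct old_dirent *d, const char *name)
{
    assert(d != NULL && name != NULL);
    strncpy(d->name, name, sizeof d->name);
}

/* Return 1 if the field contains a nul byte, 0 if it is completely full. */
int name_is_terminated(const struct old_dirent *d)
{
    return memchr(d->name, '\0', sizeof d->name) != NULL;
}
```

Anything that later treats `d->name` as an ordinary string (strlen, printf with a plain `%s`) reads past the field whenever `name_is_terminated` returns 0.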
| sidewndr46 wrote:
| What is the preferred solution here? I usually just use
| "memset" to zeroize the whole destination string, then tell
| "strncpy" that my destination is 1 byte shorter than what it
| really is.
|
| The real issue I've run into is that "strncpy" assumes the
| source is null-terminated.
| connicpu wrote:
| C11 adds `strcpy_s` which takes (dest, destsz, src) and
| returns an errno_t which will report an error if the src
| string is longer than destsz, as silent truncation is often
| not a desirable behavior. It also assigns dest[0]='\0' on
| error so you don't get an unterminated garbage string.
| david2ndaccount wrote:
| Only msvc provides strcpy_s and they don't conform to the
| standard. Other libcs don't provide it. Ignore everything
| from Annex K and write your own wrappers around memcpy.
| You should always know the size of your buffers.
| connicpu wrote:
| Ah that sucks. Guess C is just stuck like this for the
| long term. Writing your own functions is still the best
| advice :'(
| kragen wrote:
| on the plus side, c is good at writing your own functions
| david2ndaccount wrote:
| Use memcpy and do the size check yourself beforehand
| (taking the appropriate action if it doesn't fit). Avoid
| any function starting with str except for strlen. Prefer
| pointer+length instead of relying on nul-terminated
| strings.
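One way to read the advice above, as a sketch: the caller passes both lengths, the check happens before any byte moves, and the failure case is reported instead of truncated. The name `copy_checked` and its signature are assumptions for illustration, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Copy src_len bytes plus a terminating nul into dst (capacity dst_cap).
 * Returns false and leaves dst untouched if the result would not fit. */
bool copy_checked(char *dst, size_t dst_cap, const char *src, size_t src_len)
{
    assert(dst != NULL && src != NULL);
    if (src_len >= dst_cap)      /* need src_len bytes plus one for the nul */
        return false;
    memcpy(dst, src, src_len);
    dst[src_len] = '\0';
    return true;
}
```

The caller decides what a `false` return means (truncate, reallocate, or abort the operation), which is the application-specific part no library function can choose for you.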
| nrclark wrote:
| You mean strnlen.
| paulryanrogers wrote:
| strlcpy?
| saagarjha wrote:
| memccpy, then use the return value to terminate.
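A sketch of the memccpy suggestion, assuming a POSIX (or C23) libc where memccpy is declared in string.h. The wrapper name `copy_memccpy` is hypothetical. memccpy returns NULL when it did not find the stop byte within the limit, which is exactly the truncation case.

```c
#define _XOPEN_SOURCE 700  /* memccpy is POSIX/XSI (and C23), not C99/C11 */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Copy src into dst (capacity dst_cap), always leaving dst terminated.
 * Returns true if the whole string fit, false if it was truncated. */
bool copy_memccpy(char *dst, size_t dst_cap, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (dst_cap == 0)
        return false;
    /* memccpy copies at most dst_cap bytes and stops after the first nul.
     * It returns a pointer just past the copied nul, or NULL if no nul
     * was seen within dst_cap bytes. */
    if (memccpy(dst, src, '\0', dst_cap) != NULL)
        return true;             /* nul was copied along with the text */
    dst[dst_cap - 1] = '\0';     /* truncated: terminate by hand */
    return false;
}
```

Unlike strncpy, this does not pad the rest of the buffer, and unlike strlcpy it never reads the source past the point where the destination is full.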
| pjmlp wrote:
| Use a proper C string library like SDS.
|
| Or move up from the 1970's Bell Labs, adopt C++ with the
| respective compiler switches to have bounds checking
| enabled for _operator[]()_.
|
| Better yet, use something else instead of one of those two,
| pick whatever is your fancy.
| Sesse__ wrote:
| The sanest solution is, surprisingly, snprintf(dst,
| sizeof(dst), "%s", src).
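The snprintf idiom above, wrapped so the truncation check is not forgotten. This is a sketch; `copy_snprintf` is a made-up name. It takes the capacity as a parameter because `sizeof(dst)` only gives the buffer size while `dst` is still an array, not once it has decayed to a pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Copy src into dst (capacity dst_cap) via snprintf.
 * dst is always terminated when dst_cap > 0.
 * Returns true if nothing was cut off. */
bool copy_snprintf(char *dst, size_t dst_cap, const char *src)
{
    assert(dst != NULL && src != NULL);
    /* snprintf returns the length the full output would have had. */
    int n = snprintf(dst, dst_cap, "%s", src);
    return n >= 0 && (size_t)n < dst_cap;
}
```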
| kragen wrote:
| please don't fill your program with fifty zillion string
| buffers of arbitrarily chosen sizes and then try to
| separately pass the right size in seventy zillion string-
| processing function calls. your code will be hard to
| read, buggy, and probably insecure
| Sesse__ wrote:
| I agree with that statement, but it has nothing to do
| with snprintf() versus e.g. strcpy_s(), where you have
| exactly the same requirement to pass the right size.
|
| (Separately, there's a discussion about how many bytes
| you are allowed to read from the _source_, but to fix
| that, you need something like the Linux kernel's
| strscpy(), which isn't really widely supported in
| userspace.)
| kragen wrote:
| i agree
| 1over137 wrote:
| strlcpy() is my favourite, alas the GNU folks stubbornly
| refuse to embrace it, last I checked.
| jandrese wrote:
| strlcpy is still braindamaged. The need to return the
| length of the source string for compatibility with old
| code means it suffers from some of the same issues
| strncpy did.
| 1over137 wrote:
| Sure, but strlcpy is better than strcpy and strncpy (for
| strings). I almost never see code that uses the return
| value of any of them.
|
| It is a simple refactoring to change strcpy/strncpy to
| strlcpy and, though it doesn't solve truncation issues,
| it's a solid improvement by eliminating memory overruns
| and lack of null termination.
|
| It was added to OpenBSD in 1998 and then in FreeBSD, Mac
| OS X, Solaris, IRIX, etc. but its adoption was hampered
| by glibc stubbornly refusing to add it (until 2023
| apparently).
| jandrese wrote:
| It is frustrating because while it is better, it is still
| flawed in an easily fixed way.
|
| What I wish the standard library had:
| ssize_t strscpy(char* dst, const char* src, ssize_t
| dsize);
|
| Copies src into dst, stopping when it either reaches a \0
| byte in src or on copying dsize - 1 bytes, whichever
| happens first. dst is then null terminated.
|
| If the copy is not truncated the strscpy returns the
| number of bytes copied. If the copy is truncated dsize is
| returned.
|
| Returns a negative value and sets errno if either dst or
| src is NULL or dsize is < 1.
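A sketch of the function as specified in the comment above. It is named `strscpy_sketch` here to avoid implying it is the Linux kernel's strscpy, which has a different return convention. The choice of EINVAL for the error case is an assumption; the comment only says errno is set.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

/* Copy src into dst, writing at most dsize bytes including the nul.
 * Returns the number of bytes copied (excluding the nul) if src fit,
 * dsize if the copy was truncated, or -1 with errno set on bad input. */
ssize_t strscpy_sketch(char *dst, const char *src, ssize_t dsize)
{
    if (dst == NULL || src == NULL || dsize < 1) {
        errno = EINVAL;          /* assumed error code */
        return -1;
    }
    ssize_t i = 0;
    while (i < dsize - 1 && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    assert(i < dsize);           /* room for the terminator is guaranteed */
    dst[i] = '\0';
    /* If we stopped on the source's nul, everything fit. */
    return src[i] == '\0' ? i : dsize;
}
```

With a 13-byte destination, a 12-character source returns 12 and a 13-character source returns 13 (the truncation marker), so a caller can compare the result against the size it passed in.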
| kragen wrote:
| the _strlcpy_ paper explains why _strlcpy_ isn't
| designed the way you suggest: https://www.usenix.org/lega
| cy/event/usenix99/full_papers/mil...
|
| they actually started out with your design and then fixed
| it:
|
| > _The return values started out as the number of
| characters copied, since this was trivial to get as a
| side effect of the copy or concatenation. We soon decided
| that a return value with the same semantics as
| snprintf()'s was a better choice since it gives the
| programmer the most flexibility with respect to
| truncation detection and recovery._
|
| basically they wanted to treat string truncation due to
| insufficient space as an error condition, so they
| designed the interface to make it easy to check (code
| from the paper, with syntax corrections):
|     len = strlcpy(path, homedir, sizeof(path));
|     if (len >= sizeof(path))
|         return (ENAMETOOLONG);
|     len = strlcat(path, "/", sizeof(path));
|     if (len >= sizeof(path))
|         return (ENAMETOOLONG);
|
| your proposal does permit such simple error checking for
| _strscpy_ , although it is marginally less efficient:
|     len = strscpy(path, homedir, sizeof(path));
|     if (len <= strlen(homedir))
|         return (ENAMETOOLONG);
|
| but i can't think of anything your corresponding
| _strscat_ could return to permit a similarly simple
| check. is there anything?
| jandrese wrote:
| Basically, it seems like they wanted to facilitate the
| use case of "if the string is truncated, then realloc()
| the buffer to make it fit using the value conveniently
| returned from strlcpy()".
|
| (note: the following code is probably buggy with off by
| one errors and doesn't check return value properly)
|     copied_bytes = strlcpy(dst, src, size);
|     if ( copied_bytes >= size ) {
|         realloc(dst, copied_bytes);
|         size = strlcat(dst, src + size, copied_bytes);
|     }
|
| I can appreciate that sentiment, but I think it was a
| mistake. That behavior is still possible where it makes
| sense by using strlen() first, but I'd argue that in most
| cases if this is possible then strdup() was the better
| solution all along. Basically they made a tradeoff that
| makes one relatively uncommon use case easier at the
| expense of making the function explode in other cases.
|
| You don't need strlen() to check for truncation:
| if ( strscpy(path, homedir, sizeof(path) == sizeof(path)
| ) return (ENAMETOOLONG);
|
| strscat() would have a similar syntax. Return values
| would be the same, including returning the size parameter
| in the case of truncation, making it easy to check.
| if ( strscat(path, "/", sizeof(path) == sizeof(path) )
| return (ENAMETOOLONG);
|
| On a side note I always cringe when I see people using
| sizeof() on strings in C. I left them in here to make the
| comparison easier, but I wouldn't normally do it this
| way. That's a gun pointed directly at your foot when this
| bit of code gets refactored out to a function and that
| string degrades to a pointer.
| kragen wrote:
| i think they wanted to facilitate the use case of 'if the
| string is truncated, then close the connection and log an
| error message', or 'if the string is truncated, then
| return an error code', as in the example code i quoted
| from the paper
|
| strdup() is not helpful in examples like the one i quoted
| from the paper, where you are building up a string by
| concatenating substrings, but something like stralloc is.
| (see other subthread) the paper recommends the libmib
| astring functions, which are something like stralloc:
| http://www.mibsoftware.com/libmib/astring/. they
| definitely were not recommending that people copy and
| paste those six lines of code with slight changes every
| time they wanted to copy a string
|
| i don't agree that it makes the function explode in other
| use cases. if you're okay with truncation then strlcpy()
| will silently truncate your strings if you don't check
| its return value
|
| your strscpy() example has a parse error; i think you
| meant
|     if ( strscpy(path, homedir, sizeof(path)) == sizeof(path) )
|         return (ENAMETOOLONG);
|
| which leads me to think that you mean that if
| strlen(homedir) is 12 and sizeof(path) is 13, strscpy
| copies 12 characters (not counting the nul) and returns
| 12, not 13, while if strlen(homedir) is 13 in that case,
| it also copies 12 characters, but returns 13. i agree
| that that would work; it is so similar to the flawed
| design rejected in the strlcpy paper that i thought you
| meant the same thing, but you evidently meant something
| subtly different. i agree that that design would also
| work for strscat
|
| at that point, though, it might be better to return -1 or
| INT_MAX rather than dsize on truncation; you can't use
| the return value you've specified for anything before you
| check whether it's equal to dsize or not. (this is also
| true of strlcpy!) actually you also specified to return a
| negative value on certain other errors, which means you
| have to check the return value _twice_ before using it
| for anything; possibly this was a mistake
|
| i also agree that using sizeof on arrays is a footgun for
| exactly the reason you say, although in this case the
| most likely result would be that you'd notice the bug and
| fix it, since pointers are too short for most strings
| kevin_thibedeau wrote:
| As discussed a few weeks ago, strlen() + memcpy() is
| faster than strlcpy() on superscalar platforms with
| branch prediction. Iterating over the string twice is not
| a penalty if the alternative hobbles the hardware with
| more complex code.
|
| https://nrk.neocities.org/articles/cpu-vs-common-sense
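A sketch of the two-pass approach mentioned above: measure with strlen, then bulk-copy with memcpy, while keeping strlcpy's return convention. The name `strlcpy_two_pass` is made up, and no performance claim is made here beyond what the linked article argues.

```c
#include <assert.h>
#include <string.h>

/* Same contract as strlcpy: copy at most dsize-1 bytes, terminate dst
 * when dsize > 0, and return strlen(src) so callers can detect
 * truncation with `ret >= dsize`. */
size_t strlcpy_two_pass(char *dst, const char *src, size_t dsize)
{
    assert(src != NULL && (dst != NULL || dsize == 0));
    size_t slen = strlen(src);                 /* pass 1: measure */
    if (dsize != 0) {
        size_t n = slen < dsize ? slen : dsize - 1;
        memcpy(dst, src, n);                   /* pass 2: bulk copy */
        dst[n] = '\0';
    }
    return slen;
}
```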
| kragen wrote:
| agreed
| jabl wrote:
| It's in glibc as of 2.38: https://sourceware.org/git/?p=g
| libc.git;a=commit;h=454a20c87...
| 1over137 wrote:
| Cool, didn't know, thanks for sharing. Well, literally 25
| years after OpenBSD added strlcpy, but better late than
| never I guess.
| kragen wrote:
| for general-purpose string handling in software where
| failure is an option, i like the qmail stralloc approach
|     if (!addrparse(arg)) { err_syntax(); return; }
|     flagbarf = bmfcheck();
|     seenmail = 1;
|     if (!stralloc_copys(&rcptto,"")) die_nomem();
|     if (!stralloc_copys(&mailfrom,addr.s)) die_nomem();
|     if (!stralloc_0(&mailfrom)) die_nomem();
|     out("250 ok\r\n");
|
| basically you have a struct with buffer-pointer, length,
| and capacity fields, like a golang slice, and you modify it
| with a small number of functions which reallocate the
| buffer if it isn't big enough. the ones you see here are
| stralloc_copys, which sets the buffer contents to the
| contents of a nul-terminated string, and stralloc_0, which
| appends a nul to the buffer. there are also functions for
| appending an arbitrary byte, for copying one stralloc to
| another, for copying counted strings into strallocs, and
| for concatenation, for determining whether one is a prefix
| of another, etc., but depending on the application, you may
| or may not need to implement these
|
| the whole stralloc library is 97 lines of k&r c, so
| reimplementing the part you need for a given program is
| pretty trivial. it's in the public domain
|
| for most programs, a disadvantage of the particular way
| that stralloc is implemented in qmail is that you have to
| check every single copy or concatenation operation for an
| out-of-memory error, as you see above. this makes your code
| a lot longer. many applications are better off just
| aborting inside the memory allocation function if they run
| out of memory; getting out-of-memory handling correct is
| very difficult, especially if you don't devote a massive
| amount of effort to testing out-of-memory conditions
| (because they won't occur often enough just by chance to
| test your error-handling path)
|
| (another disadvantage of the way stralloc is implemented is
| that you probably don't really want to use unsigned int for
| the two length fields on lp64 platforms)
|
| for some applications you might prefer just using strdup()
| (or xstrdup()) or non-owning string-view types (a pointer
| and a length, perhaps into an input file you've mapped into
| memory), or lisp-style symbol interning (plus some kind of
| buffer management probably). arena allocation, if you can
| afford it, makes dynamic memory allocation for strings a
| much more reasonable thing to do: no risk of a memory leak,
| fast allocation, instant deallocation. but again some
| applications do poorly with arena allocation
|
| but please don't fill your program with fifty zillion
| string buffers of arbitrarily chosen sizes and then try to
| separately pass the right size in seventy zillion string-
| processing function calls. your code will be hard to read,
| buggy, and probably insecure. factor string buffer length
| handling into a small part of your program so that most of
| your code never has to think about string buffer lengths
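A rough sketch of the pointer/length/capacity idea described above. This is not qmail's stralloc code: the `strbuf_*` names are hypothetical, the length fields are size_t, and it aborts on out-of-memory instead of returning a status, which is the alternative the comment suggests for most programs. Overflow of the capacity doubling is not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; initialise with `struct strbuf sb = {0};`. */
struct strbuf {
    char  *s;     /* heap buffer, not necessarily nul-terminated */
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
};

/* Make room for at least `need` bytes; abort on out-of-memory. */
void strbuf_reserve(struct strbuf *sb, size_t need)
{
    assert(sb != NULL);
    if (need <= sb->cap)
        return;
    size_t cap = sb->cap ? sb->cap : 16;
    while (cap < need)
        cap *= 2;
    char *p = realloc(sb->s, cap);
    if (p == NULL) {
        fputs("out of memory\n", stderr);
        abort();
    }
    sb->s = p;
    sb->cap = cap;
}

/* Append n bytes of data. */
void strbuf_catb(struct strbuf *sb, const char *data, size_t n)
{
    if (n == 0)
        return;
    strbuf_reserve(sb, sb->len + n);
    memcpy(sb->s + sb->len, data, n);
    sb->len += n;
}

/* Replace the contents with a nul-terminated string (nul not stored). */
void strbuf_copys(struct strbuf *sb, const char *str)
{
    sb->len = 0;
    strbuf_catb(sb, str, strlen(str));
}

/* Append a single nul byte so sb->s can be handed to C string APIs. */
void strbuf_0(struct strbuf *sb)
{
    strbuf_catb(sb, "", 1);
}
```

Callers never pass a buffer size: all length bookkeeping lives in these four functions, which is the point being made about keeping it out of the rest of the program.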
| lelanthran wrote:
| > for most programs, a disadvantage of the particular way
| that stralloc is implemented in qmail is that you have to
| check every single copy or concatenation operation for an
| out-of-memory error, as you see above. this makes your
| code a lot longer. many applications are better off just
| aborting inside the memory allocation function if they
| run out of memory; getting out-of-memory handling correct
| is very difficult, especially if you don't devote a
| massive amount of effort to testing out-of-memory
| conditions (because they won't occur often enough just by
| chance to test your error-handling path)
|
| You _could_ do that. Or you could put a field in the
| struct that stores an error flag. If flag is set, all
| `stralloc` functions return immediately. When they fail,
| they set the flag and then return.
|
| This lets you do:
|     stralloc_copys(&rcptto,"");
|     stralloc_copys(&mailfrom,addr.s);
|     stralloc_0(&mailfrom);
|     if (stralloc_error(rcptto) || stralloc_error(mailfrom)) {
|         die_nomem();
|     }
|
| I'd go one further and make the error checker function
| take variable arguments, so that the last line looks like
| this:
|     if (stralloc_error (rcptto, mailfrom, NULL)) {
|         die_nomem();
|     }
|
| IME, forgetting to terminate the parameter list with a
| NULL _almost always_ causes the program to blow up on the
| very first execution.
|
| > but please don't fill your program with fifty zillion
| string buffers of arbitrarily chosen sizes and then try
| to separately pass the right size in seventy zillion
| string-processing function calls. your code will be hard
| to read, buggy, and probably insecure. factor string
| buffer length handling into a small part of your program
| so that most of your code never has to think about string
| buffer lengths
|
| I agree, but after years and years of looking at and
| writing idiomatic _safe_ C code, I am _now_ of the
| opinion that a string library is, while a better approach
| to slinging around raw strings, still very much the
| _wrong_ approach.
|
| Nothing stops the developer from doing _Parse, Don't
| Validate!_ in C, and this means that seeing C strings
| being used anywhere other than at the boundaries to the
| system evokes my code-smell senses.
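A sketch of the sticky error flag idea from the comment above, assuming a simplified string type. The `estr` names are hypothetical and this is not the stralloc API. Once an allocation fails the flag is set, every later call becomes a no-op, and the caller checks once at the end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string with a sticky error flag; initialise with {0}. */
struct estr {
    char  *s;     /* nul-terminated heap buffer, or NULL when empty */
    size_t len;   /* length excluding the nul */
    bool   err;   /* once set, later operations do nothing */
};

/* Append a nul-terminated string; on allocation failure set the flag. */
void estr_cats(struct estr *e, const char *str)
{
    assert(e != NULL && str != NULL);
    if (e->err)
        return;                      /* an earlier step already failed */
    size_t n = strlen(str);
    char *p = realloc(e->s, e->len + n + 1);
    if (p == NULL) {
        e->err = true;               /* old buffer stays valid */
        return;
    }
    memcpy(p + e->len, str, n + 1);  /* copies the nul too */
    e->s = p;
    e->len += n;
}

/* Single check after a whole sequence of operations. */
bool estr_error(const struct estr *e)
{
    return e->err;
}
```

This version reallocates on every append to stay short; a real one would track capacity as well.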
| kragen wrote:
| these are very good ideas; thank you! by coincidence
| yesterday i was looking at some code i wrote in golang
| six years ago which uses this same approach to error
| handling for i/o errors. i wonder if you might be better
| off putting the error flag in the allocator rather than
| the individual string objects?
|
| i do think _parse, don't validate_ is much more
| difficult in c; c's type system is not strong enough to
| give you the kinds of soundness guarantees you get from
| ocaml or haskell. if you forget a type case in a switch,
| there's not a whole lot you can do to get the compiler to
| complain about it
|
| the code i quoted above is from qmail-smtpd.c, which is
| 373 lines of code like the above and contains all of
| qmail's smtp input logic except for ip_scanbracket, which
| parses strings like [127.0.0.1] and is shared with dns.c.
| it's not clear to me that a _parse, don't validate_
| approach would consist of much more than just the parser
|
| maybe using a parser generator for all your input and
| output handling would help? i'm still skeptical that
| something like a text editor or a macro processor is
| going to have a large body of code that is free of string
| handling
| lelanthran wrote:
| > i do think parse, don't validate is much more difficult
| in c; c's type system is not strong enough to give you
| the kinds of soundness guarantees you get from ocaml or
| haskell. if you forget a type case in a switch, there's
| not a whole lot you can do to get the compiler to
| complain about it
|
| Sure, I agree, but we're talking about strings here.
| Instead of a function taking or returning an email
| address in a generic string type, it can take or return
| an email address type.
|
| For example, construction of a value of type `email_t`
| can take a parameter of raw string. Then any function in
| the rest of the code that receives an email would receive
| an `email_t`, not a `char _` or some other generic string
| type.
|
| > it's not clear to me that a parse, don't validate
| approach would consist of much more than just the parser
|
| It might often be nothing _but* a parser, such as
| `email_t`, but it means that no `str _()` function would
| then be used by the caller - any operation on the
| `email_t`, if `email_t` is an opaque pointer, would use
| the `email_type__ ()` functions, because the users of any
| `email_t` value cannot access, or even see, the fields
| inside an `email_t` value.
|
| This means that passing an email to a function expecting
| a name would cause a compiler error.
|
| For the IP address example you mention, that _definitely_
| should be parsed only once into the quad-byte or quad-
| quad integer fields.
|
| I mean, I'm looking over my previous projects: every
| single instance of a string I am using is actually not
| just "generic string"; there's an associated type with it
| (name, description, comment, whatever). Making those into
| different types with their own operations means that the
| compiler will generate errors if I try to use a
| `description_t` where a `name_t` is expected.
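A small sketch of the distinct-types idea above. The struct layouts, field sizes and function names are assumptions for illustration, and the parse rule (exactly one '@', non-empty parts) is deliberately far simpler than real address grammar. In a real project the structs would be opaque, with their definitions hidden in one .c file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical distinct types: the compiler will not mix them up. */
typedef struct { char user[65]; char host[256]; } email_t;
typedef struct { char text[65]; } name_t;

/* Parse a raw string at the system boundary. Returns false on anything
 * that does not look like user@host; on failure *out may be partly
 * written and must not be used. */
bool email_parse(email_t *out, const char *raw)
{
    assert(out != NULL && raw != NULL);
    const char *at = strchr(raw, '@');
    if (at == NULL || strchr(at + 1, '@') != NULL)
        return false;
    size_t ulen = (size_t)(at - raw);
    size_t hlen = strlen(at + 1);
    if (ulen == 0 || hlen == 0 ||
        ulen >= sizeof out->user || hlen >= sizeof out->host)
        return false;
    memcpy(out->user, raw, ulen);
    out->user[ulen] = '\0';
    memcpy(out->host, at + 1, hlen + 1);
    return true;
}

/* Accessor, so callers never touch raw char buffers directly. */
const char *email_host(const email_t *e) { return e->host; }

/* Takes a name by value: name_length(some_email) fails to compile. */
size_t name_length(name_t n) { return strlen(n.text); }
```

After the boundary, functions accept `email_t` or `name_t` rather than `char *`, so passing one where the other is expected is a type error at compile time.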
| kragen wrote:
| possibly you need to \ your *s
|
| indeed it is the case in qmail that ip_scanbracket
| populates a struct ip_address. but rcptto, where the
| destination email address goes, is just a byte buffer in
| a very simple ad-hoc format which, if i understand
| correctly, gets written to a pipe; qmail's privilege
| separation design, which its author to a significant
| extent came to regret, adds some extra difficulties here
| by requiring things to run in separate processes
|
| what you read from or write to a pipe is, at that point,
| necessarily just a generic string. you could write some
| kind of generic serialization layer, but doing that in c
| requires a preprocessor, and unmarshaling things in a
| statically type-safe way really requires compiler support
| for sum types, which c doesn't have
|
| aside from that, i think it's pretty likely that trying
| to parse the email address in the qmail-smtpd process
| would have made the code more bug-prone rather than less
| so
| lelanthran wrote:
| > is just a byte buffer in a very simple ad-hoc format
| which, if i understand correctly, gets written to a pipe;
| qmail's privilege separation design, which its author to
| a significant extent came to regret, adds some extra
| difficulties here by requiring things to run in separate
| processes
|
| > ...
|
| > what you read from or write to a pipe is, at that
| point, necessarily just a generic string.
|
| Aren't the reader and writer of the pipe part of the same
| software package?
|
| If they are, then safe[1] functions for _that_ type make
| sense:
|     bool email_to_bytes(email_t *src, uint8_t **dst, size_t *len);
|     bool email_from_bytes(email_t *dst, uint8_t *src, size_t len);
|
| This still means that you're only ever passing around
| `email_t` values, not `char *` values.
|
| On the other hand, if the reader and writer of the pipe
| are in different packages, then the pipe is the boundary
| for each of them, and you wouldn't be passing language
| native types without first serialising to a language
| independent representation anyway.
|
| [1] By "safe" I mean that they don't overflow and that
| the actual binary format allows the `from_bytes` function
| to determine when the input could be malicious.
| kragen wrote:
| yes, they are part of the same software package. i guess
| qmail-smtpd does have to parse the email address somewhat
| in order to match the domain against rcpthosts so it can
| reject attempts to relay mail? and yeah, that's what the
| addrparse() call does in the code i quoted--but it just
| stores the email in _addr_ as a canonicalized string. so
| it may end up rewriting things like lelanthran@[10.1.2.3]
| as lelanthran@lelanthran.com, for example, and also has
| code to strip out explicit smtp source routes
| (@foo.com:lelanthran@lelanthran.com) which were still in
| theory required when qmail came out
|
| so when i implied that 'trying to parse the email address
| in the qmail-smtpd process' was not a thing that was
| already being done, i was wrong, so plausibly your
| recommendation is in fact applicable; maybe it would have
| been better to parse the email address into a struct with
| user and host fields, then have _email_to_bytes_
| represent the email <kragen@gentle.dyn.ml.org> as
| T6:kragen,17:gentle.dyn.ml.org, instead of as
| Tkragen@gentle.dyn.ml.org\0. i mean you could generate
| _email_to_bytes_ with a code generator (an idl compiler)
| instead of writing it
|
| then you wouldn't have to worry about the possibility
| that you'd accidentally left an @ in one part or the
| other--unless you did relay the mail over smtp to some
| other host, in which case you would have to worry about
| it anyway
| cobbal wrote:
| strncpy is fine as long as it's not used in isolation. My
| preferred pattern (when I want the truncation) is to use it,
| and then unconditionally set the last byte of the buffer to
| null. This will always result in a valid C string.
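The pattern described above, as a sketch for the case where silent truncation is acceptable. The name `copy_truncating` is made up.

```c
#include <assert.h>
#include <string.h>

/* Copy src into dst (capacity dst_cap), truncating silently if needed.
 * The unconditional store makes the result a valid C string even when
 * strncpy filled the whole buffer without writing a nul. */
void copy_truncating(char *dst, size_t dst_cap, const char *src)
{
    assert(dst != NULL && src != NULL && dst_cap > 0);
    strncpy(dst, src, dst_cap);
    dst[dst_cap - 1] = '\0';
}
```

Note that the caller gets no signal that truncation happened, which is the objection raised in the reply that follows.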
| david2ndaccount wrote:
| The correct thing to do is to use memcpy and to know the size
| of both the destination buffer and the source buffer. If the
| source buffer won't fit, then you need to take an application-
| specific action (is truncation ok? do you have to abort the
| whole operation? Do you re-alloc the destination buffer? etc.)
| strncpy almost always does the wrong thing.
| imron wrote:
| Agree with the general principle of knowing your buffer
| sizes, but the issue with memcpy (evidenced over many years
| with various CVEs) is that someone invariably takes a string
| length and forgets to plus one, leading to non-null-
| terminated strings.
| thekevan wrote:
| I read that he used to drive around and when he saw a
| skateboarder, he'd yell "do an ollie" and then give them a new
| helmet.
| dfex wrote:
| "Do a kick-flip"
| lloeki wrote:
| That's on Eric Koston though, although Tony Hawk did
| participate.
|
| https://m.youtube.com/watch?v=ob0dI05Xz8s
|
| Koston did not invent the thing but has been a major
| contributor to its popularity.
|
| https://www.surfertoday.com/skateboarding/why-do-
| skateboarde...
| StressedDev wrote:
| If you are doing Windows C/C++ development, you can use the
| strsafe.h functions (https://learn.microsoft.com/en-
| us/windows/win32/api/strsafe/). When I wrote C/C++, I found
| them easier to use than the standard C functions because they
| handled all of the usual failure cases (buffer too small,
| integer overflow, etc.). It was also easy to check if there was
| a failure because all of the functions returned a failure code
| if something went wrong.
|
| In this case, StringCchCopyW(), or StringCbCopyW() would be a
| better choice than strcpy.
| nj5rq wrote:
| Very good article.
| makin wrote:
| A bit of a shame about the exploit applying to THUG PRO. The mod
| is played to this day, since the more competitive side of the
| Tony Hawk franchise has been dead for almost twenty years (with
| the exception of the THPS1+2 remake, which was but a blip in the
| scene).
|
| The mod itself is over 10 years old now, and I think the original
| developers are gone, explaining why no one was interested in
| fixing it when Ryan reported it. But this means that now the mod
| is unusable, no one is going to want to risk a full privilege
| exploit taking over their PC.
|
| Hopefully this article reaches someone who's a bit more
| interested in patching the mod.
| rlabrecque wrote:
| I wish I had the time, because it would be fun. Back when I DID
| have time, I actually got that thug1 source code almost
| playable on Windows. That source code was only for the console
| versions, and the code assumed if it was compiling for windows
| (and not Xbox windows..) it was only for tools, so a lot of
| pieces worked completely differently.
| auto wrote:
| I've read so many flavors of this sort of exploit analysis over
| the years, and if I get to read 100 more I'll be all the happier
| for it.
|
| Great article!
| Retr0id wrote:
| > The more interesting thing about the habibi key is that the
| public key modulus only has a 4 byte difference compared to the
| Microsoft RSA public key. For reference the MS key is a 2048 bit
| RSA key. I've asked a few people how this might be possible and
| the answer I got is "if you change the exponent to something
| small like 3 you easily factor out a similar key". This should
| require that the exponent of the public key is also patched to
| "3". However, none of the shell code payloads that use the habibi
| key ever change the exponent used by the RSA signature
| verification routine. Presumably it's still performing the
| validation using the exponent 65537 so I'm not entirely sure how
| this works. Perhaps someone more knowledgeable could shed some
| light on it.
|
| A random 2048-bit integer has a moderate chance of being
| trivially factorizeable (I don't know the precise odds but we can
| infer that it's roughly on the order of 2^-32 (for some
| definition of trivial) without doing any real math). Presumably,
| they wrote code that did something like this:
|     while true:
|         randomly tweak/increment 4 bytes of the public modulus
|         spend 1 millisecond trying to factor it
|         did it work?
|             if yes, we're done here.
|             else, try again.
|
| The resulting public modulus likely has lots of smaller factors
| (it should be possible to verify this, if anyone knows where I
| can find the "habibi public key"?). Although an RSA modulus
| normally has exactly 2 prime factors, the math still works out if
| you have more (as long as e is coprime).
| fxtentacle wrote:
| Let me try to explain that. You start with a random 2048-bit
| integer. You then change the lower bytes to make it divisible
| by 3. This is easy because you're only working on the public
| key. Now that the public key is divisible by 3, you use
| Fermat's little theorem which tells you that the private key
| must be divisible by 3 and have a sum of digits that is
| divisible by 3. This lets you skip most possible private keys,
| thereby reducing the compute needed to factorize it by a few
| orders of magnitude. And maybe you get lucky and they use that
| RSA implementation which uses exactly 2 prime factors, because
| then you already know that one of them is 3 and you just divide
| the public key by 3 to get the other prime factor.
|
| EDIT: Wikipedia says "The structure of the RSA public key
| requires that N be a large semiprime (i.e., a product of two
| large prime numbers), that 2 < e < N, that e be coprime to
| φ(N), and that 0 <= C < N." and later "the same algorithm
| allows anyone who factors N to obtain the private key."
|
| which in the context of the Xbox hack means that if you force N
| to be divisible by the prime 3, then the other prime which is
| used for generating the private key has to be N/3 => You have
| successfully factored it.
|
| EDIT2: Here's code for signing with the Habibi key:
| https://github.com/XboxDev/xbedump/blob/b8cd5cd0f8b1cbc4e64f...
|
| As you can see, it'll replace the last 4 bytes with 0x89, 0x9c,
| 0x90, 0x6b and then start by dividing it by 3 and using that to
| generate a suitable private key.
| Retr0id wrote:
| Ah, thanks for finding that code.
|
| Here's the original public modulus as an integer:
| http://factordb.com/index.php?query=207401193272587237602760...
| (which can't be factored, at least not any time soon)
|
| And here's the patched version:
| http://factordb.com/index.php?query=173718524353649322341982...
|
| And exactly as you say, it's divisible by 3, leaving behind a
| single large prime (so I was wrong about there being more
| factors)
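|
| To make the arithmetic above concrete, here is a minimal Python
| sketch (not the xbedump code) of deriving a private exponent once
| the modulus is known to be 3 times a prime. The cofactor q, the
| exponent e = 65537 and the message are toy values assumed purely
| for illustration, and private_exponent is a hypothetical helper
| name. pow(e, -1, phi) needs Python 3.8 or later.
|
```python
# Toy illustration: recover an RSA private exponent when n = 3 * q.
# All numbers are small stand-ins, NOT the real Xbox or Habibi values.


def private_exponent(n: int, e: int) -> int:
    """Derive d for a modulus assumed to be 3 * q with q prime, q != 3."""
    if n % 3 != 0:
        raise ValueError("modulus is not divisible by 3")
    q = n // 3
    phi = (3 - 1) * (q - 1)  # Euler's totient of 3 * q
    return pow(e, -1, phi)   # modular inverse; requires gcd(e, phi) == 1


q = 1_000_003   # hypothetical prime cofactor
n = 3 * q       # "patched" modulus, trivially factorable
e = 65537       # assumed public exponent
d = private_exponent(n, e)

message = 123456                  # any value below n
signature = pow(message, d, n)    # private-key operation
assert pow(signature, e, n) == message  # public-key verification
```
|
| The only step an attacker normally cannot do is compute phi, and
| that step becomes a single division once one factor is known.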
| ryan-c wrote:
| chinese remainder theorem implementations fail if there are
| duplicate factors
| Retr0id wrote:
| CRT can only be used for private key ops e.g. signing. The
| verification side (i.e. the logic that runs on the console)
| can't use CRT.
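|
| A rough sketch of that asymmetry, using textbook toy primes (61
| and 53, e = 17) that are assumptions for illustration only. The
| function names crt_sign and verify are hypothetical. Signing needs
| p and q, so only the key holder can take the CRT shortcut, while
| verification only ever sees n and e.
|
```python
# Textbook RSA with a CRT shortcut on the signing side only.
# No padding or hashing: this shows the arithmetic, not a secure scheme.


def crt_sign(m: int, p: int, q: int, e: int) -> int:
    """Sign m using the prime factors p and q (private knowledge)."""
    d = pow(e, -1, (p - 1) * (q - 1))
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)          # raises ValueError when p == q
    s_p = pow(m, dp, p)            # two half-size exponentiations
    s_q = pow(m, dq, q)
    h = (q_inv * (s_p - s_q)) % p  # Garner recombination
    return s_q + h * q


def verify(m: int, sig: int, n: int, e: int) -> bool:
    """Verify with the public key only: one exponentiation modulo n."""
    return pow(sig, e, n) == m


p, q, e = 61, 53, 17
n = p * q
sig = crt_sign(65, p, q, e)
assert verify(65, sig, n, e)
```
|
| The q_inv step is where this particular formulation breaks on
| duplicate factors: with p == q there is no inverse of q modulo p.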
| beng-nl wrote:
| A paper I co-wrote deals with this problem: can we generate a
| private key for a corrupted real public key (also 2048 bit as
| it happens)? The application is corrupting a public key with
| rowhammer and then using the factorization to generate a new
| corresponding private key. This worked for ssh and gpg keys
| (with some assumptions for practical purposes, eg knowing the
| contents of the page containing the key). There is an
| empirically derived success rate as a function of available
| compute time in Figure 7, and an analytical treatment in
| section 3. (Explanation of practical method in section 4.4.)
|
| https://www.usenix.org/system/files/conference/usenixsecurit...
| hifromwork wrote:
| >I don't know the precise odds but we can infer that it's
| roughly on the order of 2^-32 (for some definition of trivial)
|
| The chance is way, way, WAY larger than 2^-32. Consider the
| following code:
|
|     primes = [2, 3, 5, 7, ..., 499]
|
|     def miller_rabin(n, k): ...  # your fast primality test of choice
|
|     def is_prime_trivial(n):
|         for p in primes:
|             while n % p == 0:
|                 n //= p
|         if n == 1: return True
|         return miller_rabin(n, 20)
|
| It fully factors a random 2048-bit integer in around 100 tries,
| for me.
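|
| For anyone who wants to run the snippet above, here is a
| self-contained variant. It is a sketch under assumptions: the
| prime list is built with a sieve, the stubbed-out primality test
| is filled in with a standard probabilistic Miller-Rabin, and the
| names small_primes, is_trivially_factorable and tries_until_hit
| are made up for this example.
|
```python
import random


def small_primes(limit: int) -> list:
    """Sieve of Eratosthenes: all primes below `limit` (limit >= 2)."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit, i)))
    return [i for i, flag in enumerate(sieve) if flag]


PRIMES = small_primes(500)  # 2, 3, 5, ..., 499


def miller_rabin(n: int, rounds: int = 20) -> bool:
    """Probabilistic primality test; False means definitely composite."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:  # write n - 1 as d * 2**s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # this base is a witness: n is composite
    return True


def is_trivially_factorable(n: int) -> bool:
    """True if n is small primes times at most one (probable) prime."""
    if n < 1:
        raise ValueError("n must be positive")
    for p in PRIMES:
        while n % p == 0:
            n //= p
    return n == 1 or miller_rabin(n)


def tries_until_hit(bits: int = 2048) -> int:
    """Count random `bits`-bit integers drawn until one factors trivially."""
    tries = 0
    while True:
        tries += 1
        candidate = random.getrandbits(bits) | (1 << (bits - 1))
        if is_trivially_factorable(candidate):
            return tries
```
|
| Calling tries_until_hit() a handful of times gives a feel for the
| odds; the count varies from run to run because the candidates are
| random.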
| brcmthrowaway wrote:
| This gives me an opportunity to clarify a myth from my childhood.
| Was Tony Hawk the first ever to hit a 720?
| zimpenfish wrote:
| https://en.wikipedia.org/wiki/Aerial_(skateboarding) says "The
| 720, two full mid-air rotations, is one of the rarest tricks in
| skateboarding. It was first done by Tony Hawk in 1985, and it
| wasn't something he planned to do."
|
| (Which is presumably "the first recorded" but I'm guessing if
| someone had done it, they'd have been shouting about it and
| -probably- the only kind of person who could pull it off would
| be a pro skater anyway?)
| detoured299 wrote:
| At that time only a few pro vert skaters would have had the
| ability to throw 720s, yeah. Nowadays a good number of ams
| can too.
|
| The rarity of seeing a 720 or above has as much to do with
| the fact that most skaters don't skate vert - instead skating
| street or smaller transition - as the trick's difficulty.
| Outsiders tend to imagine large spins are the holy grail of
| skate moves but almost all skaters aren't interested in them
| for aesthetic reasons among others.
| voytec wrote:
| He worked much longer for the 900 but more importantly - he
| repeated the 900 at the age of 48[0]!
|
| [0] https://youtu.be/TnvPt_a7iOQ?t=93
| imiric wrote:
| Even more impressive: a 9-year-old did three consecutive
| 900s in front of Tony Hawk[1].
|
| Arguably, this feat is easier for a small child, but
| still... insane talent at a young age.
|
| [1]: https://old.reddit.com/1dh6p2h
| ComputerGuru wrote:
| FYI, what looks like a section header icon followed by the text
| "So what's the habibi key?" is actually a clickable expanding
| segment (html details). You should click it if you're interested!
|
| A question I have is where/when/how the corresponding _private_
| habibi key was released/leaked, if the story about it being used
| exclusively by the linux console group to prevent pirated content
| from being used is true. OP clearly was able to patch the four
| byte difference between the MS key and the habibi key to then run
| "unsigned" (but, actually, signed with the habibi private key)
| executables, so they clearly got their hands on it.
| bri3d wrote:
| The Habibi key is generated by patching the Microsoft key to be
| divisible by 3, making it quite easy to factor indeed. The
| private key can be trivially recovered from the public key, and
| there was nothing really to release or leak. It was basically a
| little crypto CTF buried in the original 007: Agent Under Fire
| savegame hack, which was basically a CTF in and of itself (it
| was reasonably heavily obfuscated, I think both as a middle
| finger to pirates and as a challenge to other reverse
| engineers).
| ComputerGuru wrote:
| Thanks, that makes perfect sense.
| jdlyga wrote:
| Imagine a VSCode plugin that made up trick names and gave you a
| combo points score at the bottom for your continuous keystrokes.
| Tony Hawk's Pro-grammer
| i_read_news wrote:
| I think this would be more fun for VIM keybindings, where there
| is a higher skill level (to get cooler combos of course).
| high_priest wrote:
| This describes https://codestats.net very well
| Rebelgecko wrote:
| Thanks for sharing, the other articles on this blog are equally
| fascinating
| Arrath wrote:
| "Running Halo 2 in true HD" was a really, really good read.
| Jerrrrrrry wrote:
| It may not be possible for me to articulate how fucking insane of
| an accomplishment this is.
|
| Xbox 360...._softmod_.... via the park name on a Tony Hawk game.
|
| 24 segment ROP chain :')
|
| His rightful lamentation for the hypervisor, concise functional
| write up, and immediate thoughts of an x360 botnet make this the
| greatest xbox 360 nostalgia gut-punch of all time.
|
| kudos++
| Reason077 wrote:
| In Tony Hawk's defence, he's a pro skater, not a security
| analyst. Limited time behind the keyboard in the late 90s/early
| 2000s grinding on his soon-to-be iconic game series would have
| been spent making sure 900 McTwists felt really natural, not
| auditing code for buffer overruns!
| JoshTriplett wrote:
| This seems like a great example of having the wrong security
| mindset in console development. "We're the only thing that can
| write this saved data, so we only have to parse what we wrote" is
| a very common console mindset, and fundamentally wrong when
| people can prepare artificially constructed saved data.
|
| (Completely separate from that, consoles shouldn't be treating
| users as the adversary, but given that they _do_, games are
| failing to have a security mindset consistent with that stance.)
| cortesoft wrote:
| > consoles shouldn't be treating users as the adversary
|
| I would 100% agree with this when talking about a normal
| computer, but I kind of feel differently about consoles. How do
| you prevent cheating in online games if you don't restrict what
| users can do?
| JoshTriplett wrote:
| 1) Play with people you know, or
|
| 2) Group players by apparent capability and observed
| behavior, such that cheaters end up only playing with other
| cheaters.
| searealist wrote:
| 1. makes matchmaking impossible, which is how 99% of people
| want to play.
|
| 2. is a research project that will probably never pan out.
| JoshTriplett wrote:
| Multiple games have done 2 in production with great
| success.
| searealist wrote:
| No they haven't. Maybe they have caught cheaters with
| _intrusive_ software. But that's not "observed behavior
| and apparent capability".
| JoshTriplett wrote:
| Rootkits and other mechanisms are not the only way to
| catch cheating players. You can also rely on player
| reports of other players, as well as anomaly detection,
| and based on those reports, observe player actions to
| detect obvious cheating. You don't have to be _perfect_,
| just catch enough people to make it not worth the risk.
| Use the results to either pair suspected cheaters with
| other suspected cheaters, or just ban people if you don't
| mind risking that they'll hide themselves better and come
| back.
|
| This is not a hypothetical or a research project; some
| games do exactly this.
| searealist wrote:
| This is just ideological gobbledygook. No concrete
| examples of this working in practice.
| JoshTriplett wrote:
| https://en.wikipedia.org/wiki/Cheating_in_online_games#Anoma...
|
| https://en.wikipedia.org/wiki/Cheating_in_online_games#Playe...
|
| https://en.wikipedia.org/wiki/Cheating_in_online_games#Banni...
|
| > Certain games are known to identify cheaters and
| "shadow ban" them by placing them in matchmaking with
| other cheaters only, so as not to let the cheaters know
| that they have been identified.
| searealist wrote:
| The text for the anomaly link basically just says it's
| infeasible for a number of reasons, but it would sure be
| nice from a privacy standpoint.
|
| The banning section mentions companies that employ very
| invasive software to find and ban cheaters. Read up on
| how the following software actually works:
|
| > There are many facets of cheating in online games which
| make the creation of a system to stop cheating very
| difficult; however, game developers and third-party
| software developers have created or are
| developing[22][23] technologies that attempt to prevent
| cheating. Such countermeasures are commonly used in video
| games, with notable anti-cheat software being BattlEye,
| GameGuard, PunkBuster, Valve Anti-Cheat (specifically
| used on games on the Steam platform),[citation needed]
| and EasyAntiCheat.
| alt227 wrote:
| The problem with what you are saying is that the
| industry involved is shrouded in secrecy and full of
| smoke and mirrors.
|
| Because the only sources you have provided are anecdotal,
| it's entirely possible you are falling for the illusion.
|
| To use your phrases, 'certain games companies' are known
| to totally lie about their anti-cheat techniques and
| methods to throw people off the scent of the real
| methods.
|
| Therefore without actually decompiling something to prove
| what's going on, you have no real idea what techniques are
| being used at all.
| cortesoft wrote:
| Those games that shadowban also use anti-cheat software
| to identify the people they need to shadowban.
| aseipp wrote:
| By that point the player base can already be devastated,
| and it can kill the game. Cycle: Frontiers was an
| extraction shooter, and the immediate cheating was rampant;
| the game's design meant dying to hackers was devastating --
| imagine a hacker forcing you to lose not just this game,
| but retroactively making you lose the prior 5 games too.
| This absolutely destroys player morale, near instantly.
| Even if you ban that hacker instantly, there's an extremely
| high chance those players will never return. Because the
| rampant cheating went on for so long, the game's reputation
| never recovered. Within a year of release the servers were
| shut down.
|
| A single cheater can often ruin games for hundreds or even
| thousands of players very easily. For experienced players,
| seeing a single flying aimbot shithead in your lobby just
| means there are 10 other cheaters in the lobby too -- ones
| using subtle ESP/wallhacks that can be extremely difficult
| to detect, by design. Shady websites like G2A or GMG that
| sell keys (which are almost 100% hot keys, to really make
| it all come full circle) mean that even if you get banned,
| buying new keys for a new copy of the game is extremely
| cheap, especially when many of these games have items that
| can be sold for IRL cash in various ways, games like Rust.
| For many parts of the world, selling/trading rare items to
| players can net you plenty of actual income -- and getting
| banned means nothing as a result. Instantly banning cheaters
| the second they are confirmed leaks information to the
| cheater and cheat creator; today most games like Warzone or
| Destiny have to play psyops and shroud their exact
| detection techniques in part by doing "ban waves" only when
| they accumulate a mass amount of confirmed cheaters. The
| cheater that ruined your top score may, necessarily and by
| design, be allowed to run free for a while.
|
| The net result of all this is that designers and --
| importantly, even though people on Hacker News don't want
| to hear it -- PLAYERS tend to overwhelmingly prefer
| _prevention_ instead of _reaction_. They are both needed.
| Players are not morons who love installing rootkits. But on
| the whole, preventative measures tend to be more valuable
| to players and creators than reactionary ones, even if they
| are all ultimately imperfect.
|
| In a funny twist, games like Tarkov and Rust do have a
| gameplay mechanic that reduces the long-term psychological
| devastation of cheaters and is not invasive at all: they
| reset all content in the game to "neutral" every once in a
| while, so basically all your stuff gets deleted, and
| everyone starts over again. (This non-permanence is
| probably one of the reasons players stick with the game,
| despite cheaters, which are incredibly infuriating.)
|
| Can I ask if you seriously play any online competitive
| games, at a high level or otherwise? Because I do, and I'll
| be honest: I've been hearing it all for 20 years. These
| types of approaches _have_ had success (CS's player review
| system, certain shadowban systems, "trip wires" that
| trigger on impossible game behavior), but there is _no_
| single approach that has proven itself to be the ultimate
| universal solution. There is no universal, wibbly wobbly
| bullshit stats algorithm you run on your servers to
| "solve" this. These problems are not solved. I don't like
| it. I don't run certain games with certain forms of
| anticheat. But it is what it is.
| kevincox wrote:
| I think public matchmaking and local play are different. I
| can mostly get behind anti-cheat for public play. But for
| local and optionally for friend games it should be possible
| to extract, edit and play with your saves.
| extraduder_ire wrote:
| It is kind of coming true now, since all current consoles both
| encrypt and sign savegames to lock them to your account, and
| most (don't know about xbox) don't even let you copy your saves
| anywhere but the console and paid cloud storage.
| BoringTimesGang wrote:
| Sad. Today's children will never learn how to generate a
| valid CRC for a hex-edited save file.
| culopatin wrote:
| I was hoping that the exploit would only execute if he stuck the
| landing across the gap
| mclau156 wrote:
| at that point I would rather re-make the game in Godot
| megaloblasto wrote:
| I have a dumb question. Once you soft mod a game console, what
| type of stuff can you do?
| forgotmyacc wrote:
| Back in the day we mostly did it to cheat in Halo 2 online
| multiplayer. I remember being 13 and sticking a butter knife
| into my Xbox DVD drive so the "old maps" (on the disc) would
| fail to load, forcing the game to load a "new map" from the hard
| drive, which I had patched: soft modding the console allowed FTP
| access to edit the maps on the hard drive.
___________________________________________________________________
(page generated 2024-08-08 23:02 UTC)