[HN Gopher] Arbitrary Code Execution in the Universal Turing Mac...
___________________________________________________________________
Arbitrary Code Execution in the Universal Turing Machine
Author : ingve
Score : 114 points
Date : 2021-05-10 09:15 UTC (13 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| mabbo wrote:
| Truly _nothing_ is secure.
| schwag09 wrote:
| > In computing, the relationship between structure and behavior,
| between program and process, is perplexing in itself. That this
| relationship so often can be subverted, allowing an untrusted
| data provider to preternaturally gain control over program
| execution, is disquieting. Why is this a common phenomenon in
| computer systems?
|
| Mixing code and content leads to all the most common injection-
| style vulnerability classes: buffer overflows, XSS, SQLi, command
| injection, etc. Fundamentally, I believe this is at the heart of
| the problem. The paper does go on to address this:
|
| > Another proposed theory blames John von Neumann's stored
| program concept [15] for the woes of arbitrary code execution;
| the fact that data and program in computers are stored on the
| same storage medium may allow an attacker to illegitimately
| modify the program rather than the intended data [10].
|
| From there it provides ROP as a counterpoint to the "stored-
| program hypothesis." This makes sense because ROP allows one to
| achieve arbitrary code execution without necessarily modifying
| the code. While I agree that mixing code and content may not be
| the highest-level abstraction of concern here, I do think it's
| often a fundamental flaw from a practical perspective.
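|
| A minimal sketch of the code/content mixing problem in Python,
| using SQLi as the example (the table and data here are made up
| for illustration):
|
|   import sqlite3
|
|   conn = sqlite3.connect(":memory:")
|   conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
|   conn.execute("INSERT INTO users VALUES ('alice', 'admin')")
|
|   name = "' OR '1'='1"  # attacker-controlled "content"
|
|   # Vulnerable: content is spliced into the code channel, so
|   # the attacker's string is parsed as SQL and matches every row.
|   query = "SELECT * FROM users WHERE name = '" + name + "'"
|   print(conn.execute(query).fetchall())  # [('alice', 'admin')]
|
|   # Safer: a parameterized query keeps code and content in
|   # separate channels; the input is only ever treated as data.
|   print(conn.execute(
|       "SELECT * FROM users WHERE name = ?", (name,)).fetchall())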
| ukj wrote:
| Your fundamental flaw is my prized feature.
|
| Metaprogramming/reflection would be impossible without treating
| code as content.
| schwag09 wrote:
| Let me rephrase: I think it's often a fundamental flaw from a
| practical, security perspective. Reflection and constructs
| like 'eval' are often at odds with security. You could more
| generally say that utility is often at odds with security.
|
| Maximizing utility on a website might look like dropping a
| user into a root REPL so they can perform arbitrary actions
| and are unlimited in their capabilities. Maximizing security
| might look like shutting down the website and all its
| associated servers. In reality, from a security perspective,
| we try to achieve a balance between the two by maximizing
| utility while minimizing security risk.
|
| Personally, I think code as content, reflection, eval, etc.
| move the slider too far in the utility direction. Of course
| this is a difficult problem to solve because nearly all our
| systems are built on the idea of code as content, which is
| also what makes it such a generalized vulnerability class.
| lisper wrote:
| And this tension in goals is not limited to software.
| Modifying and repairing hardware is similarly fraught. You
| can get yourself into all kinds of trouble by tinkering with
| (say) your car.
|
| The real difference is that you generally have to be in
| physical proximity to hardware in order to tinker with it,
| and that puts a very hard constraint on random people mucking
| with your car. Not so for software, especially in the age of
| the internet.
| lucian1900 wrote:
| Not at all; even in Lisps there's quoting. In-band signalling
| isn't necessary for either metaprogramming or reflection.
| dkarras wrote:
| CVE?
| darkr wrote:
| https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-3247...
| SilasX wrote:
| lol at how it has:
|
| >>NOTE: the discoverer states "this vulnerability has no
| real-world implications."
| dkarras wrote:
| Ah, of course.
| philipswood wrote:
| A computer running a program usually does not embody a universal
| machine anymore.
|
| I like to think of arbitrary code execution as a workaround
| that turns it into a universal machine again.
|
| This puts quite a spin on the idea.
| motohagiography wrote:
| This is oddly important. From the conclusion:
|
| > While this vulnerability has no real-world implications, we
| discuss whether it indicates an intrinsic propensity for
| arbitrary code execution vulnerabilities in computers in
| general.
|
| As a joke in the 90's I looked into writing malware for
| Napoleonic semaphore signalling networks (Chappe semaphore), with
| the idea that if it were possible, then nothing could be secure.
| We have some general ideas that nothing is secure, and that
| security is a relative quality and perception, but I don't know
| that we've been able to formalize it into a conjecture. (Other
| than Hofstadter's music to break phonographs by.)
|
| Maybe I'm missing a big obvious one, but there's something about
| programmable systems that they are necessarily unable to infer or
| deduce the output of a program beyond a certain complexity
| without first running it. Or maybe that there is a set of
| convolutions in a set of instructions whereafter the information
| it contains spirals off into chaos like the logistic equation and
| you can't formally predict how a program behaves based on its
| initial state. There's something about security problems that
| exist on the edge of computability and it seems like a
| perspective we could draw generalities from anyway.
| chriswarbo wrote:
| > there's something about programmable systems that they are
| necessarily unable to infer or deduce the output of a program
| _beyond a certain complexity_
|
| Just to focus on this idea of 'a certain complexity', you may
| like Chaitin's Incompleteness Theorem:
| https://en.wikipedia.org/wiki/Kolmogorov_complexity#Chaitin'...
| which defines the "algorithmic information content" of a
| symbolic system (whether that's a programming language or a
| mathematical logic; which are actually the same thing via the
| Curry-Howard correspondence!)
|
| The theorem is a bit technical, but basically involves a
| program which enumerates all strings of symbols from a chosen
| alphabet (pretty easy: just a `while` loop that tries longer
| and longer string lengths, and a `for` loop which goes through
| the alphabet over and over to output every combination of that
| many symbols).
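|
| As a rough Python sketch of that enumerator:
|
|   from itertools import count, product
|
|   def enumerate_strings(alphabet):
|       """Yield every string over `alphabet`, shortest first."""
|       for length in count(0):  # ever-longer lengths (the 'while')
|           for symbols in product(alphabet, repeat=length):
|               yield "".join(symbols)  # every combination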
|
| We use this to define a program which enumerates all proofs of
| some chosen formal logic (these are just strings of symbols
| which obey all of that logic's rules, so we just filter the
| output of the above program).
|
| We use this to define a program which enumerates proofs of
| theorems of the form "all programs which output string s
| contain at least N instructions" (this is just a pattern-match
| against the end of the proof). N is a parameter, whilst s can
| be any string (different proofs can have different strings s).
|
| We use this to define a final program, which we'll call P.
| Running P with a parameter, like P(N), will run the above
| enumeration for that value of N, and halt once a single proof
| has been found. Just before halting, P will output whichever
| string s appears in the proof it found.
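|
| A heavily hedged Python sketch of P, where `enumerate_proofs`
| and `proved_lower_bound` are hypothetical stand-ins for the
| logic-specific proof enumerator and pattern-match described
| above:
|
|   def P(N):
|       for proof in enumerate_proofs():
|           # proved_lower_bound returns the string s if `proof`
|           # establishes "all programs which output string s
|           # contain at least N instructions", else None.
|           s = proved_lower_bound(proof, N)
|           if s is not None:
|               return s  # output s and halt
|       # if no such proof exists, the loop never terminates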
|
| Now is the interesting part: we count up all of the
| instructions used to define all of the above programs. Let's
| call this U (as per the Wikipedia link above). We'll define
| another number N0 where N0 > U + the number of instructions
| required to define N0 in a program; in other words, N0 is more
| than the number of instructions needed to define 'P(N0)'. (Keep
| in mind that we can generate very large numbers using programs
| with only a few instructions, so it's easy enough to find a
| suitable value of N0).
|
| Finally, consider what happens if we run the program P(N0):
|
| If P(N0) halts, it will output some particular string s. That
| string is taken from a proof of the theorem "all programs which
| output string s contain at least N0 instructions". We know that
| program P(N0) contains fewer than N0 instructions, which would
| contradict such a theorem. The only way to avoid this
| contradiction (other than using an inconsistent logic) is if
| P(N0) _never_ halts.
|
| If P(N0) never halts, that means it will keep enumerating
| proofs forever without _ever_ finding one for _any_ theorem of
| the form "all programs which output string s contain at least
| N0 instructions"; for _any_ string s.
|
| This latter result is consistent, but shows that our chosen
| system of logic (whatever it is), must be _severely_ limited.
| In particular, we know it is unable to prove statements of the
| form "all programs which output string s contain at least N
| instructions" where N >= N0 (in fact, there's a lower bound on
| N0 based on 'compressing' the above programs into the fewest
| number of instructions; although finding that lower bound is
| uncomputable due to the halting problem!).
|
| At first glance this doesn't look very important, but consider
| the pigeonhole principle: there are only a finite number of
| programs with fewer than N0 instructions; hence there are only
| a finite number of strings 's' which can _provably_ be
| generated with fewer than N0 instructions. Almost all strings
| 's' require more than N0 instructions to generate.
|
| Now consider programs of the form 'enumerate all proofs until
| we find one for statement T, then output T and halt'. If such a
| program contains more than N0 instructions, then our chosen
| logic is _unable_ to prove whether this program will halt or
| not. We _can_ prove that whether or not this program halts is
| equivalent to whether or not T is provable, and hence the only
| way for this halting/looping question to be unprovable is for
| the statement T to be unprovable.
|
| Hence almost all statements are unprovable by our given system
| of logic, regardless of which logic we choose!
|
| Chaitin takes this further, to define a "halting probability",
| and essentially shows that all mathematical 'knowledge' can be
| distilled down to a set of bits, each of which tells us whether
| a certain program halts or not.
| AndrewOMartin wrote:
| Forgive me if I'm wrong, but this video might suggest you were
| beaten to it by about 160 years!
|
| https://www.youtube.com/watch?v=cPeVsniB7b0
| motohagiography wrote:
| So edifying! Deserves its own post really. They encoded data
| in a covert channel using backspaces, which is super
| interesting. The next level was looking for a way to change
| the contents of a message and potentially a code book update.
| chriswarbo wrote:
| > Maybe I'm missing a big obvious one, but there's something
| about programmable systems that they are necessarily unable to
| infer or deduce the output of a program beyond a certain
| complexity without first running it.
|
| Yep, this is Rice's Theorem. If we have a particular program P
| which outputs a certain value, there is an infinite set of
| programs of the form 'C; P;' which first run some arbitrary
| code 'C' and then run P. The Halting Problem is undecidable, so
| we can't always tell whether code 'C' will halt or not; and
| hence whether we'll move on to executing P or not.
|
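| A minimal sketch of that 'C; P;' construction:
|
|   def compose(C, P):
|       """Build the program 'C; P'."""
|       def program():
|           C()  # arbitrary code: may or may not halt (undecidable)
|           P()  # reached, and its output produced, only if C halts
|       return program
|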
| More generally, Rice's Theorem applies to any non-trivial
| property of the input/output function implemented by a program.
| A non-trivial property means that at least one program has that
| property and at least one program doesn't. Being a property of
| the 'input/output function implemented by a program' avoids
| implementation details like 'runs for at least 10 steps'.
|
| The Halting Problem is _semi-decidable_: if we just run the
| given program until it halts then return 'HALTS', we will
| always return the correct answer when given a program which
| halts. This doesn't completely solve the Halting Problem, since
| we will not return 'NON-HALTING' when given a non-halting
| program (we will just run forever).
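|
| A minimal sketch of that semi-decision procedure, where the
| program under test is just a Python thunk:
|
|   def semi_decide_halting(program):
|       """Correct on halting programs; runs forever otherwise."""
|       program()       # simply run the program to completion...
|       return "HALTS"  # ...so we only answer on the halting side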
|
| Wolfram calls this "computational irreducibility": that the
| only way to predict the behaviour of a Turing-complete system
| is to simulate it; or, equivalently, any formula that tells us
| what will happen on the Nth step will, necessarily, involve
| calculating what happens on the (N-1)th step, the (N-2)th step,
| and so on back to the first step.
| wcarss wrote:
| "O Great Univac, why did you ever create the universe?"
| "Someone asked if it would ever end."
| umanwizard wrote:
| > there's something about programmable systems that they are
| necessarily unable to infer or deduce the output of a program
| beyond a certain complexity without first running it.
|
| Rice's theorem is a formal statement capturing what I think you
| mean here.
| motohagiography wrote:
| https://en.wikipedia.org/wiki/Rice%27s_theorem
|
| Thank you, that's the kind of example I was thinking must
| exist. It implies that for each system there is a program
| that is undecidable, and then the question becomes how many
| of those programs transfer control so that the program
| rewrites itself. Maybe we don't need to address this in
| security, but sometimes it feels like the whole field is
| predicated on denying some foundational truth about
| complexity.
| michaelt wrote:
| Isn't the statement that "there exists a program for each
| system such that arbitrary code execution is possible" a
| tautology?
|
| Aren't you basically saying "Executing arbitrary code can
| allow executing arbitrary code"?
| motohagiography wrote:
| To a point, as there's no difference between authorized
| and unauthorized code, except there is, but there
| isn't. :)
|
| Maybe more like wondering what the consequences are if
| all code is arbitrary, particularly for higher order
| abstractions like security and economics.
|
| Security has always been a hall of mirrors, and a
| principle like this could be a fast heuristic for
| deciding how much risk you want to take. In all the
| security design work I've done, it was based on, "we've
| essentially made an attack cost greater than X, where X
| is close to the value of what they're protecting."
| sterlind wrote:
| how does this apply to symbolic execution of programs? do you
| think it's possible to exploit a proof verifier through a
| maliciously-crafted set of lemmas?
| maweki wrote:
| Either your decision/deduction procedure terminates on all
| inputs, in which case the property you're going after is
| trivial; or it doesn't, and then the property is generally
| undecidable.
|
| If you can symbolically evaluate a program and this always
| works for all input programs, your input programs are
| already restricted in that they are not computationally
| complete.
| chriswarbo wrote:
| Going a bit further, this could absolutely be used for a
| denial-of-service attack; but would otherwise be
| 'sandboxed', similar to how exploits for a Universal
| Turing machine are still limited to reading/writing the
| machine's tape. They can't, for example, hijack an SSH
| session or whatever (unless there are serious
| vulnerabilities in the prover/Turing-machine
| implementation).
| spiritplumber wrote:
| GNU Terry Pratchett
| Y_Y wrote:
| > "You know they'll never really die while the Trunk is alive
| [...] It lives while the code is shifted, and they live with
| it, always Going Home." - Moist von Lipwig, Going Postal,
| Chapter 13
|
| This is a bit like the clacks network (inspired by semaphore)
| in the Discworld, and how you can spam it with everlasting
| messages, to memorialise the dead, or just gunk up the system.
| [deleted]
| galaxyLogic wrote:
| When we say a program is "insecure", what do we mean? Do we
| mean that there is some class of inputs which causes it to
| perform whatever arbitrary computation the designers of the
| input data want it to execute?
|
| Or, less critically, that the designers of the inputs can
| cause the program's outputs to reveal information which can
| be used to implement further exploits?
| luch wrote:
| Depends on the situation, which is why CVEs also have a CVSS
| score explaining the context and potential for abuse, e.g.:
| https://msrc.microsoft.com/update-guide/vulnerability/CVE-20...
|
| For example, arbitrary computation is useless for browser
| exploits (unless you think arbitrary bitcoin mining is a
| problem), since JavaScript already gives it to you for free,
| but it is highly problematic for smart contracts, since it
| breaks the economy built on them.
|
| Another example would be data-leak vulnerabilities, which in
| some cases are only a stepping stone towards an RCE, but in
| other scenarios, like Heartbleed, basically break the
| Internet's overall security.
| legulere wrote:
| Arbitrary computation is just a subset of arbitrary code
| execution and one that is much less harmful.
|
| Arbitrary computation means that you can let the computer do
| any pure calculation you want it to. The worst you can do is
| put it in an endless loop, which you can break with timeouts.
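|
| A minimal sketch of that timeout idea in Python, running the
| untrusted pure computation in a separate process:
|
|   from multiprocessing import Process
|
|   def untrusted_pure_computation():
|       while True:  # worst case for pure code: an endless loop
|           pass
|
|   if __name__ == "__main__":
|       p = Process(target=untrusted_pure_computation)
|       p.start()
|       p.join(timeout=5.0)  # wait at most five seconds
|       if p.is_alive():
|           p.terminate()    # break the endless loop from outside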
|
| As Turing machines are idealized machines that only do
| computation, that's all that can happen to them, which is why
| they are completely useless as models for security.
|
| Real machines and processes interact with the outside world.
| Arbitrary code execution is dangerous exactly because it can
| access those interaction possibilities. It doesn't matter that
| the injected code could calculate pi to trillions of digits;
| what matters is that it could do anything the process can do,
| such as manipulating files and accessing the network.
| ComodoHacker wrote:
| >The worst you can do is to put it in an endless loop.
|
| You can alter the result of the computation, which could be
| much worse, depending on how that result is used downstream.
| danbruc wrote:
| In summary, the universal Turing machine uses a non-binary
| alphabet but is only meant to simulate Turing machines with a
| binary alphabet, given a description of the Turing machine in
| a defined format together with its input. By making the
| initial tape content of the simulated Turing machine
| non-binary, the universal Turing machine can be made to
| execute a further Turing machine contained in the input of the
| simulated Turing machine. Whether this works also depends on
| the description of the simulated Turing machine; the authors
| claim that most machine descriptions are exploitable, but
| without any further discussion of this.
|
| A possible fix would probably be to simply halt when the
| universal Turing machine encounters anything other than zero
| or one in the region of the tape that represents the tape of
| the simulated Turing machine, except for the symbol indicating
| the head of the simulated Turing machine.
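|
| A minimal sketch of that check in Python, assuming a
| hypothetical encoding where the simulated tape region holds
| only '0', '1' and a single head marker 'H' (the paper's actual
| alphabet differs):
|
|   def check_simulated_tape(region):
|       """Halt the UTM unless the simulated tape is well-formed."""
|       if region.count("H") != 1:
|           raise SystemExit("halt: malformed head marker")
|       for symbol in region:
|           if symbol not in "01H":
|               raise SystemExit("halt: non-binary tape symbol")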
| [deleted]
___________________________________________________________________
(page generated 2021-05-10 23:02 UTC)