[HN Gopher] Arbitrary Code Execution in the Universal Turing Mac...
       ___________________________________________________________________
        
       Arbitrary Code Execution in the Universal Turing Machine
        
       Author : ingve
       Score  : 114 points
       Date   : 2021-05-10 09:15 UTC (13 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | mabbo wrote:
       | Truly _nothing_ is secure.
        
       | schwag09 wrote:
       | > In computing, the relationship between structure and behavior,
       | between program and process, is perplexing in itself. That this
       | relationship so often can be subverted, allowing an untrusted
       | data provider to preternaturally gain control over program
       | execution, is disquieting. Why is this a common phenomenon in
       | computer systems?
       | 
       | Mixing code and content leads to all the most common injection-
       | style vulnerability classes: buffer overflows, XSS, SQLi, command
       | injection, etc. Fundamentally, I believe this is at the heart of
       | the problem. The paper does go on to address this:
       | 
       | > Another proposed theory blames John von Neumann's stored
       | program concept [15] for the woes of arbitrary code execution;
       | the fact that data and program in computers are stored on the
       | same storage medium may allow an attacker to illegitimately
       | modify the program rather than the intended data [10].
       | 
       | From there it provides ROP as a counterpoint to the "stored-
       | program hypothesis." This makes sense because ROP allows one to
       | achieve arbitrary code execution without necessarily modifying
        | the code. Still, while I agree that mixing code and content
        | may not be the highest-level abstraction for concern here, I
        | do think it's often a fundamental flaw from a practical
        | perspective.
        
         | ukj wrote:
         | Your fundamental flaw is my prized feature.
         | 
         | Metaprogramming/reflection would be impossible without treating
         | code as content.
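          | 
          | A minimal Python sketch (mine, not from the thread) of how
          | the same mechanism is both the prized feature and the
          | vulnerability; the strings here are purely illustrative:

```python
# Code stored as plain data...
expr = "lambda x: x * 2"
double = eval(expr)          # ...promoted to executable code: metaprogramming
assert double(21) == 42

# The exact same mechanism, fed attacker-controlled "data":
user_input = "__import__('os').getcwd()"   # could just as well run any command
result = eval(user_input)    # the data provider now controls execution
```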
        
           | schwag09 wrote:
            | Let me rephrase: I think it's often a fundamental flaw from a
           | practical, security perspective. Reflection and constructs
           | like 'eval' are often at odds with security. You could more
           | generally say that utility is often at odds with security.
           | 
           | Maximizing utility on a website might look like dropping a
           | user into a root REPL so they can perform arbitrary actions
           | and are unlimited in their capabilities. Maximizing security
           | might look like shutting down the website and all its
           | associated servers. In reality, from a security perspective,
           | we try to achieve a balance between the two by maximizing
           | utility while minimizing security risk.
           | 
            | Personally, I think code as content, reflection, eval, etc.
           | move the slider too far in the utility direction. Of course
           | this is a difficult problem to solve because nearly all our
           | systems are built on the idea of code as content, which is
           | also what makes it such a generalized vulnerability class.
        
           | lisper wrote:
           | And this tension in goals is not limited to software.
           | Modifying and repairing hardware is similarly fraught. You
           | can get yourself into all kinds of trouble by tinkering with
           | (say) your car.
           | 
           | The real difference is that you generally have to be in
           | physical proximity to hardware in order to tinker with it,
           | and that puts a very hard constraint on random people mucking
           | with your car. Not so for software, especially in the age of
           | the internet.
        
           | lucian1900 wrote:
            | Not at all; even in Lisps there's quoting. In-band
            | signalling isn't necessary for either metaprogramming or
            | reflection.
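            | 
            | For instance (a Python sketch of the out-of-band idea,
            | with an illustrative payload), `ast.literal_eval` parses
            | its input strictly as quoted data and refuses anything
            | that would have to run:

```python
import ast

payload = "__import__('os').system('whoami')"

# In-band: eval(payload) would treat the string as code and execute it.
# Out-of-band: literal_eval parses it strictly as a data literal and
# rejects anything that is not one, instead of running it.
try:
    ast.literal_eval(payload)
    outcome = "evaluated"
except ValueError:
    outcome = "rejected: not a data literal"

assert outcome == "rejected: not a data literal"
assert ast.literal_eval("[1, 2, 3]") == [1, 2, 3]   # plain data is fine
```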
        
       | dkarras wrote:
       | CVE?
        
         | darkr wrote:
         | https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-3247...
        
           | SilasX wrote:
           | lol at how it has:
           | 
           | >>NOTE: the discoverer states "this vulnerability has no
           | real-world implications."
        
           | dkarras wrote:
           | Ah, of course.
        
       | philipswood wrote:
       | A computer running a program usually does not embody a universal
       | machine anymore.
       | 
        | I like to think of arbitrary code execution as a workaround
        | that turns it back into a universal machine.
       | 
       | This puts quite a spin on the idea.
        
       | motohagiography wrote:
       | This is oddly important. From the conclusion:
       | 
        | > While this vulnerability has no real-world implications, we
        | discuss whether it indicates an intrinsic propensity for
        | arbitrary code execution vulnerabilities in computers in
        | general.
       | 
       | As a joke in the 90's I looked into writing malware for
       | Napoleonic semaphore signalling networks (Chappe semaphore), with
       | the idea that if it were possible, then nothing could be secure.
       | We have some general ideas that nothing is secure, and that
       | security is a relative quality and perception, but I don't know
       | that we've been able to formalize it into a conjecture. (Other
       | than Hofstadter's music to break phonographs by.)
       | 
       | Maybe I'm missing a big obvious one, but there's something about
       | programmable systems that they are necessarily unable to infer or
       | deduce the output of a program beyond a certain complexity
       | without first running it. Or maybe that there is a set of
       | convolutions in a set of instructions whereafter the information
       | it contains spirals off into chaos like the logistic equation and
       | you can't formally predict how a program behaves based on its
       | initial state. There's something about security problems that
       | exist on the edge of computability and it seems like a
       | perspective we could draw generalities from anyway.
        
         | chriswarbo wrote:
         | > there's something about programmable systems that they are
         | necessarily unable to infer or deduce the output of a program
         | _beyond a certain complexity_
         | 
         | Just to focus on this idea of 'a certain complexity', you may
         | like Chaitin's Incompleteness Theorem:
         | https://en.wikipedia.org/wiki/Kolmogorov_complexity#Chaitin'...
         | which defines the "algorithmic information content" of a
         | symbolic system (whether that's a programming language or a
          | mathematical logic, which are actually the same thing via
          | the Curry-Howard correspondence!).
         | 
         | The theorem is a bit technical, but basically involves a
         | program which enumerates all strings of symbols from a chosen
         | alphabet (pretty easy: just a `while` loop that tries longer
         | and longer string lengths, and a `for` loop which goes through
         | the alphabet over and over to output every combination of that
         | many symbols).
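          | 
          | A minimal Python sketch of that enumerator (the names are
          | mine):

```python
from itertools import count, product

def all_strings(alphabet):
    """Yield every finite string over `alphabet`, shortest first."""
    for length in count(0):            # the 'while' loop: lengths 0, 1, 2, ...
        for chars in product(alphabet, repeat=length):  # every combination
            yield ''.join(chars)

gen = all_strings('ab')
first_seven = [next(gen) for _ in range(7)]
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```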
         | 
         | We use this to define a program which enumerates all proofs of
         | some chosen formal logic (these are just strings of symbols
         | which obey all of that logic's rules, so we just filter the
         | output of the above program).
         | 
         | We use this to define a program which enumerates proofs of
         | theorems of the form "all programs which output string s
         | contain at least N instructions" (this is just a pattern-match
         | against the end of the proof). N is a parameter, whilst s can
         | be any string (different proofs can have different strings s).
         | 
         | We use this to define a final program, which we'll call P.
         | Running P with a parameter, like P(N), will run the above
         | enumeration for that value of N, and halt once a single proof
         | has been found. Just before halting, P will output whichever
         | string s appears in the proof it found.
         | 
         | Now is the interesting part: we count up all of the
         | instructions used to define all of the above programs. Let's
         | call this U (as per the Wikipedia link above). We'll define
         | another number N0 where N0 > U + the number of instructions
         | required to define N0 in a program; in other words, N0 is more
         | than the number of instructions needed to define 'P(N0)'. (Keep
         | in mind that we can generate very large numbers using programs
          | with only a few instructions, so it's easy enough to find a
         | suitable value of N0).
         | 
         | Finally, consider what happens if we run the program P(N0):
         | 
         | If P(N0) halts, it will output some particular string s. That
         | string is taken from a proof of the theorem "all programs which
         | output string s contain at least N0 instructions". We know that
         | program P(N0) contains fewer than N0 instructions, which would
         | contradict such a theorem. The only way to avoid this
         | contradiction (other than using an inconsistent logic) is if
         | P(N0) _never_ halts.
         | 
         | If P(N0) never halts, that means it will keep enumerating
         | proofs forever without _ever_ finding one for _any_ theorem of
          | the form "all programs which output string s contain at least
         | N0 instructions"; for _any_ string s.
         | 
         | This latter result is consistent, but shows that our chosen
         | system of logic (whatever it is), must be _severely_ limited.
         | In particular, we know it is unable to prove statements of the
          | form "all programs which output string s contain at least N
         | instructions" where N >= N0 (in fact, there's a lower bound on
         | N0 based on 'compressing' the above programs into the fewest
         | number of instructions; although finding that lower bound is
         | uncomputable due to the halting problem!).
         | 
         | At first glance this doesn't look very important, but consider
         | the pigeonhole principle: there are only a finite number of
         | programs with fewer than N0 instructions; hence there are only
         | a finite number of strings 's' which can _provably_ be
         | generated with fewer than N0 instructions. Almost all strings
         | 's' require more than N0 instructions to generate.
         | 
         | Now consider programs of the form 'enumerate all proofs until
         | we find one for statement T, then output T and halt'. If such a
         | program contains more than N0 instructions, then our chosen
         | logic is _unable_ to prove whether this program will halt or
         | not. We _can_ prove that whether or not this program halts is
         | equivalent to whether or not T is provable, and hence the only
          | way for this halting/looping question to be unprovable is for
         | the statement T to be unprovable.
         | 
         | Hence almost all statements are unprovable by our given system
         | of logic, regardless of which logic we choose!
         | 
          | Chaitin takes this further, to define a "halting
          | probability", and essentially shows that all mathematical
          | 'knowledge' can be distilled down to a set of bits which
          | tell us whether certain programs halt or not.
        
         | AndrewOMartin wrote:
         | Forgive me if I'm wrong, but this video might suggest you were
         | beaten to it by about 160 years!
         | 
         | https://www.youtube.com/watch?v=cPeVsniB7b0
        
           | motohagiography wrote:
           | So edifying! Deserves its own post really. They encoded data
           | in a covert channel using backspaces, which is super
           | interesting. The next level was looking for a way to change
           | the contents of a message and potentially a code book update.
        
         | chriswarbo wrote:
         | > Maybe I'm missing a big obvious one, but there's something
         | about programmable systems that they are necessarily unable to
         | infer or deduce the output of a program beyond a certain
         | complexity without first running it.
         | 
         | Yep, this is Rice's Theorem. If we have a particular program P
         | which outputs a certain value, there is an infinite set of
         | programs of the form 'C; P;' which first run some arbitrary
         | code 'C' and then run P. The Halting Problem is undecidable, so
         | we can't always tell whether code 'C' will halt or not; and
         | hence whether we'll move on to executing P or not.
         | 
         | More generally, Rice's Theorem applies to any non-trivial
         | property of the input/output function implemented by a program.
         | A non-trivial property means that at least one program has that
         | property and at least one program doesn't. Being a property of
         | the 'input/output function implemented by a program' avoids
         | implementation details like 'runs for at least 10 steps'.
         | 
          | The Halting Problem is _semi-decidable_: if we just run the
         | given program until it halts then return 'HALTS', we will
         | always return the correct answer when given a program which
         | halts. This doesn't completely solve the Halting Problem, since
         | we will not return 'NON-HALTING' when given a non-halting
         | program (we will just run forever).
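          | 
          | A Python sketch of that semi-decision procedure, with
          | programs modelled as step generators (all names are mine):

```python
def semi_decide_halts(program, fuel=None):
    """Return 'HALTS' when `program` (a generator of steps) finishes.

    With fuel=None this is the true semi-decision procedure: it answers
    'HALTS' on every halting program and simply never returns otherwise.
    A finite fuel gives the practical approximation: 'UNKNOWN' after
    that many steps, never a definite 'NON-HALTING'.
    """
    steps = 0
    for _ in program():                    # each yield is one step
        steps += 1
        if fuel is not None and steps >= fuel:
            return 'UNKNOWN'
    return 'HALTS'

def halting():                             # halts after three steps
    for _ in range(3):
        yield

def looping():                             # never halts
    while True:
        yield

assert semi_decide_halts(halting) == 'HALTS'
assert semi_decide_halts(looping, fuel=1000) == 'UNKNOWN'
```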
         | 
         | Wolfram calls this "computational irreducibility": that the
         | only way to predict the behaviour of a Turing-complete system
         | is to simulate it; or, equivalently, any formula that tells us
         | what will happen on the Nth step will, necessarily, involve
         | calculating what happens on the (N-1)th step, the (N-2)th step,
         | and so on back to the first step.
        
           | wcarss wrote:
           | "O Great Univac, why did you ever create the universe?"
           | "Someone asked if it would ever end."
        
         | umanwizard wrote:
         | > there's something about programmable systems that they are
         | necessarily unable to infer or deduce the output of a program
         | beyond a certain complexity without first running it.
         | 
         | Rice's theorem is a formal statement capturing what I think you
         | mean here.
        
           | motohagiography wrote:
           | https://en.wikipedia.org/wiki/Rice%27s_theorem
           | 
           | Thank you, that's the kind of example I was thinking must
            | exist. It implies that for each system there is a program
            | which is undecidable, and then the question becomes how
            | many of those programs transfer control so that the
            | program rewrites itself. Maybe we don't need to do this in
            | security, but
           | sometimes it feels like the whole field is predicated on
           | denying some foundational truth about complexity.
        
             | michaelt wrote:
             | Isn't the statement that "there exists a program for each
             | system such that arbitrary code execution is possible" a
             | tautology?
             | 
             | Aren't you basically saying "Executing arbitrary code can
             | allow executing arbitrary code"?
        
               | motohagiography wrote:
               | To a point, as there's no difference between authorized
               | and unauthorized code, except there is, but there
                | isn't. :)
               | 
               | Maybe more like wondering what the consequences are if
               | all code is arbitrary, particularly for higher order
               | abstractions like security and economics.
               | 
               | Security has always been a hall of mirrors, and a
               | principle like this could be a fast heuristic for
               | deciding how much risk you want to take. In all the
               | security design work I've done, it was based on, "we've
               | essentially made an attack cost greater than X, where X
               | is close to the value of what they're protecting."
        
           | sterlind wrote:
           | how does this apply to symbolic execution of programs? do you
           | think it's possible to exploit a proof verifier through a
           | maliciously-crafted set of lemmas?
        
             | maweki wrote:
              | Either your decision/deduction procedure terminates on
              | all inputs, in which case the property you're going
              | after is trivial; or it doesn't, and then the property
              | you're going after is generally undecidable.
             | 
             | If you can symbolically evaluate a program and this always
             | works for all input programs, your input programs are
             | already restricted in that they are not computationally
             | complete.
        
               | chriswarbo wrote:
               | Going a bit further, this could absolutely be used for a
               | denial-of-service attack; but would otherwise be
               | 'sandboxed', similar to how exploits for a Universal
               | Turing machine are still limited to reading/writing the
               | machine's tape. They can't, for example, hijack an SSH
               | session or whatever (unless there are serious
               | vulnerabilities in the prover/turing-machine
                | implementation).
        
         | spiritplumber wrote:
         | GNU Terry Pratchett
        
         | Y_Y wrote:
         | > "You know they'll never really die while the Trunk is alive
         | [...] It lives while the code is shifted, and they live with
         | it, always Going Home." - Moist von Lipwig, Going Postal,
         | Chapter 13
         | 
         | This is a bit like the clacks network (inspired by semaphore)
         | in the Discworld, and how you can spam it with everlasting
         | messages, to memorialise the dead, or just gunk up the system.
        
       | [deleted]
        
       | galaxyLogic wrote:
        | When we say a program is "insecure", what do we mean? Do we
        | mean that there is some class of inputs which causes it to
        | perform any arbitrary computation the designers of the input
        | data want it to execute?
        | 
        | Or, less critically, that the designers of the inputs can
        | cause the program's outputs to reveal some information which
        | can be used to implement further exploits?
        
         | luch wrote:
         | Depends on the situation, which is why CVE also have a CVSS
         | score explaining the context and potential for abuse, e.g. :
         | https://msrc.microsoft.com/update-guide/vulnerability/CVE-20...
         | 
         | For example an arbitrary computation is useless for browser
         | exploits (unless you think arb. bitcoin mining is a problem)
         | since Javascript already gives you it for free but it is highly
         | problematic for smart contracts since it breaks the economy
         | based on it.
         | 
         | Another example would be data leak vulnerabilities which on
         | some cases be only a stepping stone towards a RCE, but on
         | others scenarios like for Heartbleed basically breaks the
         | Internet's overall security.
        
       | legulere wrote:
        | Arbitrary computation is just a subset of arbitrary code
        | execution, and one that is much less harmful.
       | 
        | Arbitrary computation means that you can let the computer do
        | any pure calculation you want. The worst you can do is put it
        | in an endless loop, which you can break with timeouts.
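        | 
        | A Python sketch of that escape hatch (purely illustrative):

```python
import multiprocessing

def untrusted_pure_computation():
    while True:         # the worst case for a pure computation
        pass

def run_with_timeout(fn, seconds):
    """Run fn in its own process; kill it if it exceeds the budget."""
    proc = multiprocessing.Process(target=fn)
    proc.start()
    proc.join(seconds)
    if proc.is_alive():
        proc.terminate()                # break the endless loop from outside
        proc.join()
        return "TIMED OUT"
    return "FINISHED"

if __name__ == "__main__":
    print(run_with_timeout(untrusted_pure_computation, 0.2))
```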
       | 
        | As Turing machines are idealized machines that only do
        | computation, that's all that can happen to them, which is why
        | they are completely useless as models for security.
       | 
        | Real machines and processes interact with the outside world.
        | Arbitrary code execution is dangerous exactly because it can
        | access those interaction possibilities. It doesn't matter that
        | the injected code could calculate pi to trillions of digits;
        | what matters is that it could do anything the process can do,
        | namely manipulate files and access the network.
        
         | ComodoHacker wrote:
         | >The worst you can do is to put it in an endless loop.
         | 
         | You can alter the result of computation, which could be much
         | worse, depending on how it's used further.
        
       | danbruc wrote:
       | In summary, the universal Turing machine uses a non-binary
       | alphabet but is only meant to simulate Turing machines with a
       | binary alphabet given a description of the Turing machine in a
       | defined format and its input. By making the initial tape content
       | of the simulated Turing machine non-binary, the universal Turing
       | machine can be made to execute a further Turing machine contained
       | in the input of the simulated Turing machine. Whether this works
       | also depends on the description of the simulated Turing machine
       | but the authors claim that most machine descriptions are
       | exploitable but without any further discussion of this.
       | 
        | A possible fix would probably be to simply halt when the
        | universal Turing machine encounters something other than zero
        | or one in the region of the tape that represents the tape of
        | the simulated Turing machine, except for the symbol
        | indicating the head of the simulated Turing machine.
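        | 
        | A Python sketch of that check (the symbols and names are
        | illustrative, not the paper's actual encoding):

```python
BINARY = {'0', '1'}
HEAD_MARKER = 'A'        # hypothetical symbol marking the simulated head

def validate_simulated_tape(region):
    """True iff the simulated-tape region contains only 0, 1 and one head."""
    heads = sum(1 for s in region if s == HEAD_MARKER)
    if heads != 1:                       # exactly one simulated head
        return False
    return all(s in BINARY or s == HEAD_MARKER for s in region)

assert validate_simulated_tape(list('0101A110'))
# A non-binary symbol smuggled into the tape region is rejected
# instead of being interpreted by the universal machine:
assert not validate_simulated_tape(list('01D1A10'))
```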
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2021-05-10 23:02 UTC)