[HN Gopher] ExifTool CVE-2021-22204 - Arbitrary Code Execution (...
       ___________________________________________________________________
        
       ExifTool CVE-2021-22204 - Arbitrary Code Execution (2021)
        
       Author : ekiauhce
       Score  : 89 points
       Date   : 2024-01-27 11:58 UTC (11 hours ago)
        
 (HTM) web link (devcraft.io)
 (TXT) w3m dump (devcraft.io)
        
       | fulafel wrote:
       | Perl and eval, not what I'd have expected from the title!
        
         | rkagerer wrote:
         | Indeed! Arbitrary text was smuggled into the eval() statement
         | through a weak input sanitization step:
         | 
         |  _The second quote was not escaped because in the regex $tok =~
         | /(\\\\+)$/ the $ will match the end of a string, but also match
         | before a newline at the end of a string, so the code thinks
         | that the quote is being escaped when it's escaping the
         | newline._
         | 
         | This is why I hate using regexes, or deciphering other people's
         | code which uses them. The syntax just isn't obvious. We need
         | tooling that _removes_ mental overhead, not adds to it. I 've
         | always found plain old procedural parsing code easier to
         | formulate, trace and reason about.
         | 
         | I'm not some some fanatic preaching they should be banned from
         | all languages, but I do feel there are places they're
         | inappropriate and input sanitization is one of them. It's also
         | good practice to centralize this sort of logic in some flavor
         | of escape() function to reduce the whack-a-mole when bugs like
         | this are found.
        
           | jodrellblank wrote:
           | > " _will match the end of a string, but also match before a
           | newline at the end of a string._ "
           | 
           | Is it just me, or is this a poor description? I understood $
           | to be an anchor for end of _line_ , not end of _string_ ,
           | i.e. "match before a newline anywhere". The emphasis "newline
           | _at the end of a string_ " feels misleading; the exploit
           | works because the matched newline isn't at the end of the
           | string and there is exploit code after it.
           | 
           | > " _I 've always found plain old procedural parsing code
           | easier to formulate, trace and reason about._"
           | 
           | A regex is a domain specific language which lets you express
           | a lot of computation in a small amount of code. At the
           | extreme, you're saying "I can write a regex engine easier
           | than I can write a regex". "I can write assembler easier than
           | I can write C". "I can write a list of a thousand items
           | quicker than I can write range(1000)". It doesn't make sense,
           | and for anything more than trivial cases, I don't believe it.
           | 
           | (I'll agree that there are places they are inappropriate,
           | expressions which aren't clear, and maybe the extra effort of
           | doing it by hand is worth it in security sensitive
           | situations, but I claim it is extra effort and less clear
           | what a pile of ifs/loops/switch is trying to achieve vs
           | "[a-f]{3}(\d)" or whatever).
        
             | joosters wrote:
             | The meaning of ^ and $ in perl regexes can be altered by
             | modifiers at the end of the regex. A 'multi-line' regex,
             | with a /m at the end, makes them match the start and end of
             | any line.
        
               | jwilk wrote:
               | The code doesn't use /m. So indeed the problem is that $
               | will match before \n at the end of the string _the Perl
               | interpreter is working with_ , which is not the whole
               | thing containing the payload.
        
               | jodrellblank wrote:
               | I've just worked through it to understand it; it's a more
               | subtle exploit than I thought. My above comment about $
               | being end-of-line is wrong, the above comment saying you
               | need the /m modifier for that to happen is correct.
               | 
               | In case anyone cares, the bug matches <backslash>
               | <newline> <quote> then the regex in question matches
               | <backslash> <newline> and the code logic is that one
               | backslash must be escaping the next character from the
               | input, but instead of adding on the next character _it
               | always adds a quote_ - after searching for a quote the
               | next character must be a quote, right? But it wasn 't, it
               | was a newline, which was missed because of $ behaviour
               | with newline at the end of a string. That acts to shift
               | the quote one to the left <backslash> <quote> <newline>
               | and now that makes an escaped quote which won't break
               | eval() and the loop carries on and reads in the exploit
               | text up to the real end quote.
               | 
               | ----
               | 
               | Details: the Perl code takes this pattern in the input
               | (the quotes are part of the input, in the file data):
               | "a\         ""
               | 
               | six characters, describing a quoted string of four
               | characters: <a> <backslash> <newline> <quote>
               | 
               | The Perl code finds the string starting quote and moves
               | past it, and sets $tok empty to hold the quoted string
               | content. Then it searches for the next quote (not the
               | last, the next):                   last Tok unless
               | $$dataPt =~ /"/sg;
               | 
               | This will match at the quote after the newline. Then it
               | substrings from the saved opening quote position to 1
               | before the found quote. So the substring includes the
               | newline char:                   # the closing quote
               | position. (not including the quote).         $tok .=
               | substr($$dataPt, $pos, pos($$dataPt)-1-$pos);
               | 
               | That gets <a> <backslash> <newline> and not the following
               | <quote> <quote>
               | 
               | Then it does the odd-number-of-backslashes test on the
               | substring <a> <backslash> <newline>:
               | last unless $tok =~ /(\\+)$/ and length($1) & 0x01;
               | 
               | And the regex matches for <backslash> <newline> instead
               | of the intended <backslash> ENDOFSTRING so the code
               | thinks there is an escaped character. It doesn't _add the
               | next character from the string into the token_ , it
               | _assumes the escaped character must be a quote_ and
               | _always adds a quote_ to $tok.                   $tok .=
               | '"';    # quote is part of the string
               | 
               | Effectively shifting the input quote to the other side of
               | the newline, from:                   "a\         ""
               | 
               | to                   "a\"         "
               | 
               | and in the exploit case:                   "a\
               | "exploit code"
               | 
               | to                   "a\"         exploit code"
               | 
               | Then the inner loop runs again and finds the <exploit
               | code to closing quote> text and adds that on to $tok. Now
               | there's a string with an escaped quote and some exploit
               | code.
        
       | rixrax wrote:
       | So how do I know if there is bug bounty available for
       | vulnerabilities in exiftool? Or ghostscript? Or ffmpeg, openssl,
       | gnutls, sox, or any number of other packages I may be using?
        
         | IshKebab wrote:
         | They're open source and written by volunteers so there almost
         | certainly isn't. Unless it's covered by this:
         | 
         | https://bughunters.google.com/about/rules/6521337925468160/g...
        
           | richbell wrote:
           | There's also https://www.hackerone.com/internet-bug-bounty
        
         | andersa wrote:
         | For open source libraries, you are usually better off finding a
         | large company that uses it in some exposed way, then submitting
         | it to their bug bounty.
         | 
         | Sometimes you can even collect the bounty multiple times by
         | sending it to multiple companies, so long as the first one
         | doesn't submit the fix before the second even looks at the
         | report...
        
         | TheDong wrote:
         | There is at least one bug bounty available for all open source
         | projects.
         | 
         | I run a bug bounty where if you tell me about the high level
         | details of a bug and it sounds interesting, I will buy you a
         | drink (up to $20, so that's the bounty) in exchange for you
         | telling me all the details.
         | 
         | All applications for my bug bounty program must be in person,
         | feel free to take me up on it!
        
       | voytec wrote:
       | Note "2021"
        
       | ptx wrote:
       | This was fixed in version 12.24 (back in 2021) according to the
       | version history page[0], but the current version still uses
       | "eval" in several places[1]. This seems like an unnecessarily
       | dangerous approach - wouldn't it have been a good idea to fix all
       | the instances in the codebase after the bug was discovered?
       | 
       | [0] https://exiftool.org/ancient_history.html#v12.24
       | 
       | [1]
       | https://github.com/search?q=repo%3Aexiftool%2Fexiftool+%2Fev...
        
         | fulafel wrote:
         | Indeed. Also I wonder if Gitlab is still using this.
         | 
         | Gitlab self-hosted users who didn't patch got bitten with in
         | the wild exploitation:
         | https://www.rapid7.com/blog/post/2021/11/01/gitlab-unauthent...
        
           | mdaniel wrote:
           | Seems the answer is yes: https://gitlab.com/search?search=exi
           | ftool&nav_source=navbar&...
           | 
           | https://gitlab.com/gitlab-
           | org/gitlab/-/blob/v16.8.1-ee/lib/g... and
           | https://gitlab.com/gitlab-
           | org/gitlab/-/blob/v16.8.1-ee/lib/g...
        
         | amiga386 wrote:
         | From TFA:
         | 
         | > In Perl, eval can be used with a block to trap exceptions
         | which is why it was being used everywhere.
         | 
         | ...which your search doesn't seem to exclude (I'm not signing
         | up to Github to find out so I'm just guessing from your URL
         | that you didn't exclude "eval {" or "eval q{")
         | 
         | It does still use eval on occasion and they seem happy with it.
         | 
         | * for their own controlled expressions to customise base
         | parsing behaviour (which could be refactored to use function
         | references instead, but it's not a case of evaluating external
         | data)
         | 
         | * to test if modules can be imported (which you've excluded
         | from your search)
         | 
         | and in one case in CanonRaw.pm, using eval on an expression
         | that matches [0-9./]+ , I'm not sure it looks at external data
         | but it might, and at best you could cause a divide-by-zero
         | error (e.g. 1/0) or syntax error (e.g. ///) if you mess with
         | the data it parses.
         | 
         | So overall, not much dangerous eval going on.
        
           | fulafel wrote:
           | Catching exceptions from eval-ing untrusted code doesn't
           | strike me as being a big help from security POV, or is there
           | something else special about the block form of eval that
           | helps here?
        
             | _flux wrote:
             | Block form is not intended to run untrusted code, it is
             | used similar to how exception handling is used in some
             | languages:                 perl -e 'eval { die "aiee" };
             | print "problem: $@"'
        
           | mdaniel wrote:
           | The non-GH search version: https://sourcegraph.com/search?q=c
           | ontext:global+repo:%5Egith...
           | 
           | Your observation is proper in that it did not exclude the
           | patterns you mentioned, but in this specific case not
           | necessary because all 4 results are shaped the exact same
           | way:                   lib/Image/ExifTool/Exif.pm
           | #### eval Start ($valuePtr, $val)                         my
           | $newStart = eval($$subdir{Start});
           | unless (Image::ExifTool::IsInt($newStart)) {
           | #### eval Base ($start,$base)
           | $subdirBase = eval($$subdir{Base}) + $base;
           | }              lib/Image/ExifTool/MakerNotes.pm
           | #### eval Start ($valuePtr)                     $newStart =
           | eval($$subdir{Start});                 }
           | #### eval Base ($start,$base)                     my
           | $baseShift = eval($$subdir{Base});                     #
           | shift directory base (note: we may do this again below
           | #### eval OffsetPt ($valuePtr)
           | $ifdOffsetPos = eval($$subdir{OffsetPt}) - $dirStart;
           | }
        
             | ptx wrote:
             | You're right that excluding "eval {" is not necessary
             | because it only occurs in the already excluded subset, but
             | that search link is missing a lot of matches since the
             | space character in my pattern has turned into a plus sign
             | in yours. Try this: https://sourcegraph.com/search?q=contex
             | t%3Aglobal+repo%3A%5E...
             | 
             | There are also some cases (excluded in this search) where a
             | charset parameter is passed around and eventually passed to
             | eval in the LoadCharset function, e.g. from the RTF parser
             | through the Recompose method. Not sure if that's always
             | safe.
        
       | Jaxan wrote:
       | Exiftool is one of those open source tools which provides a lot
       | of features just for free. It's really amazing that people can
       | work on something like this for so long.
        
       | Levitating wrote:
       | The "Investigation" box at Hack The Box includes this
       | vulnerability.
        
       | 1vuio0pswjnm7 wrote:
       | I prefer Exiv2 over ExifTool. I can compile it statically. Does
       | not require a Perl installation.
        
         | lucb1e wrote:
         | I would be more interested in a "what I like about it" sentence
         | over a preemptively pacifying "to each their own" tbh. The
         | tool's own description doesn't compare against exiftool either,
         | besides that it sounds like it does only images and not, e.g.,
         | documents
         | 
         | > Exiv2 is a C++ library and a command-line utility to read,
         | write, delete and modify Exif, IPTC, XMP and ICC image
         | metadata.
        
       ___________________________________________________________________
       (page generated 2024-01-27 23:00 UTC)