https://obrhubr.org/reverse-engineering-diploma

Home
About
RSS Feed
Contact me!
Reverse Engineering the Verification QR Code on my Diploma
Published on 21st Jun 2024.
Security Research Reverse Engineering Cryptography
You can find the sourcecode at: github.com/obrhubr/
reverseengineering-diploma
---------------------------------------------------------------------

At 15:50 Eastern European Time on the 18th of June 2024, I walked out
of my last exam, having finally finished school after 13 years. This
also meant that I could get into programming again, having been
forced to quit my work in favour of studying.

Three days later, I was sent a PDF with the grades I received in my
exams. What stood out to me however, was the QR code in the top
right. This it what the PDF looks like. (And no this isn't the real
one. In fact, go ahead and scan the QR code, I dare you.)

diplome

The only clue to the code's function is a small text below
referencing CycladesVerif, a mobile application. To satisfy my
curiosity I downloaded it and scanned the code. I then got a summary
of my personal information and my grades. I would guess the code is
so universities or employers could verify that you didn't tamper with
the PDF in order to boost your grades. They provide an example of
what it looks like after scanning a QR code on the Play Store:

example

Now I am sure there's a question you're all asking yourselves: Can we
reverse engineer this and more importantly, can we break it?

My first naive Attempts

This isn't the first time I see a QR code that allows verification of
data. The Green Pass App that was ubiquitous during Covid allowed
authorities or event staff to scan your code to check for up to date
Covid tests or proof of vaccination.

These usually work by base64 encrypting the data and appending a
signature at the end. In case you are a web developer this works
similarly to JSON Web Tokens. The signature is a hash of the data
encrypted with an RSA private key. When the user scans the QR code
the signature is decrypted into the original hash, which is then
compared to a new hash of the data sent in the QR code. If both
hashes match, you can be sure there hasn't been any tampering with
the contents. If they don't, you know you can't trust the data.

As I have at least a little bit of trust in the French authorities'
knowledge of computer security, I assumed they would not only store
the data about the user, but also sign it.

When I scanned the code to extract the data, what I discovered was a
simple base64 encoded string (which I knew because it ended in ==)
exactly 258 bytes long. But my excitement didn't last long as the
result of decoding it was random bytes that didn't seem to contain
any useful data.

Discovering the atrocities committed to create the QR code

After asking my friend to send me his PDF too, I discovered that the
code was also 258 bytes long. For anyone with a little knowledge
about RSA encryption, this does not bode well. But at that time I was
blissfully ignorant and assumed I did something wrong - I thought I
may have chosen the wrong character set - and it would decode just
fine after fiddling with it a bit.

But when I was still left with illegible garbled data, I knew that I
would have to reverse engineer the app itself to discover how it
accessed the data.

But there was one more thing to try first: Did the app access the
internet to fetch the diploma? This would tell me if there was hope
for a simple and quick solution. After putting my phone in airplane
mode I could still access the data through the app. This was good
news, as it confirmed that both the data and the keys to decrypt it
had to be in the app itself. But actually accessing them would be
another challenge.

Disassembling the App

The first thing I did was head to decompiler.com where I uploaded the
APK for the app. Armed with the little knowledge of Android apps I
had, I headed straight to the sources folder, in search of useful
Java code.

And Java code I found, but it was mostly gibberish. Anyone slightly
familiar with Java knows that what you're looking for is the
MainActivity. But what i discovered in sources/fr/edu/rennes/cylades/
mobile/verifcertif/CycladesVerif/MainActivity.java - yes that's a lot
of folders - was an empty file:

package fr.edu.rennes.cyclades.mobile.verifcertif.CycladesVerif;

import io.flutter.embedding.android.i;

public class MainActivity extends i {
}

After a bit of googling I realised something, which should be obvious
if you took the time to read the name of the imported package: the
app was using Flutter. And after a bit more googling (well a lot
more) and 50 Reddit threads later, I knew to look for a file called
libapp.so. It should have the (compiled) Dart code for the app, which
would mostly be unreadable to humans.

Armed with Ctrl+F I tried to dissect the madness. I was looking for
something specific, which was public or private keys and mentions of
RSA or other similar encryption schemes. My first search for public
yielded the gem ----BEGIN RSA PUBLIC KEY----- between a sea of
garbage method names and pointer references.

This meant my suspicions were mostly confirmed. The app uses an RSA
public key to decrypt the scanned data and extract the diploma. But
to investigate further I would have to get more readable source code.
Luckily, a friend of mine who recently finished his course on
computer security at university and has published a paper in the
field was able to use Ghidra to disassemble the Dart code into
assembly.

Digging through Assembly Code

After a lot of searching, I finally found a file that seemed to be
responsible for the app's main functionality in asm/CycladesVerif/
screens/scan.dart. It had the ----BEGIN RSA PUBLIC KEY----- string
again and also contained method names like _decodeWithRSAToken.

After a day or two of familiarizing myself with the syntax and
different instructions that make up the provided assembly code, I was
able to piece together the steps to reproduce what was happening.

 1. First the app calls the function scanBarcode() to get the data
    from the QR code

 2. Then, it compares the first two digits to find out which version
    the document has. It can be either 01, 02, 03 or 04. These two
    digits account for the first two of the 258 bytes. This means
    that the encrypted data itself is 256 bytes long. This is exactly
    the length of the result of encrypting something using a RSA
    2048-bit key.

 3. Depending on the result of the comparison, it uses a different
    method and a different public key to decrypt it. My diploma
    starts with 02 and therefore the apps calls
    _decodeRlnWithRSAToken2022().

 4. This function first creates a PEM version of the RSA token -
    which means it prepends the ----BEGIN RSA PUBLIC KEY----- and
    appends the -----END RSA PUBLIC KEY----- suffix to the key. Then
    it decodes the QR code with base64 and then uses the Dart package
    pub.dev/packages/pointycastle to decrypt it with the RSA public
    key and PKCS#1 padding. The resulting bytes are processed with a
    Latin1 decoder - Latin1 is a type of text character set, like
    UTF-8.

 5. The decrypted and decoded text is then passed to
    _transformTextToReleveNotes() and parsed using a Regex that
    extracts the data and displays it to the user.

After wrestling with the openssl command line tool and invoking a few
arcane options, I managed to coax it into correctly decrypting the
bytes. The result of the decryption looks like this:

openssl rsautl -verify -inkey key.pem -pubin -in data.bin -raw -hexdump

0000 - 00 01 ff ff ff ff ff ff-ff ff ff ff ff ff ff ff   ................
0010 - ff ff ff ff ff ff ff ff-ff ff ff ff ff ff ff ff   ................
0020 - ff ff ff ff ff ff ff ff-ff ff ff ff ff ff ff ff   ................
0030 - ff ff ff 00 42 61 63 63-61 6c 61 75 72 e9 61 74   ....Baccalaur.at
0040 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX    g.n.ral session
0050 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX    2024|XXXXYYYYY|
0060 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   Niklas|XXXXXX|10
0070 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   .00|Admis Mentio
0080 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   n Tr.s Bien avec
0090 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX    les f.licitatio
00a0 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   ns du jury||Epre
00b0 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   uve orale termin
00c0 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   ale (Grand oral)
00d0 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   ~10.00#Math.mati
00e0 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   ques~10.00#Physi
00f0 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   que-Chimie~10.00

From which we can extract the following text, using the Latin1
character set:

Baccalaureat general session 2024|XXXXXXX|Niklas|010101|10.00|Admis Mention Tres Bien avec les felicitations du jury||Epreuve orale terminale (Grand oral)~10.00#Mathematiques~10.00#Physique-Chimie~10.00

The assembly code shows that it is then parsed with the following
regex, which extracts the characters between the pipes |.

(.*)\|(.*)\|(.*)\|([0-9]{2}[0-9]{2}[0-9]{2})\|(.*)\|(.*)\|(.*)\|(.*)

Finally, in the _transformTextToReleveNotes() function, the grades at
the end (...Mathematiques~10.00#Physique-Chimie~10.00) are separated
by at the hashtags # and then the tildes ~ to parse and display them
to the user.

If you are interested in scanning your own diploma, src/
load_diploma.py is a Python program that recreates the functionality
of this app.

So what's the issue here?

The first issue is the absolute disregard for any of the standards
related to RSA key usage. Encrypting with the private key and
decrypting with the public key is usually only done in the context of
signing/verifying. This is consistent with the \x00\x01 at the
beginning of the decrypted message's padding. The \x01 indicates a
block type 1 in PKCS#1 padding, which means the operation performed
was signing.

Python's cryptography libraries straight out refuse to do any
decrypting with public keys, the only available methods are for
verifying. This requires you to provide the signature and the
message, which in this case is not possible since the signature is no
hash of the message, but the message itself. Thus I had to resort to
the obscure openssl commands and in the case of my Python program,
had to reimplement the RSA algorithm in Python to get useful results.

If that wasn't already bad enough, they are also using PKCS#1 v1.5,
which is considered unsafe due to padding oracle attacks and it is
recommended to use OAEP padding instead.

Can we break it then?

The bad news is that it's probably not possible. But there are a few
approaches that I evaluated anyways, in order to test their
feasibility.

The Bleichenbacher Padding Oracle Attack

According to owasp.org it is possible to use a padding oracle attack
to decrypt an RSA PKCS#1 key encrypted message, as described by
Bleichenbacher in his paper. As I understand it, recovering the
plaintext is possible if you have access to an oracle that tells you
if the encrypted message corresponds to a message with valid padding,
because this allows one to narrow the search interval of encrypted
messages iteratively. Check out a great demo here.

What we need in this case is the inverse of this attack. We could
then forge an encrypted message, which when the app decrypts it with
the public key, shows any diploma we want. This attack was presented
by Daniel Bleichenbacher in 2006, as detailed in this great blog post
. But sadly, this is only possible for RSA keys with an exponent of e
=3 , which is not the case for our app, as the key's exponent is e=
65537.

This means that we cannot use this method to forge any diploma.

Generating a Diploma that the App would Display

In order for the App to actually show data instead of exiting, we
need the decrypted text to match the regex. The simplest possible
text which is valid is ||||123456||| because it has the 8 | pipe
separated sections and the birthday. The pipes don't actually have to
have any text between them, as .* matches any character between zero
and unlimited times.

This means that we can calculate how many possible messages there
are, that match this regex.

If there are 243 bytes of padding (\x00\x01\xff... 238 more times ...
\xff\x00), then we have $ 10^6 $ valid messages. This is because
every string ||||XXXXXX||| - where X is any digit - matches.
Therefore we have 6 characters with 10 possibilities (0-9) which
means $ 10^6 $ total possibilities.

For every byte less of padding, there can be one more character of
random text in between the pipes - except between the numbers,
because it has to exactly 6 digits. For example Y||||123456||| and ||
Y||123456|||, where Y is any character, are both valid messages. And
since the app uses the Latin1, there are 189 valid characters.

To calculate the amount of possibilities for $ n $ chars (so $ 256 -
13 - n $ bytes of padding) we have to consider how many ways we can
put the $ n $ chars in the spaces between the pipes. This is solved
by the classic combinatorial formula called the stars and bars
theorem. It calculates the number of ways to place $ n $
indistinguishable balls into $ k $ labelled urns.

We can use python to sum all possible combinations for us:

import math

def distinct_distributions(n, x):
    # Using the combinatorial formula for stars and bars
    return math.comb(n + x - 1, x - 1)

total_possibilities = 0
for n in range(0, 256-13):
        total_possibilities  += (189**n) * distinct_distributions(n, 7)
total_possibilities  *= 10**6

But we want to know if there is any chance for us to generate one
such message by checking all possible encrypted ciphertexts
iteratively. The total number of existing ciphertexts is $ 256^{256}
$, because they have to be exactly 256 bytes long. By diving $ 256^
{256} $ by the amount of messages which match the regex, we know how
many we would have to try on average, before finding a single valid
one.

minimum_checks = 256**256 // total_possibilities
print(f"How many decryptions to perform before finding one which validates the regex: {t}")
# How many decryptions to perform before finding one which validates the regex:
# 1319814855853530981336893039920992872421595910518

That is a lot, sadly. To calculate the necessary time to test all of
these, I use the speed of a C program I wrote that creates a
ciphertext, decrypts it and tests for matches. It can perform 100k
checks in 2.39 seconds.

speed = 100000 / 2.39
print(f"Checks per second {v}")
# Checks per second
# 41841.00418410041

time = (minimum_checks / speed) / 60 / 60 / 24
print(f"Days to find one result: {time }")
# Days to find one result:
# 3.650876742465208e+38

Unfortunately $ 3 \cdot 10^{38} $ days is a completely unrealistic
timespan, so we will sadly have to surrender to the French...