https://www.fuzzmap.io/

Fuzz Map

Fuzz Map is a fuzzer for GUIs that automatically identifies states
using code coverage and builds a visual map. Ideally, the map is
useful even to people who'd prefer not to read code.

Behind this window is a interactive local demo. Hide this
introduction then click Fuzz to start fuzzing. To show this
introduction again, click the Information button.

Here's a 5-second video of real-time fuzzing on my laptop. Every
state or arrow in the map corresponds to one or more codepaths
discovered by the fuzzer:

The sandbox renders the React code in the editor (specifically, it
renders <App />). Click Compile to update the sandbox, and click the 
Reset button to reset it. The map supports pan and zoom.

The beginning of this writeup will focus on features. If, instead,
you're primarily interested in how Fuzz Map works, you can skip ahead
to the part about how it works. There's some fun trivia, e.g. why a &
& b can't be transformed into (a && (h1, true)) || (h2, b).

By the way, everything in this research prototype runs in your
browser. No data is uploaded to a server.

Live programming

Fuzzing an application might reveal an unexpected state or crash in
the map. After updating the application's code, you can fuzz the
application a second time. Checking the new map helps to verify that
the effects of the change are what you expected.

In Fuzz Map, fuzzing the second time normally goes much faster than
it did the first time around. This is because inputs are saved for
reuse after recompiling. Inputs are typically reusable without manual
annotation because Fuzz Map uses the event handler's inline
declaration to identify it. For example, if you make a button with an
onClick event handler:

function App(...) {
  const handleClick = ...;
  return <button onClick={handleClick} />;
}

then the event handler will be identified using the text
"handleClick". This means that even if the definition of handleClick
is changed, click events will still be reusable. If necessary, you
can disambiguate further by defining the data-fuzzmap-key HTML
attribute.

Errors

Errors can occur while rendering a state or while handling an event.
Fuzz Map detects errors and displays them in the map.

Here's a potential bug in the Checkout example. Suppose an order
contains zero items. The code for computing the order subtotal has a
reduce call:

const subtotal = sortedItems
  .map(([_code, item]) => item.quantity * item.price)
  .reduce((a, b) => a + b, 0);

If the second argument to reduce is forgotten, then removing all of
the items from the order by decreasing their quantities to zero
results in the following map:

[a8ae38d4673b62a50982cb85907ed112fc3f43ae35362eff1499ece5bb591826]

Errors can also occur in event handlers. Inserting an error into
onClickContinue results in this map:

[da5984075f7237f332b0f71504bd0d473137e10237cb9f3c6d7e136f9b8c07a7]

Hover over the error icon to reveal the error:

[0e0ac2fcc00e4c7220878b974893f3875dd87018bca28416e3bb00a821ac0acf]

Before/After

A single event handler can have many different cases, each
corresponding to a distinct codepath. Fuzz Map lists out the cases
for an input in the Before/After view.

Suppose we use the following modified code to handle changes to item
quantities in onChangeQuantity:

const quantity = e.target.valueAsNumber;
const items = new Map(oldItems);
if (quantity === 0) {
  items.delete(code);
} else {
  const item = items.get(code);
  items.set(code, { ...item, quantity });
}
return items;

After fuzzing, there will be an arrow from the initial state to the
initial state with a name like "change 'House Wine'" or "change
'Double-Shot Espresso'". The name could be different because changing
the quantity of any item is handled by the same code.

[93722005dafcd9e57c2acc0a488370dc564d431cbafdb81ea8bfcc3c7a05424e]

Click on the label to switch to the Before/After view, which lists
the different cases for changing an item quantity.

[e3484a93b59571159498cf4c3d2d7df24f1e54e4ec0f1bfe9fe8bb27da73401f]

The modified code doesn't properly handle "Case 2. change 'House
Wine' to ''". When the item's quantity is changed to the empty
string, e.target.valueAsNumber evaluates to NaN, and the subtotal
calculation goes wrong.

Click Show code to highlight the event handler's declaration:

[572fd9bed037944c5788bba8b130196f74ec3e1463f82049e60761ad1f60d312]

A future version of Fuzz Map could use the information it already has
to highlight the lines of code that executed for each case. You could
select two cases to reveal which branches executed differently. This
could be used to debug unexpected states.

How it works

The main idea behind Fuzz Map is to distinguish between GUI states
using code coverage and to render these states as a simplified map.
Coverage-guided fuzzing has been applied with great success for more
than a decade. A complete survey is beyond the scope of this writeup;
see Manes et al. (2018) for an overview, including a fuzzer
genealogy.

A visual map is especially useful for GUI development, where bugs are
often obvious to humans but difficult for computers to check. It's
almost certainly a bug if a program crashes or reads memory it
shouldn't. But what if a button sends the user to the wrong screen,
or the interface just looks strange under unexpected circumstances?
People already use whiteboards or prototyping software to explain an
application's behavior to themselves, or to review it with
colleagues. Fuzz Map tries to create the same kind of map
automatically.

The state graph

While Fuzz Map fuzzes, it builds a state graph. Each node in the
graph represents an application state. Each edge represents an input
processed by an event handler. Every state and event is identified by
a hit vector which describes the branches that executed while the
application rendered or handled the event. The following program
contains two branches:

const name = ...;
if (name !== null) {
  return `Hello, ${name}!`;
} else {
  throw new Error("TODO");
}

If the program runs one time and name is not null, then the hit
vector will be \( (1, 0) \). If the program runs two times, and both
times name is null, then the hit vector will be \( (0, 2) \).

A hit vector is a coarse representation of an application state: many
different states can correspond to the same hit vector. This
simplification is typically useful for Fuzz Map. The actual program
being fuzzed (approximately a Turing machine) is always much more
complicated than the map that Fuzz Map displays (approximately a
finite automaton). Note that this simplification means that it's
possible, albeit unlikely, to have two different states with the same
hit vector where only one exhibits a bug.

Bugs in an application generally correspond to untested codepaths or
interactions between codepaths. Two states that execute very
different code might render the same HTML. But if a program executes
the same code when rendering two screens, then any bug in the code
will generally occur in both cases. Because Fuzz Map uses hit vectors
to identify states, it will only show two states in the map for the
previous example: one where name is non-null, and another where name
is null.

Random fuzzing

The state graph helps Fuzz Map to explore states efficiently. The
graph keeps track of the states the fuzzer has seen and the shortest
paths between them. Instead of building a graph, Fuzz Map could
randomly generate long sequences of inputs. This is surprisingly
inefficient. Efficiency is especially important for fuzzing GUIs,
which process inputs much more slowly than typical fuzzer targets
like parsers or protocols.

To illustrate how random fuzzing can be inefficient, consider an
application with a sequence of screens, e.g. for checking out items
from an online store. In screen \( 1 \), colored blue, you can only
go to the next state, and in screens \( 2 \) through \( n - 1 \) you
can go to either the previous or next screen.

[f32cdc0b6d8f4045de8e2a9ef36292f9d79a20975f8288c3e2b7b547e660d922]

How long will it take a random fuzzer to find the final screen \( n
\), colored orange? If this is the coupon collector's problem, we'd
expect it to take around \( n \log n \) inputs. But because each
screen can only be accessed through its adjacent screens, this is
actually an instance of the gambler's ruin problem. It will take
random fuzzing around \( n^2 \) inputs to find the final orange
screen! And the structure of a real GUI is normally even worse for
fuzzers. Most GUIs also have many cycles and dead ends; these only
make it more likely for the fuzzer to get stuck.

Because Fuzz Map uses code coverage to recognize states it has
already seen, it only takes around \( n \) inputs in this case.

For the math behind the \( n^2 \) estimate, see angryavian's answer
to an equivalent problem on Math StackExchange. For the general case,
see the chapter "Random Walk and Ruin Problems" in An Introduction to
Probability Theory and Its Applications, Vol. 1 by Feller. The
amazing Internet Archive has an online copy you can borrow, where the
analysis starts on page 317.

By the way, if you think it's inconvenient that only one person can
borrow the book per hour, book publishers disagree--they think it
should be even harder! At the time of writing, Hachette,
HarperCollins, Wiley, and Penguin Random House are suing the Internet
Archive. The Internet Archive is being defended by the EFF.

Exploring the state graph

Whenever Fuzz Map first sees a state, identified by its hit vector \(
\vec s \), it generates all the possible inputs for that state. For
example, one input \( i \) might be "change the value of 'Pickup
time' to '1:24 AM'". Then it places all of these inputs into a
fuzzing queue. The items in the queue are pairs \( (\vec s, i) \).

Fuzzing takes place in a loop. In each iteration, Fuzz Map dequeues a
pair \( (\vec s, i) \) then applies it. If \( \vec s \) does not
match the current state, then Fuzz Map travels before applying \( i
\). It does this by applying the shortest sequence of inputs it
previously recorded as having led to \( \vec s \). A minified set of
these shortest paths is also saved for reuse after recompiling to
make subsequent fuzzing runs faster. This set is comparable to the
seed pool of a conventional fuzzer.

Fuzz Map uses a few simple scheduling heuristics for dequeuing items.
When possible, Fuzz Map dequeues an item whose state equals the
current state. This is because it is typically more expensive to
reset a GUI than to apply an additional input. Otherwise, Fuzz Map
will alternate between selecting a random item from the queue and
selecting an item with the smallest hit vector.

Fuzz Map dequeues a random item rather than always dequeuing the next
item to to try avoid getting stuck in a clique of states that all
lead to each other. Dequeuing the item with the smallest hit vector
helps to expose behavior that occurs only when a loop does not
execute.

These are just heuristics, and future versions of Fuzz Map could be
much more sophisticated. Consider an application which only shows
some state if password == "please". Fuzz Map knows through branch
coverage that isn't guessing correctly, but it doesn't currently
analyze the program to determine what the correct password is. A more
realistic example for a GUI might be an event handler that checks
whether event.touches.length == 2. One technique for generating
inputs using program analysis is concolic testing.

Branch coverage

To record branch coverage and obtain hit vectors for nodes (render
states) and edges (event handlers), Fuzz Map uses compile-time
instrumentation.

Tangentially, Fuzz Map does some things with compile-time
instrumentation which could just as well be done at by patching the
application runtime, e.g. associating event handlers with their
elements. It would have been feasible to patch React or the browser,
but it will be much harder to patch other platforms' runtimes. iOS is
an obvious example, but another scenario is a corporate environment
where you are only allowed to deploy JARs.

Instrumentation is applied in a compiler pass. When Fuzz Map
encounters this code:

return isLoading ? "Loading..." : "Done!";

it adds instrumentation and outputs the following:

return isLoading ? (hit(65), "Loading...") : (hit(66), "Done!");

The comma denotes a sequence expression. When the expression a, b is
evaluated, first a is evaluated, then b. Finally, the entire
expression evaluates to the result of b. Here it's used to insert the
hit counter into the ternary expression without changing its value.
There are weirder tricks for this which will appear later in this
section.

hit(n) increments the hit counter with ID n in a global map. The hit
counter IDs in this example are \( 65 \) and \( 66 \) because Fuzz
Map packs both a branch ID and a target ID into the hit counter ID.
The branch ID in this case is \( 1 \), and the second branch target
has ID \( 2 \), so \( 66 = (1 \ll 6) + 2 \). Fuzz Map also uses the
most significant bit to record whether a hit counter ID corresponds
to a "high detail" branch; currently these are just function entry
points. The extra information packed into the hit counter ID is used
during map graph generation, discussed later.

[cd9227aba935fd66f6fd308fc78b4e856ca35da68df16b0c2128f91303032949]

Short-circuiting expressions

You probably know that the Boolean operators in JavaScript
short-circuit and evaluate to the first sub-expression that
determines their value, as they do in most (or all?) languages in the
C family. Short-circuiting means that false && alert("boo!") and true
|| alert("boo!") never alert. But unless you are familiar with React
and JSX, you might not know that short-circuiting is often used
idiomatically. Here's some code from the Checkout example:

{screen === "OrderConfirmed" && (
  <>
    <h2>Order confirmed</h2>
    <div className="screenContents">
      <p>Your order is confirmed!</p>
      <p>{STORE_NAME} is preparing your order.</p>
      ...
    </div>
  </>
)}

When the "Order Confirmed" screen is not being shown, screen ===
"OrderConfirmed" evaluates to false, so the entire expression
evaluates to false. React doesn't render false, so the screen doesn't
show.

The frequent use of short-circuiting in React made it more important
for this demo to instrument it. If the demo worked on C programs
compiled to WASM instead, this might have been less important.

As an aside, WASM would have made some things easier. Instead of
having to write the instrumentation for each branching operation
separately, I could have just instrumented the less numerous control
instructions in WASM. Rewriting a WASM binary is a lot easier than
rewriting e.g. an x86 binary because WASM does not allow jumps to
arbitrary addresses; for the difficulties that creates, see e.g.
Bauman et al (2018). More generally, it would be nifty to have the
ability to write a compiler pass that operates on a lower-level
representation, and to then have the compiler automatically transform
this pass into one that operates on a higher-level representation,
preserving the original structure of the program whenever possible. I
don't know of other applications for this, though; maybe it could be
used to implement diff/patch not based on lines.

Because we don't have this ability, here are the separate
transformations for a || b and a ?? b (the nullish coalescing
operator):

  * a || b becomes

    ((x) => x ? (hit(1), x) : (hit(2), b))(a)

  * a ?? b becomes

    ((x) => (x === null || x === undefined) ? (hit(2), b) : (hit(1), x))(a)

Any expression can be replaced by an immediately-invoked function
expression containing a conditional expression. It's necessary
because we need to bind the value of the expression a so we can use
it multiple times while evaluating it only once. For example, to
evaluate a ?? b, first we compare the value of a with null and
undefined, then we return it.

I previously tried to be clever and transform a || b to (a && (h1,
true)) || (h2, b). This works unless a is only truthy, not true, and
the result of the expression is not used as a Boolean. Then the
transformed expression evaluates to true when the original expression
would have evaluated to e.g. 1.

If this section didn't bore you out of your mind, you might enjoy the
game return true to win by @steike.

The map graph

The state graph is already a simplified version of the program being
tested. But typically it'll still be humongous, even for a simple
application. When rendered it is almost unreadable. So before layout
and rendering, Fuzz Map simplifies the state graph into a map graph.
Each node or edge in the map graph corresponds to one or more nodes
and edges in the state graph. As a final step, Fuzz Map uses elkjs to
create a layout where each node and edge in the map graph has been
assigned a position on the screen.

Why is the state graph so large? First, there are many ways that
different parts of the program can interact with each other. If one
branch is added to the program that is independent from the other
branches in the program, it more than doubles the number of possible
states. In other words, even when each entry of the hit vector is
clamped to \( [0, 1] \), a program with \( n \) branches still has \(
2^n \) possible hit vectors.

Second, each iteration of a loop increments each hit counter it
contains. So each loop iteration creates an entirely new state.

During fuzzing, it's useful for the state graph to be large. Recall
that each item \( (\vec s, i) \) in the fuzzing queue contains a hit
vector \( \vec s \) that identifies the state to which the input \( i
\) should be applied. So the more precisely the hit vector \( \vec s
\) describes the application state, the more likely we are to
successfully apply input \( i \) when we travel to that state again
during fuzzing. Suppose we used clamped hit vectors to identify
application states. Then we might try to apply an input \( i \) that
only works in states where a list has 3 elements to a state where
that list has only 2 elements.

It's still possible for us to fail to apply an input \( i \). A hit
vector \( \vec s \) might correspond to two states, and \( i \) might
apply in only one of them. What if we instead identified each state
using the application's actual memory, instead of using hit vectors?
This would guarantee that \( i \) would apply, but might make the
fuzzer slower. Obviously, we would use much more memory to represent
each state.

More subtly, we would apply many redundant operations during fuzzing.
Suppose the application behaves exactly the same no matter what a
user's name is, so long as they have a name. Then unless we have a
way to determine that the difference between two states with
different names is insignificant, every different name multiplies the
number of items in the fuzzing queue.

Simplification

The map graph is created by combining nodes and edges in the state
graph. This process repeats until no more combinations are possible.
Fuzz Map uses a few simple heuristics for simplification. The
simplest--but most impactful--is clamping: each entry of a hit vector
is clamped to at most \( 1 \). Another heuristic, branch collapsing,
is to treat two branch targets as identical when we've seen a state
where both were hit. Branch collapsing helps with GUIs which perform
processing for each item in a list, like the Checkout example.

This diagram shows an example of clamping during simplification. The
two blue nodes in the state graph are combined into one node in the
map graph:

[b37bd9d1850bdf516755d5a40a6d6b266ac9d139f787d01976316aef905dc7fe]

Initially the two blue nodes have different hit vectors, \( (1, 0, 2)
\) and \( (3, 0, 1) \). But both hit vectors are equivalent after
clamping to \( (1, 0, 1) \), so the two nodes are combined. This
causes the edge from one blue node to the other to become a loop.
Simplification operates on edges as well.

There's a lot more that could be said about map graph simplification.
In fact, simplification was the largest source of novelty and
complexity in this project. Without simplification, interesting parts
of the map are drowned out by exponentially more uninteresting parts.
Simplification is the key to making the map visualization possible.

Clamping is essentially local in that it acts on each hit vector
independently. Branch collapsing is slightly less local. The next
step is to explore global analyses, e.g. identifying when a state
differs significantly from others.

Applying inputs

Changing tack: it's often possible for Fuzz Map to apply inputs even
after the program has been modified. The input paths that Fuzz Map
uses to identify input targets contain the event handler's inline
declaration. For example, if you render a list of buttons:

function App(...) {
  ...
  return list.map((name) => <button onClick={handleClick(name)} />);
}

each button's onClick handler gets an input path:

{ subtree: ?, eventName: 'click', handler: 'App_handleClick_name', index: 1 }

where index: 1 means this handler was the second among its siblings
in the DOM with this handler. Fuzz Map uses the event handler's
inline declaration because it's already common practice not to define
event handlers entirely inline. This means most input paths will be
stable without any extra work.

Fuzz Map was designed to work without manual annotation whenever
feasible. But if you need to more precisely identify an element, you
can set the data-fuzzmap-key HTML attribute on the element or its
ancestors. The subtree field in the input path is a list of (key,
index) pairs. If you want, you can give each element a totally unique
data-fuzzmap-key. Many testing systems either require this for all
interactive elements or fall back to guessing based on style or
content.

A heuristic is used to minimize the size of the seed input set: if
one seed input sequence is exactly contained in another, then the
smaller one is removed. For many applications, it is more expensive
to reset the sandbox than to extend an existing input sequence with
additional inputs, e.g. when resetting the sandbox requires resetting
a database. A more optimal approach would be to compute a minimal
path cover.

Limitations and future work

There are many limitations in the current version of Fuzz Map--hence
the alpha label. I'll mention a few of them below. All of these will
be addressed in future versions.

Fuzz Map does not generate complex values for inputs. This is the
exact opposite of most commonly-used fuzzers! That is also exactly
why this wasn't a focus of the demo. Instead of reinventing the
wheel, a future version of Fuzz Map will integrate established
fuzzers for generating e.g. the text in inputs.

Fuzz Map's state and event model is extensible. The demo does not
currently handle asynchronous events like setTimeout and fetch calls.
But in a future version of Fuzz Map, edges will correspond not just
to DOM event handlers, but also to event handlers attached to browser
APIs. It will be straightforward to extend Fuzz Map to instrument and
replay these in the same way that it already instruments DOM events.
Here's a quick mockup:

[315ef928645072563116f42c9654d6ac5299d05a732c370d2f45c34b860a07a6]

Fuzz Map does not handle event handlers that are defined at runtime.
Only controlled components are fuzzed.

I was lazy and didn't handle more exotic branching operations like ??
= and ?..

Special acknowledgments

I'd like to especially acknowledge Joey Liaw, who gave me the idea to
try and fuzz GUIs over lunch!

I'd also like to especially acknowledge the authors of elkjs and the
ELK layout library. Of all the libraries I used for this demo, ELK
punched the most above its weight. ELK generates the layout used to
render the map graph. A similar program is the classic Graphviz.

First published August 25, 2022.
(c) 2022 Jonathan Y. Chan (jyc)
Copyright acknowledgments