Tools for fuzzing liboqs #983
jschanck
started this conversation in
Show and tell
Replies: 1 comment
-
Looks very interesting John! Does the coverage report indicate that there are large segments of code that we're not exercising? |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
I've written a tool for coverage-guided differential fuzzing of liboqs. It's still a work in progress, but it's now available in my fuzzing branch.
Let's break down "coverage-guided differential fuzzing".
Coverage-guided: It tries to construct a set of "inputs" that cover a large percentage of the code. (LLVM's
libFuzzer does all the heavy lifting here.)
Differential: If we have two implementations of the same scheme, the fuzzer will try to cross-validate them by comparing their outputs on the same "input".
The main goal here is to produce a set of "inputs" that cover more code paths than the Known Answer Tests. We can then use those "inputs" in CI, perhaps with our ASan, UBSan, and Valgrind tests.
"Inputs"?
By "inputs" I do not mean inputs to our public API, hence the scare quotes.
Fuzzing is interesting when it explores more code paths than random testing. Our public API doesn't give the fuzzer much of a chance to do this. To see why, consider a typical implementation of
OQS_KEM_encaps
:public key
as input.seed
from the system random source.seed
intosession key
andcoins
using SHAKE.session key
topublic key
usingcoins
to derandomize some probabilistic encryption scheme.The fuzzer might find some interesting public keys, but it's not going to do better than random search when looking for bugs that depend on
session key
andcoins
. Even if we let the fuzzer chooseseed
, the call to SHAKE on line 3 severely limits the fuzzer's influence over these other variables.The new tool recognizes that "SHAKE" in line 3 can be replaced with any other random oracle. In particular, we're free to replace it with a random oracle that answers queries with fuzzer-provided data. Doing so gives the fuzzer direct control over
session key
andcoins
.So by "inputs" I actually mean a list of answers to the random oracle queries.
How it works
There are a lot of functions that get used as random oracles in liboqs. Here's the short list:
The new tool loads a small library before liboqs to override these functions (i.e. it uses the ``LD_PRELOAD trick''). This library answers queries to each of the random oracles with fuzzer-provided data, and it keeps track of all the queries that are made so that it can answer consistently.
The fuzz targets just run through a set of keygen/encaps/decaps or keygen/sign/verify operations.
For differential fuzzing we overload the runtime CPU feature detection mechanism (again with the LD_PRELOAD trick). We can then toggle between optimized and reference code by lying to liboqs about the CPU features that are available.
Compilation
You can also use
to compile liboqs with address sanitizer.
Or
to compile liboqs with undefined behaviour sanitizer.
Usage
From the liboqs directory with a compiled library in ./build:
Make a directory to store coverage data in
Pick a fuzz target
Create a directory in which to store the corpus of "inputs"
mkdir -p ./corpus/${TARGET}/
And run the fuzzer
Realistically you want to loop over all targets:
Note the large value of max_len. We want the fuzzer to provide as many bytes as we consume in random oracle answers.
See the libfuzzer guide for description of the other options.
https://llvm.org/docs/LibFuzzer.html#options
Coverage
You can generate a coverage report by running
Remaining issues:
Beta Was this translation helpful? Give feedback.
All reactions