Tools for fuzzing liboqs #983

jschanck · 2021-04-30T21:56:30Z

Collaborator

I've written a tool for coverage-guided differential fuzzing of liboqs. It's still a work in progress, but it's now available in my fuzzing branch.

Let's break down "coverage-guided differential fuzzing".

Coverage-guided: The fuzzer tries to construct a set of "inputs" that covers a large percentage of the code. (LLVM's libFuzzer does all the heavy lifting here.)

Differential: If we have two implementations of the same scheme, the fuzzer will try to cross-validate them by comparing their outputs on the same "input".
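
The cross-validation idea can be sketched in C as follows. This is an illustration under stated assumptions, not code from the fuzzing branch. `impl_fn`, `differential_check`, and the three `fold_*` functions are hypothetical stand-ins for two implementations of one scheme, and one of them is deliberately buggy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Signature shared by both implementations under test (hypothetical). */
typedef void (*impl_fn)(uint8_t *out, size_t outlen, const uint8_t *in, size_t inlen);

/* Reference implementation of a made-up function: XOR-fold the input into outlen bytes. */
void fold_ref(uint8_t *out, size_t outlen, const uint8_t *in, size_t inlen) {
    memset(out, 0, outlen);
    for (size_t i = 0; i < inlen; i++)
        out[i % outlen] ^= in[i];
}

/* "Optimized" implementation: walks the input one outlen-sized chunk at a time. */
void fold_chunked(uint8_t *out, size_t outlen, const uint8_t *in, size_t inlen) {
    memset(out, 0, outlen);
    for (size_t base = 0; base < inlen; base += outlen)
        for (size_t j = 0; j < outlen && base + j < inlen; j++)
            out[j] ^= in[base + j];
}

/* Buggy variant: silently drops a trailing partial chunk. */
void fold_buggy(uint8_t *out, size_t outlen, const uint8_t *in, size_t inlen) {
    memset(out, 0, outlen);
    for (size_t base = 0; base + outlen <= inlen; base += outlen)
        for (size_t j = 0; j < outlen; j++)
            out[j] ^= in[base + j];
}

/* Run both implementations on the same input and compare the outputs.
 * Returns 1 if they agree, 0 if they differ, -1 for unsupported lengths.
 * A real harness would abort() on a mismatch so that libFuzzer saves the input. */
int differential_check(impl_fn a, impl_fn b, const uint8_t *in, size_t inlen, size_t outlen) {
    uint8_t out_a[64], out_b[64];
    if (outlen == 0 || outlen > sizeof out_a)
        return -1;
    a(out_a, outlen, in, inlen);
    b(out_b, outlen, in, inlen);
    return memcmp(out_a, out_b, outlen) == 0;
}
```

The buggy variant agrees with the reference whenever the input length is a multiple of `outlen`, so a mismatch only shows up on inputs that reach the trailing-chunk path. That is where coverage guidance helps the differential check.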

The main goal here is to produce a set of "inputs" that cover more code paths than the Known Answer Tests. We can then use those "inputs" in CI, perhaps with our ASan, UBSan, and Valgrind tests.

"Inputs"?

By "inputs" I do not mean inputs to our public API, hence the scare quotes.

Fuzzing is interesting when it explores more code paths than random testing. Our public API doesn't give the fuzzer much of a chance to do this. To see why, consider a typical implementation of OQS_KEM_encaps:

1. Receive public key as input.
2. Get a 32-byte seed from the system random source.
3. Expand seed into session key and coins using SHAKE.
4. Encrypt session key to public key using coins to derandomize some probabilistic encryption scheme.

The fuzzer might find some interesting public keys, but it's not going to do better than random search when looking for bugs that depend on the session key and coins. Even if we let the fuzzer choose the seed, the call to SHAKE in step 3 severely limits the fuzzer's influence over these other variables.

The new tool recognizes that "SHAKE" in step 3 can be replaced with any other random oracle. In particular, we're free to replace it with a random oracle that answers queries with fuzzer-provided data. Doing so gives the fuzzer direct control over the session key and coins.

So by "inputs" I actually mean a list of answers to the random oracle queries.
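
To make the idea concrete, here is a minimal C sketch, assuming a 32-byte seed and a 32-byte session key and coins (illustrative sizes, not taken from any particular scheme). `ro_fn`, `derive_key_and_coins`, `stream_oracle_init`, and `stream_oracle` are hypothetical names. With SHAKE plugged in, the fuzzer could only steer the seed; with the stream oracle plugged in, the fuzzer's bytes become the session key and coins directly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEED_LEN 32
#define KEY_LEN 32   /* assumed session-key length */
#define COINS_LEN 32 /* assumed length of the encryption coins */

/* A random oracle: answers the query (in, inlen) with outlen bytes. */
typedef void (*ro_fn)(uint8_t *out, size_t outlen, const uint8_t *in, size_t inlen);

/* Step 3 of the encaps outline, with the oracle passed in instead of a
 * hard-wired call to SHAKE. */
void derive_key_and_coins(ro_fn ro, const uint8_t seed[SEED_LEN],
                          uint8_t key[KEY_LEN], uint8_t coins[COINS_LEN]) {
    uint8_t buf[KEY_LEN + COINS_LEN];
    ro(buf, sizeof buf, seed, SEED_LEN);
    memcpy(key, buf, KEY_LEN);
    memcpy(coins, buf + KEY_LEN, COINS_LEN);
}

/* Fuzzer-backed oracle: ignores the query and returns the next unread bytes
 * of the fuzzer's input, zero-filling once the input is exhausted. */
static const uint8_t *fuzz_data;
static size_t fuzz_len, fuzz_pos;

void stream_oracle_init(const uint8_t *data, size_t len) {
    fuzz_data = data;
    fuzz_len = len;
    fuzz_pos = 0;
}

void stream_oracle(uint8_t *out, size_t outlen, const uint8_t *in, size_t inlen) {
    (void)in;
    (void)inlen;
    size_t avail = fuzz_len - fuzz_pos;
    size_t n = outlen < avail ? outlen : avail;
    if (n > 0)
        memcpy(out, fuzz_data + fuzz_pos, n);
    memset(out + n, 0, outlen - n);
    fuzz_pos += n;
}
```

This stream oracle ignores the query, so it would not answer a repeated query consistently; a real replacement has to.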

How it works

There are a lot of functions that get used as random oracles in liboqs. Here's the short list:

OQS_AES128_ECB_enc
OQS_AES256_CTR_sch
OQS_AES256_ECB_enc
OQS_SHA2_sha256
OQS_SHA2_sha384
OQS_SHA2_sha512
OQS_SHA3_sha3_256
OQS_SHA3_sha3_512
OQS_SHA3_shake128
OQS_SHA3_shake256

The new tool loads a small library before liboqs to override these functions (i.e., it uses the "LD_PRELOAD trick"). This library answers queries to each of the random oracles with fuzzer-provided data, and it keeps track of all the queries that are made so that it can answer repeated queries consistently.
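
A minimal sketch of such a preload library in C. The override signatures are assumed to match the `OQS_SHA3_shake128` and `OQS_SHA3_shake256` declarations in liboqs's sha3.h; `fuzz_ro_set_input`, `ro_answer`, and the fixed-size query log are hypothetical and not taken from the fuzzing branch. Repeated queries to the same oracle are answered from the log, and a longer request for a previously seen query extends the stored answer so that shorter answers remain prefixes, as they would for an XOF.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_QUERIES 1024

/* One remembered query: which oracle, the query bytes, and the answer so far. */
typedef struct {
    int oracle;
    uint8_t *in;
    size_t inlen;
    uint8_t *out;
    size_t outlen;
} ro_entry;

static ro_entry ro_log[MAX_QUERIES];
static size_t ro_count;
static const uint8_t *fuzz_data;
static size_t fuzz_len, fuzz_pos;

/* Hypothetical hook: the fuzz target passes libFuzzer's input here before each run. */
void fuzz_ro_set_input(const uint8_t *data, size_t len) {
    for (size_t i = 0; i < ro_count; i++) {
        free(ro_log[i].in);
        free(ro_log[i].out);
    }
    ro_count = 0;
    fuzz_data = data;
    fuzz_len = len;
    fuzz_pos = 0;
}

/* Copy the next n fuzzer bytes into out, zero-filling past the end of the input. */
static void fresh_bytes(uint8_t *out, size_t n) {
    size_t avail = fuzz_len - fuzz_pos;
    size_t k = n < avail ? n : avail;
    if (k > 0)
        memcpy(out, fuzz_data + fuzz_pos, k);
    memset(out + k, 0, n - k);
    fuzz_pos += k;
}

/* Answer a query consistently: an identical earlier query to the same oracle
 * gets the same bytes, and a longer request extends the stored answer. */
static void ro_answer(int oracle, uint8_t *out, size_t outlen, const uint8_t *in, size_t inlen) {
    ro_entry *e = NULL;
    for (size_t i = 0; i < ro_count && e == NULL; i++) {
        if (ro_log[i].oracle == oracle && ro_log[i].inlen == inlen &&
            (inlen == 0 || memcmp(ro_log[i].in, in, inlen) == 0))
            e = &ro_log[i];
    }
    if (e == NULL) {
        if (ro_count == MAX_QUERIES)
            abort(); /* sketch: fixed-size query log */
        e = &ro_log[ro_count++];
        e->oracle = oracle;
        e->in = malloc(inlen > 0 ? inlen : 1);
        if (e->in == NULL)
            abort();
        if (inlen > 0)
            memcpy(e->in, in, inlen);
        e->inlen = inlen;
        e->out = NULL;
        e->outlen = 0;
    }
    if (outlen > e->outlen) {
        uint8_t *grown = realloc(e->out, outlen);
        if (grown == NULL)
            abort();
        fresh_bytes(grown + e->outlen, outlen - e->outlen);
        e->out = grown;
        e->outlen = outlen;
    }
    if (outlen > 0)
        memcpy(out, e->out, outlen);
}

/* Overrides using the liboqs names; signatures assumed from liboqs's sha3.h. */
void OQS_SHA3_shake128(uint8_t *output, size_t outlen, const uint8_t *input, size_t inplen) {
    ro_answer(128, output, outlen, input, inplen);
}

void OQS_SHA3_shake256(uint8_t *output, size_t outlen, const uint8_t *input, size_t inplen) {
    ro_answer(256, output, outlen, input, inplen);
}
```

Compiled as a shared object (for example with `clang -shared -fPIC`) and loaded through `LD_PRELOAD`, these definitions take precedence over the liboqs ones, provided liboqs resolves those symbols dynamically rather than binding them internally.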

The fuzz targets just run through a set of keygen/encaps/decaps or keygen/sign/verify operations.
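
One fuzz iteration can be sketched in C as below. The `kem_ops` interface and the deliberately insecure toy KEM are hypothetical stand-ins so that the sketch runs without liboqs; a real target would instead drive a liboqs scheme through `OQS_KEM_keypair`, `OQS_KEM_encaps`, and `OQS_KEM_decaps`. `LLVMFuzzerTestOneInput` is libFuzzer's standard entry point. Treating disagreeing shared secrets as a bug is an assumption of this sketch: with fuzzer-chosen coins, schemes that have a nonzero decryption-failure probability might legitimately disagree.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BUF 64
#define TOY_LEN 16

/* Hypothetical KEM interface shaped like keygen/encaps/decaps (0 on success). */
typedef struct {
    size_t ss_len;
    int (*keypair)(uint8_t *pk, uint8_t *sk);
    int (*encaps)(uint8_t *ct, uint8_t *ss, const uint8_t *pk);
    int (*decaps)(uint8_t *ss, const uint8_t *ct, const uint8_t *sk);
} kem_ops;

/* Deliberately insecure toy KEM (public key equals secret key) so the sketch runs. */
static int toy_keypair(uint8_t *pk, uint8_t *sk) {
    for (size_t i = 0; i < TOY_LEN; i++)
        sk[i] = (uint8_t)(3 * i + 1);
    memcpy(pk, sk, TOY_LEN);
    return 0;
}

static int toy_encaps(uint8_t *ct, uint8_t *ss, const uint8_t *pk) {
    for (size_t i = 0; i < TOY_LEN; i++) {
        ss[i] = (uint8_t)(0xA5 ^ i);
        ct[i] = (uint8_t)(ss[i] ^ pk[i]);
    }
    return 0;
}

static int toy_decaps(uint8_t *ss, const uint8_t *ct, const uint8_t *sk) {
    for (size_t i = 0; i < TOY_LEN; i++)
        ss[i] = (uint8_t)(ct[i] ^ sk[i]);
    return 0;
}

const kem_ops toy_kem = {TOY_LEN, toy_keypair, toy_encaps, toy_decaps};

/* One iteration: keygen, encaps, decaps, then check that the shared secrets agree.
 * Returns 0 on agreement, 1 on mismatch, -1 if an operation fails. */
int kem_roundtrip(const kem_ops *k) {
    uint8_t pk[MAX_BUF], sk[MAX_BUF], ct[MAX_BUF], ss_enc[MAX_BUF], ss_dec[MAX_BUF];
    if (k->ss_len > MAX_BUF)
        return -1;
    if (k->keypair(pk, sk) != 0 || k->encaps(ct, ss_enc, pk) != 0 ||
        k->decaps(ss_dec, ct, sk) != 0)
        return -1;
    return memcmp(ss_enc, ss_dec, k->ss_len) == 0 ? 0 : 1;
}

/* libFuzzer entry point. A real target would first hand data/size to the
 * preloaded oracle library, then drive a liboqs KEM instead of toy_kem. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    (void)data;
    (void)size;
    if (kem_roundtrip(&toy_kem) != 0)
        abort(); /* libFuzzer treats the abort as a crash and saves the input */
    return 0;
}
```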

For differential fuzzing we override the runtime CPU feature detection mechanism (again with the LD_PRELOAD trick). We can then toggle between optimized and reference code by lying to liboqs about which CPU features are available.
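
One way the toggle could look, as a C sketch: the harness controls a feature mask, the dispatcher consults it, and the same operation runs once with every feature reported and once with none. `cpu_has_extension`, `fake_cpu_mask`, the `CPU_EXT_*` bits, and the toy `poly_add` are all hypothetical; a real preload library would have to override whatever symbol liboqs actually uses for feature detection.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bits; liboqs's real detection interface is not shown here. */
enum { CPU_EXT_AVX2 = 1, CPU_EXT_AES = 2 };

/* Feature mask controlled by the harness ("lying" about the CPU). */
unsigned fake_cpu_mask = ~0u;

/* Stand-in for the overridden feature-detection function. */
int cpu_has_extension(unsigned ext) { return (fake_cpu_mask & ext) != 0; }

/* Reference path for a toy operation: coefficient-wise addition mod 2^16. */
static void add_ref(uint16_t *r, const uint16_t *a, const uint16_t *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        r[i] = (uint16_t)(a[i] + b[i]);
}

/* "Optimized" path: unrolled by four as a stand-in for a SIMD loop. */
static void add_opt(uint16_t *r, const uint16_t *a, const uint16_t *b, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        r[i] = (uint16_t)(a[i] + b[i]);
        r[i + 1] = (uint16_t)(a[i + 1] + b[i + 1]);
        r[i + 2] = (uint16_t)(a[i + 2] + b[i + 2]);
        r[i + 3] = (uint16_t)(a[i + 3] + b[i + 3]);
    }
    for (; i < n; i++)
        r[i] = (uint16_t)(a[i] + b[i]);
}

/* Runtime dispatch, as a library would do after feature detection. */
static void poly_add(uint16_t *r, const uint16_t *a, const uint16_t *b, size_t n) {
    if (cpu_has_extension(CPU_EXT_AVX2))
        add_opt(r, a, b, n);
    else
        add_ref(r, a, b, n);
}

/* Differential check: run with every feature reported, then with none, and compare.
 * Returns 1 if the outputs agree, 0 if not, -1 if n is too large for the sketch. */
int differential_poly_add(const uint16_t *a, const uint16_t *b, size_t n) {
    uint16_t fast[256], slow[256];
    if (n > 256)
        return -1;
    fake_cpu_mask = ~0u;
    poly_add(fast, a, b, n);
    fake_cpu_mask = 0;
    poly_add(slow, a, b, n);
    fake_cpu_mask = ~0u;
    return memcmp(fast, slow, n * sizeof fast[0]) == 0;
}
```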

Compilation

mkdir -p build && cd build
cmake -GNinja \
      -DCMAKE_C_COMPILER=clang \
      -DBUILD_SHARED_LIBS=ON \
      -DOQS_USE_OPENSSL=OFF \
      -DOQS_DIST_BUILD=ON \
      -DCMAKE_BUILD_TYPE=Debug \
      -DUSE_SANITIZER=Fuzzer \
      .. && ninja

You can also use

  -DUSE_SANITIZER=FuzzerWithASan

to compile liboqs with address sanitizer.
Or

  -DUSE_SANITIZER=FuzzerWithUBSan

to compile liboqs with undefined behaviour sanitizer.

Usage

From the liboqs directory with a compiled library in ./build:

Make a directory to store coverage data in

mkdir ./coverage/

Pick a fuzz target

ls ./build/tests/fuzz*
export TARGET=fuzz_kem_frodo_640_aes

Create a directory in which to store the corpus of "inputs"

mkdir -p ./corpus/${TARGET}/

And run the fuzzer

./build/tests/${TARGET} ./corpus/${TARGET}/ -max_len=2000000 -len_control=0 -runs=1000

Realistically you want to loop over all targets:

for _TARGET in ./build/tests/fuzz*; do
  TARGET=$(basename "$_TARGET")
  mkdir -p ./corpus/${TARGET}/
  export LLVM_PROFILE_FILE=./coverage/${TARGET}.profraw
  ./build/tests/${TARGET} ./corpus/${TARGET}/ -max_len=2000000 -len_control=0 -runs=1000
done

Note the large value of max_len: we want the fuzzer to be able to supply as many bytes as the random oracle answers consume.

See the libFuzzer documentation for a description of the other options:
https://llvm.org/docs/LibFuzzer.html#options

Coverage

You can generate a coverage report by running

llvm-profdata merge -sparse ./coverage/*.profraw -o fuzzer.profdata
llvm-cov report ./build/lib/liboqs.so -instr-profile=fuzzer.profdata

Remaining issues:

- There are still a lot of duplicate random oracles (AES implementations) in Dilithium, Kyber-90s, Classic McEliece, and NTRU Prime.
- We get some false positives in Kyber due to a quirk of how the AVX2 implementation uses 4x-parallel SHAKE.
- I'm aware of a problem with excessive memory usage for Falcon when AddressSanitizer is enabled.

dstebila · 2021-05-03T15:47:04Z

Maintainer

Looks very interesting John!

Does the coverage report indicate that there are large segments of code that we're not exercising?
