Skip to content

Set up an automated valgrind test #69

Open
@adamcrume

Description

@adamcrume
Contributor

Not usually necessary for Rust, but since we're doing a lot of unsafe stuff with pointers, we should have a valgrind test. Ideally, Travis would run it and fail the build if there are issues.

Activity

nbigaouette-eai

nbigaouette-eai commented on Mar 14, 2017

@nbigaouette-eai
Contributor

It is always a good idea to run code/tests through valgrind. One could simply run the current tests (without docs) through valgrind, and add more that use yet uncovered code path.

Something interesting too would be to add code coverage to make sure as much as possible is covered.

And then even fuzzing; I've heard good things of cargo fuzz.

nbigaouette-eai

nbigaouette-eai commented on Mar 14, 2017

@nbigaouette-eai
Contributor

Note that jemalloc removed valgrind support (see jemalloc/jemalloc@9a8add1) but this does not seems to be included in rust's version of a jemalloc. Rust uses jemalloc @ 33184bf69813087bf1885b0993685f9d03320c69 plus a couple of patches, which is jemalloc 4.1.0 plus four patches.

Enet4

Enet4 commented on Mar 14, 2017

@Enet4
Contributor

Well, I read this recent blog post on using Valgrind for this very purpose, and we are still advised to change the memory allocation system (using the nightly alloc_system feature). Other than that, this is a useful resource to whoever is attempting to resolve this issue.

nbigaouette-eai

nbigaouette-eai commented on Mar 15, 2017

@nbigaouette-eai
Contributor

Great read, thanks.

Maybe an optional combination alloc_system+nightly+valgrind could be added. As @adamcrume mentioned in #66, it might not be a good idea to depend on alloc_system and nightly for the crate, but an option to increase the testing is definitely a good idea.

adamcrume

adamcrume commented on Mar 15, 2017

@adamcrume
ContributorAuthor

We might copy the suggestion at rust-lang/rust#28224 (comment), which is to add a nightly feature and add this to the code:

#![cfg_attr(feature="nightly", feature(alloc_system))]
#[cfg(feature="nightly")]
extern crate alloc_system;

Then we can build on the nightly channel with --features=nightly and run valgrind against that.

nbigaouette-eai

nbigaouette-eai commented on Mar 20, 2017

@nbigaouette-eai
Contributor

So I've experimented running the tests through valgrind, but it's not as trivial as it looks.

First, as mentioned above, using jemalloc (the default memory allocator in rust) will confuse valgrind. So one has to build the tests using the system allocator, which is only available when using a nightly build of rust. This means Travis would need to be setup to run nightly, possibly in addition to stable rust.

Second (and most importantly), we don't want to run cargo through valgrind, but the tests binaries. This means the command used to run the tests should not be something like valgrind cargo test because valgrind will catch cargo errors, not the bindings' ones. Suppressions could be written but this is a tedious task.

"Normally" people would run valgrind directly on the test binaries, for example valgrind tensorflow_rust.git/target/debug/deps/lib-aab328fe9f627d08. This has two problems:

  1. How to programatically extract the filename? There is a couple of these binaries in the target directory. We'd need to find them and execute them. This complexifies the .travis.yml file.
  2. The binaries are linked dynamically to libtensorflow.so and so must find them at run time. Simply running the binaries will fail with error while loading shared libraries: libtensorflow.so: cannot open shared object file: No such file or directory if LD_LIBRARY_PATH is not set properly.

Cargo will set a working environment for the binaries so it can run them. It also knows exactly what to run. So it's sad that can't directly run cargo valgrind test or something like that.

Instead of using valgrind, we could try out the different sanitizers as provided by llvm. See here for more information: https://github.com/japaric/rust-san
With a nightly rust newer than a month old, the sanitizers are included. Using them is quite easy and should work I think in Travis. On my machine for example:

$ cd tensorflow-sys
$ RUSTFLAGS="-Z sanitizer=leak" cargo +nightly test -vv -j 1 --target x86_64-unknown-linux-gnu -- --nocapture
[...]
=================================================================
==28527==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x55620849f0ea  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x400ea)
    #1 0x55620846d0ac  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0xe0ac)
    #2 0x55620846d020  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0xe020)
    #3 0x55620846d1bd  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0xe1bd)
    #4 0x55620847b5be  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x1c5be)
    #5 0x556208470040  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x11040)
    #6 0x5562084d8aaa  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x79aaa)
    #7 0x55620846f680  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x10680)
    #8 0x5562084d8aaa  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x79aaa)
    #9 0x5562084762f6  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x172f6)
    #10 0x5562084d0924  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x71924)
    #11 0x7ffbbbee02e6  (/usr/lib/libpthread.so.0+0x72e6)

SUMMARY: LeakSanitizer: 4 byte(s) leaked in 1 allocation(s).
error: test failed

when adding a simple std::mem::forget(Box::new(42)); to a test to force a memory leak.

This is not perfect; the implementation is not bug free nor finished. I'm also not seeing where the allocation was performed (as shown in the link example) but at least it's catching it.

What I think should be done is to first extend the Travis configuration to allow stable/beta/nightly rust. This would allow adding more testing pieces, like sanitizers or valgrind. I'll open a PR about this.

adamcrume

adamcrume commented on Mar 21, 2017

@adamcrume
ContributorAuthor

I'd like to see how a compiler-based sanitizer works when interacting with code it didn't compile, i.e. what happens when we allocate something through libtensorflow.so and then don't release that.

I assume that the offsets in that stack trace could be traced back to a line of code via nm voodoo. Maybe they'll build that into the sanitizer at some point.

added 2 commits that reference this issue on Jun 17, 2017

4 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      Participants

      @adamcrume@Enet4@nbigaouette-eai

      Issue actions

        Set up an automated valgrind test · Issue #69 · tensorflow/rust