ConfTainter

ConfTainter is a static taint analysis infrastructure for configuration options. It is based on LLVM IR and analyzes the control and data dependencies starting from the specified configuration variable(s).

  • ConfTainter: Static Taint Analysis For Configuration Options
    Teng Wang, Haochen He, Xiaodong Liu, Shanshan Li, Zhouyang Jia, Yu Jiang, Qing Liao, Wang Li. "ConfTainter: Static Taint Analysis For Configuration Options", In Proceedings of the 38th ACM/IEEE International Conference on Automated Software Engineering (ASE 2023), 11-15 September, 2023, Luxembourg.

Data flow

  • Intra-procedural analysis (basic LLVM "Use" support)
  • Field-sensitive analysis
  • Inter-procedural analysis (with pointers)
  • Implicit data flow (phi-node)
    • How to formally determine whether a phi-node is tainted
      Given a phi-node like:
         phi i32 [ %5, %bb1.i ], [ 0, %bb1 ]
                    pre_node      pre_node2
      
      we check the condition shown in the original screenshot (2022-09-03 16 07 53).
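The intra-procedural item at the top of this list amounts to following def-use edges from the configuration variable until no new value is reached. The following is a minimal sketch of that idea on a toy def-use graph, not ConfTainter's implementation: `UseGraph` and `propagateTaint` are hypothetical names, and plain strings stand in for LLVM values.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Toy def-use graph (assumption): maps a value name to the values that use it.
using UseGraph = std::map<std::string, std::vector<std::string>>;

// Worklist propagation: everything reachable from `source` through
// "is used by" edges is considered tainted.
std::set<std::string> propagateTaint(const UseGraph &users,
                                     const std::string &source) {
    std::set<std::string> tainted{source};
    std::queue<std::string> work;
    work.push(source);
    while (!work.empty()) {
        const std::string v = work.front();
        work.pop();
        const auto it = users.find(v);
        if (it == users.end()) continue;  // v has no users
        for (const std::string &u : it->second) {
            // insert().second is true only the first time u is seen
            if (tainted.insert(u).second) work.push(u);
        }
    }
    return tainted;
}
```

In a real LLVM pass the same loop would presumably walk the users of each `llvm::Value` instead of a string map; field sensitivity and pointer handling are deliberately left out of this sketch.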

Control flow

Formal definition of control-flow dependency:

  • Control Dependency on Configuration: A block Y is control-dependent on a configuration option C if and only if there is a block X such that (a) the branching instruction of X is tainted by C, and (b) Y is control-dependent on X.
    • Control Dependent: A block Y is control-dependent on a block X if and only if Y post-dominates at least one but not all successors of X.

An example, where the yellow square indicates the complicated code structures that motivate the use of the formal definition:
[screenshot 2022-09-03 16 39 03]
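The two definitions above can be checked mechanically on a small graph. The sketch below is an illustration under stated assumptions, not ConfTainter's code: `Cfg`, `postDominators`, `isControlDependent` and `dependsOnConfig` are hypothetical names, blocks are plain indices, the graph has at most 32 blocks, there is a single exit block, and every other block has at least one successor.

```cpp
#include <cstdint>
#include <vector>

// Toy CFG (assumption): blocks are 0..n-1, succ[b] lists b's successors.
struct Cfg {
    std::vector<std::vector<int>> succ;
    int exit;  // the unique exit block
};

// pdom[b] is a bitmask of the blocks that post-dominate b (b included),
// computed by the usual iterative intersection over successors.
std::vector<std::uint32_t> postDominators(const Cfg &g) {
    const int n = static_cast<int>(g.succ.size());
    const std::uint32_t all = (n >= 32) ? ~0u : ((1u << n) - 1u);
    std::vector<std::uint32_t> pdom(n, all);
    pdom[g.exit] = 1u << g.exit;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = 0; b < n; ++b) {
            if (b == g.exit) continue;
            std::uint32_t meet = all;
            for (int s : g.succ[b]) meet &= pdom[s];
            const std::uint32_t next = meet | (1u << b);
            if (next != pdom[b]) {
                pdom[b] = next;
                changed = true;
            }
        }
    }
    return pdom;
}

// Y is control-dependent on X iff Y post-dominates at least one,
// but not all, successors of X.
bool isControlDependent(const Cfg &g, const std::vector<std::uint32_t> &pdom,
                        int y, int x) {
    int hits = 0;
    for (int s : g.succ[x]) {
        if (pdom[s] & (1u << y)) ++hits;
    }
    return hits > 0 && hits < static_cast<int>(g.succ[x].size());
}

// Y depends on the option iff some block X has a tainted branch
// and Y is control-dependent on X.
bool dependsOnConfig(const Cfg &g, const std::vector<std::uint32_t> &pdom,
                     const std::vector<bool> &taintedBranch, int y) {
    for (int x = 0; x < static_cast<int>(g.succ.size()); ++x) {
        if (taintedBranch[x] && isControlDependent(g, pdom, y, x)) return true;
    }
    return false;
}
```

For a diamond-shaped graph (block 0 branches to 1 and 2, both of which fall through to 3), blocks 1 and 2 are control-dependent on block 0, while the join block 3 is not, because it post-dominates both successors.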

  • Implicit control flow: Besides explicit control-flow propagation, configuration options can also propagate control dependency implicitly through delay statements (e.g., a sleep function). If a delay function occurs in a loop and is tainted or dominated by the target option, the other basic blocks in the loop are treated as implicitly control-dependent on that option.
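The implicit control-flow rule can be read as a simple check over the blocks of one loop. The sketch below only restates that rule on a toy model. `LoopBlock` and `implicitControlDeps` are hypothetical names, and the two boolean flags are assumed to have been computed by earlier analysis steps.

```cpp
#include <set>
#include <vector>

// Toy description of one basic block inside a loop (assumption).
struct LoopBlock {
    int id;
    bool callsDelay;          // block calls a delay function such as sleep()
    bool taintedOrDominated;  // block is tainted or dominated by the option
};

// Returns the ids of the blocks that are treated as implicitly
// control-dependent on the option: every other block of the loop,
// provided the loop contains a delay call governed by the option.
std::set<int> implicitControlDeps(const std::vector<LoopBlock> &loop) {
    std::set<int> deps;
    for (const LoopBlock &d : loop) {
        if (!(d.callsDelay && d.taintedOrDominated)) continue;
        for (const LoopBlock &b : loop) {
            if (b.id != d.id) deps.insert(b.id);
        }
    }
    return deps;
}
```

A loop whose delay call is not governed by the option yields an empty set, so it contributes no implicit dependencies.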

Bug Fixes

This fork includes important bug fixes for the original ConfTainter implementation:

Fixed Issues

  1. Memory Leaks: Added proper deletion of DominatorTree and PostDominatorTree objects in tainter.cpp
  2. StringRef Conversion: Fixed LLVM 10 StringRef to std::string conversion issues in tainter.h
  3. Object Lifecycle: Fixed critical object destruction order bug in TestMain.cpp (LLVMContext must outlive Module)
  4. File Output: Restored and fixed writeToFile() functionality for generating analysis results

These fixes resolve segmentation faults and enable the tool to run successfully.
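Fix 3 relies on a general C++ rule: local objects are destroyed in reverse order of declaration. The sketch below demonstrates that rule with stand-in types. `FakeContext` and `FakeModule` are hypothetical and only record when their destructors run; they are not LLVM classes.

```cpp
#include <string>
#include <vector>

// Stand-in for an object that must outlive its dependents (assumption).
struct FakeContext {
    std::vector<std::string> *log;
    ~FakeContext() { log->push_back("context"); }
};

// Stand-in for an object that refers to the context while it is alive.
struct FakeModule {
    FakeContext *ctx;
    std::vector<std::string> *log;
    ~FakeModule() { log->push_back("module"); }
};

// Declares the context first and the module second, then leaves the scope
// and reports the order in which the destructors ran.
std::vector<std::string> destructionOrder() {
    std::vector<std::string> log;
    {
        FakeContext context{&log};
        FakeModule mod{&context, &log};
    }  // mod is destroyed first, context second
    return log;
}
```

Because the module is declared after the context, it is destroyed before it, so it never observes a dead context. Swapping the two declarations would reverse the order.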


Usage

Option 1: Using Docker (Recommended)

The easiest way to use ConfTainter is with Docker, which provides a pre-configured LLVM 10 environment.

Prerequisites

  • Docker

Quick Start

# Build Docker image (first time only)
docker build -t conftainter:latest .

# Run analysis in Docker container
docker run --rm -v "$(pwd):/workspace" conftainter:latest bash -c "
  cd /workspace/src/test/demo && 
  clang++-10 -O0 -g -fno-discard-value-names -emit-llvm -c test.cpp -o test.bc &&
  ../../tainter test.bc test-var.txt &&
  cat test-var-records.dat
"

Interactive Development

# Enter container for interactive development
docker run -it --rm -v "$(pwd):/workspace" conftainter:latest /bin/bash

# Inside container:
cd /workspace/src/test/demo
clang++-10 -O0 -g -fno-discard-value-names -emit-llvm -c test.cpp -o test.bc
../../tainter test.bc test-var.txt
cat test-var-records.dat

Option 2: Local Installation

Dependencies

  • llvm-10.0.0
  • CMake >= 3.5
  • Boost >= 1.73.0 (optional, for serialization)
  • wllvm or gllvm (for whole-program analysis)

Build

cd src
cmake -DCMAKE_CXX_COMPILER=/usr/bin/clang++-10 \
      -DCMAKE_C_COMPILER=/usr/bin/clang-10 \
      -DLLVM_DIR=/usr/lib/llvm-10/cmake .
make

Run

# From the repository root, go to the demo directory
cd src/test/demo

# Compile your code to LLVM bitcode
clang++-10 -O0 -g -fno-discard-value-names -emit-llvm -c test.cpp -o test.bc

# Run analysis
../../tainter test.bc test-var.txt

# Check results
cat test-var-records.dat

For real systems, use wllvm to obtain the .bc file (e.g., mysqld.bc).

Specify the entry configuration variable

  • SINGLE CONF_VAR_NAME: a global variable with a basic type (int, bool, etc.)
  • STRUCT CONF_VAR_STRUCT.FIELD_NAME: a field of a global struct
  • CLASS CONF_VAR_CLASS.FIELD_NAME: a field of a global class
  • FIELD CONF_VAR_TYPE.FIELD_COUNT: any field of the specified type, selected by its zero-based index. For example, use FIELD some_type.2 to make some_type.field_C the entry point.
    struct some_type {
       int field_A;
       bool field_B;
       float field_C;
    };
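To make the four entry forms concrete, here is a minimal sketch of how one such line could be split into its parts. It is not the parser used by readConfigVariableNames: `EntrySpec` and `parseEntrySpec` are hypothetical names, and the sketch assumes one entry per line with the kind and the target separated by whitespace.

```cpp
#include <sstream>
#include <string>

// Parsed form of one entry line (hypothetical structure).
struct EntrySpec {
    std::string kind;   // SINGLE, STRUCT, CLASS or FIELD
    std::string base;   // variable name or type name
    std::string field;  // field name or index; empty for SINGLE
};

// Parses lines such as "SINGLE some_var" or "FIELD some_type.2".
// Returns false if the line does not match one of the four forms.
bool parseEntrySpec(const std::string &line, EntrySpec &out) {
    std::istringstream in(line);
    EntrySpec spec;
    std::string target;
    if (!(in >> spec.kind >> target)) return false;
    if (spec.kind == "SINGLE") {
        spec.base = target;
        out = spec;
        return true;
    }
    if (spec.kind != "STRUCT" && spec.kind != "CLASS" && spec.kind != "FIELD")
        return false;
    // The last '.' separates the variable or type from the field.
    const std::string::size_type dot = target.rfind('.');
    if (dot == std::string::npos || dot == 0 || dot + 1 == target.size())
        return false;
    spec.base = target.substr(0, dot);
    spec.field = target.substr(dot + 1);
    out = spec;
    return true;
}
```

With this sketch, "FIELD some_type.2" yields kind FIELD, base some_type and field 2, which corresponds to field_C in the struct above.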
    

How to debug:

  1. Make sure you have used the right compilation options: -O0 -fno-discard-value-names -g; if you want the PhiNode analysis, also use these two options.
  2. Make sure the specified configuration variable name is right.
    • Check if it exists in the source code via a simple search: grep -r CONF_NAME /dir/of/src.
    • Check if it has been compiled into the target .bc file by searching the textual IR: grep CONF_VAR_NAME /dir/to/target.ll.

How to scale ConfTainter

Examples

A demo of applying ConfTainter is shown in /src/TestMain.cpp

Users can use the following code to conduct taint analysis.

  // Inside main(int argc, char **argv), with tainter.h included:
  string ir_file = string(argv[1]);   // path to the LLVM IR (.bc) file
  string var_file = string(argv[2]);  // path to the configuration-variable mapping file
  std::vector<struct ConfigVariableNameInfo *> config_names;
  if (!readConfigVariableNames(var_file, config_names)) exit(1);

  // Declare the context before the module so that the module is destroyed first
  // (LLVMContext must outlive Module).
  LLVMContext context;
  SMDiagnostic Err;
  std::unique_ptr<llvm::Module> module;
  buildModule(module, context, Err, ir_file);  // build the LLVM module from the IR
  std::vector<struct GlobalVariableInfo *> gvlist = getGlobalVariableInfo(module, config_names);  // map configuration variables to globals
  startAnalysis(gvlist, true, true);  // taint analysis with implicit data flow and implicit control flow

Application Programming Interface (API)

The details of APIs can be found in /src/tainter.h

For example,

bool readConfigVariableNames( std::string, std::vector< struct ConfigVariableNameInfo* >&);

int buildModule(std::unique_ptr<llvm::Module> &module, LLVMContext &context, SMDiagnostic &Err, string ir_file);

int startAnalysis(std::vector<struct GlobalVariableInfo *> gv_info_list, bool isAnalysisImplicitData, bool isAnalysisImplicitControl);

std::vector<struct InstInfo *> GlobalVariableInfo::getExplicitDataFlow();

Apply/Scale ConfTainter to configuration-related tasks

We conducted three prototype experiments by applying ConfTainter to:

  • Misconfiguration Detection (TInfer in /ApplicabilityExperiment_5_3_1)
  • Configuration-related Bug Detection (TCub in /ApplicabilityExperiment_5_3_2, TFuzz in /ApplicabilityExperiment_5_3_3)
