A polyglot static analysis engine, based on joern, for detecting vulnerabilities in native extensions of scripting languages.

VainlyStrain/charon

CHARON

Setup | Usage | Configuration | Repo Architecture

CHARON is a polyglot static analysis engine for detecting vulnerabilities in native extensions of scripting languages. CHARON can detect taint-style vulnerabilities that cross the language boundary several times, spanning multiple functions on either side.

🏭 Setup

Dependencies: CHARON requires the following components to be installed on your system in order to function.

|-- python interpreter (>=3.10.0)
|-- joern (==1.1.1277)
|   |-- java development kit (e.g., OpenJDK)
|   |-- gcc(*)
|   |-- g++(*)
|-- pip(*)
|-- npm(*)
|-- tar(*)
|-- jq(*)
|-- curl(*)

Note: If you intend to analyze pre-downloaded code, only python and joern are required. Other dependencies are used for the download process.
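
Before running CHARON, it can help to check which of the tools listed above are reachable on your PATH. The following is a minimal sketch, not part of CHARON: the executable names (`python3`, `joern`, `java`) and the grouping into core and download tools are assumptions derived from the dependency tree and the note above.

```python
import shutil

# Assumption: these are the executable names on your system; adjust as needed.
# Core tools are needed even for pre-downloaded code; the rest are the
# entries marked (*) in the dependency tree, treated here as download-only.
CORE_TOOLS = ["python3", "joern", "java"]
DOWNLOAD_TOOLS = ["gcc", "g++", "pip", "npm", "tar", "jq", "curl"]


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    for label, tools in (("core", CORE_TOOLS), ("download", DOWNLOAD_TOOLS)):
        missing = missing_tools(tools)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{label}: {status}")
```

This check only confirms that an executable exists; it does not verify versions such as python >=3.10.0 or joern 1.1.1277.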

Joern Installation

  1. Download the latest joern installer script.
  2. Run the installer: ./joern-install --interactive.
  3. When prompted for the version, enter 1.1.1277.
  4. Allow symlink creation if desired (requires elevated privileges).
  5. Set the JOERN_DIR in $CHARON_DIR/core/common/config.py to your joern installation path.
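
Step 5 matters when joern is not reachable through PATH or symlinks, because JOERN_DIR then serves as the fallback location. The sketch below illustrates that lookup order under stated assumptions: the path `/opt/joern` is hypothetical, the function `find_joern` is not part of CHARON, and the executable is assumed to sit directly inside the configured directory.

```python
import os
import shutil

# Assumption: mirrors the JOERN_DIR value you set in core/common/config.py.
JOERN_DIR = "/opt/joern"  # hypothetical installation path


def find_joern(joern_dir=JOERN_DIR, executable="joern"):
    """Locate an executable: PATH first, then the configured directory."""
    on_path = shutil.which(executable)
    if on_path is not None:
        return on_path
    # Fallback: look for an executable file inside joern_dir.
    candidate = os.path.join(joern_dir, executable)
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None
```

A return value of `None` suggests that neither PATH nor JOERN_DIR points at a usable installation.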

🚀 Usage

CHARON needs a list of packages to analyze, at least one operation to perform, the language system and a desired analysis type:

usage: python charon.py [-h] [-p PACKAGES] [-s] [-a] [-l {node,python}] [-c]
                 [-t {complete,verification}]

options:
  -h, --help            show this help message and exit
  -p PACKAGES, --packages PACKAGES
                        Package List (file)
  -s, --setup           Prepare Analysis
  -a, --analysis        Perform Analysis
  -l {node,python}, --language {node,python}
                        Scripting Language/Framework to analyze
  -c, --clean           Delete downloaded packages and generated output files
  -t {complete,verification}, --type {complete,verification}
                        Analysis type

Package List

CLI argument: -p|--packages, expected value: file path

The list of packages to analyze should be passed as a file containing one package name per line.

Note: we provide the package lists we used in npm_ne.csv and pypi_ne.csv, for NPM and PyPI, respectively.
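
To illustrate the expected file format, here is a minimal sketch of a reader for such a list. It is not CHARON's own parser: the function name `read_package_list` is hypothetical, and trimming whitespace and skipping blank lines are assumptions.

```python
from pathlib import Path


def read_package_list(path):
    """Read one package name per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    names = [line.strip() for line in lines]
    return [name for name in names if name]
```

For a file containing three package names on separate lines, this returns a list of those three names in file order.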

Operation

Setup

CLI argument: -s|--setup, expected value: none

Providing this flag enables the preparation phase for the provided packages, i.e., package downloading, dataset filtering, metadata extraction and CPG construction for both native and script code.

Analysis

CLI argument: -a|--analysis, expected value: none

Providing this flag enables the analysis phase for the provided packages. The analysis phase depends on outputs from the preparation phase. Therefore, before the first analysis on a set of packages p, it is mandatory to run the preparation phase on p first. However, the preparation phase does not need to be re-run on subsequent analyses, provided that both the package set and studied vulnerabilities remain identical.

It is possible to combine the preparation and analysis phases in a single run by passing both flags at the same time.
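
The split between the two phases can be sketched as two separate invocations. The helper `charon_command` below is hypothetical and only assembles the documented flags into an argument list; whether `-t` is needed for a preparation-only run is an assumption based on the usage description.

```python
import sys


def charon_command(packages, language, analysis_type, setup=False, analysis=False):
    """Build an argument list for charon.py from the documented flags."""
    if not (setup or analysis):
        raise ValueError("at least one operation (-s or -a) is required")
    cmd = [sys.executable, "charon.py", "-p", packages,
           "-l", language, "-t", analysis_type]
    if setup:
        cmd.append("-s")
    if analysis:
        cmd.append("-a")
    return cmd


# First run on a package set: preparation only.
prepare = charon_command("node.csv", "node", "complete", setup=True)
# Later runs on the same set: analysis only, reusing the preparation outputs.
analyze = charon_command("node.csv", "node", "complete", analysis=True)
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the CHARON directory.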

Language System

CLI argument: -l|--language, expected value: node|python

This argument specifies the language system used by the studied packages. It is used by the framework to load the correct plugin files.

Analysis Type

CLI argument: -t|--type, expected value: complete|verification

This argument influences how the identified flows are visualized in the final report. In a complete analysis, all cross-language dataflows are displayed in the report. A verification analysis displays a single cross-language flow per sink node. This may greatly reduce the size of the final report, but information about other attacker-controlled data sources is dropped.
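
The difference between the two report types can be illustrated on toy data. This is a sketch under loud assumptions: flows are modelled as hypothetical `(source, sink)` pairs, and keeping the first flow per sink is an arbitrary choice; CHARON's actual report format and selection rule are not described here.

```python
def select_flows(flows, analysis_type):
    """Filter (source, sink) pairs according to the analysis type."""
    if analysis_type == "complete":
        return list(flows)
    if analysis_type == "verification":
        first_per_sink = {}
        for source, sink in flows:
            # Keep only the first flow seen for each sink.
            first_per_sink.setdefault(sink, (source, sink))
        return list(first_per_sink.values())
    raise ValueError(f"unknown analysis type: {analysis_type}")


# Hypothetical flows: two sources reach the same sink "strcpy".
flows = [("argv", "strcpy"), ("request.body", "strcpy"), ("argv", "system")]
print(select_flows(flows, "verification"))
# prints [('argv', 'strcpy'), ('argv', 'system')]
```

In this toy example the verification result no longer mentions `request.body`, which is the information loss described above.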

Example

Running the full pipeline of CHARON on a set of Node.js native extensions node.csv, including preparation and complete analysis:

$ python charon.py -p node.csv -sa -l node -t complete &
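
To make explicit how the combined flag `-sa` in this example is interpreted, the following sketch rebuilds the documented command-line interface with `argparse`. It is reconstructed from the help text above and is not CHARON's actual source; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Recreate the documented CLI (an approximation of the help text)."""
    parser = argparse.ArgumentParser(prog="python charon.py")
    parser.add_argument("-p", "--packages", help="Package List (file)")
    parser.add_argument("-s", "--setup", action="store_true",
                        help="Prepare Analysis")
    parser.add_argument("-a", "--analysis", action="store_true",
                        help="Perform Analysis")
    parser.add_argument("-l", "--language", choices=["node", "python"],
                        help="Scripting Language/Framework to analyze")
    parser.add_argument("-c", "--clean", action="store_true",
                        help="Delete downloaded packages and generated output files")
    parser.add_argument("-t", "--type", choices=["complete", "verification"],
                        help="Analysis type")
    return parser


args = build_parser().parse_args(
    ["-p", "node.csv", "-sa", "-l", "node", "-t", "complete"]
)
print(args.packages, args.setup, args.analysis, args.language, args.type)
# prints: node.csv True True node complete
```

The short flags `-s` and `-a` take no value, so `-sa` enables both the preparation and the analysis phase in one run.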

⚙️ Configuration

CHARON supports a range of configuration options to tailor its analysis to your environment and system specs.

Configuration options are defined in $CHARON_DIR/core/common/config.py. Below, we list the most important options.

  • Performance and Resource Management
    • PROCESSES (default: 10): the number of parallel instances used when downloading packages and when verifying their cross-language nature and the presence of sinks.
    • PROCESSES_PREP (default: 1): the number of parallel instances during the graph construction phase. Depending on the size of the package codebase, CPG construction can be both CPU- and memory-intensive. We recommend increasing this value incrementally and carefully, until the optimal value for your system is reached.
    • PROCESSES_ANALYSIS (default: 1): the number of parallel instances during the PPG analysis phase. Depending on the size of the package codebase, PPGs may occupy large sections of memory. We recommend increasing this value incrementally and carefully, until the optimal value for your system is reached.
    • TIMEOUT (default: 2500): a time limit for the processing of a package, in seconds. If an analysis or graph construction does not terminate before hitting the timeout, the current instance is killed and CHARON continues with the next package.
    • MEM_CAP (default: 16): the maximum amount of memory an individual JVM is allowed to allocate, in GB.
  • Logging Verbosity
  • Environment
    • JOERN_DIR: the path to your top-level joern installation directory. This value is used as fallback in case CHARON fails to find joern's executables, e.g., due to missing symlinks while joern is not included in $PATH.
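
The TIMEOUT behavior described above (kill the current instance, continue with the next package) can be sketched with the standard library. This is a sequential illustration only, not CHARON's process management: `run_with_timeout` and the two demo commands are hypothetical, and a 3-second timeout stands in for the documented default of 2500.

```python
import subprocess
import sys

TIMEOUT = 2500  # seconds, the documented default


def run_with_timeout(commands, timeout=TIMEOUT):
    """Run each command in turn; record and skip any that exceed the timeout."""
    results = {}
    for name, cmd in commands.items():
        try:
            # subprocess.run kills the child process when the timeout expires.
            completed = subprocess.run(cmd, timeout=timeout)
            results[name] = "exit code %d" % completed.returncode
        except subprocess.TimeoutExpired:
            results[name] = "timed out"
    return results


# Hypothetical stand-ins for per-package work: one slow, one fast.
demo = {
    "slow-package": [sys.executable, "-c", "import time; time.sleep(30)"],
    "fast-package": [sys.executable, "-c", "pass"],
}
results = run_with_timeout(demo, timeout=3)
print(results)
```

The slow command is terminated after the timeout, and the loop still proceeds to the fast one, which mirrors the behavior described for TIMEOUT.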

🗂️ Repository Architecture

Folder: apps

This folder contains data related to analyzed packages. Package source code is downloaded into the code folder. For local applications, copy the code here. After graph construction, Code Property Graphs (CPGs) for native and script code are stored in cpg/addon and cpg/script.

Folder: core

Contains CHARON’s Python wrapper.

  • analysis: Manages analysis threads, starts the Joern engine, and collects results.
  • common: Shared functionalities, including file management, configuration, and data structures.
  • modules: Plugins for cross-language systems, managing language-specific configurations, result directories, and API lists for PPG linking.
  • preparation: Scripts for package download, filtering, metadata extraction, and graph construction.
  • fin.py: Finalizes analysis, moves results/logs to prevent overwriting.
  • init.py: Initializes the pipeline, validates vulnerability plugins, and registers them.

Folder: include

Contains source code for native extension APIs (e.g., NAN, N-API) added to the native CPG during construction.

Folder: scala

Contains CHARON’s PPG linking and cross-language analysis algorithm, executed by the Joern engine.

  • import: Joern query language extensions for linking and mitigation filtering.
  • main: Compiled CHARON versions for analysis, invoked by the Python wrapper.
  • templates: Core cross-language analysis and intra-language components.

Folder: vln

Contains plugins defining vulnerability-specific elements, with a template for adding new vulnerability types.

Folder: output

Contains CHARON’s log file and analysis results.

  • $LANG/callmap, $LANG/native, $LANG/script: PPG edges and dataflow lists for each package.
  • APIMAP.json: Metadata of native API presence for CPG and PPG linking.
  • CHARON.log: Logs analysis progress, vulnerabilities, mitigations, and potential failures.
  • SINKMAP.json: Sink presence information for query optimization.
  • VLNMAP.json: Cross-language dataflows detected, comprising the final analysis report.
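
The JSON files listed above can be inspected with the standard library. The sketch below assumes only that the file is valid JSON whose top level is an object or array; the internal structure of VLNMAP.json is not documented here, so the hypothetical helper `load_report` does not interpret individual entries.

```python
import json
from pathlib import Path


def load_report(path):
    """Load a JSON report and return it with its number of top-level entries."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # len() works for both JSON objects (dict) and arrays (list).
    return data, len(data)
```

Pass the path of the report you want to inspect, for example the VLNMAP.json produced by your run.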

📜 License

This project is available as open source under the terms of the GNU AFFERO GENERAL PUBLIC LICENSE V3.0. See LICENSE for more information.
