Setup | Usage | Configuration | Repo Architecture
CHARON is a polyglot static analysis engine for detecting vulnerabilities in native extensions of scripting languages. It can detect taint-style vulnerabilities that cross the language boundary several times and span multiple functions on either side.
Dependencies: CHARON requires the following components to be installed on your system in order to function.
|-- python interpreter (>=3.10.0)
|-- joern (==1.1.1277)
|   |-- java development kit (e.g., OpenJDK)
|   |-- gcc(*)
|   |-- g++(*)
|-- pip(*)
|-- npm(*)
|-- tar(*)
|-- jq(*)
|-- curl(*)
Note: If you intend to analyze pre-downloaded code, only python and joern are required. Other dependencies are used for the download process.
- Download the latest joern installer script.
- Run the installer: ./joern-install --interactive
- When prompted for the version, enter 1.1.1277.
- Allow symlink creation if desired (requires elevated privileges).
- Set the JOERN_DIR in $CHARON_DIR/core/common/config.py to your joern installation path.
CHARON needs a list of packages to analyze, at least one operation to perform, the language system and a desired analysis type:
usage: python charon.py [-h] [-p PACKAGES] [-s] [-a] [-l {node,python}] [-c]
                        [-t {complete,verification}]

options:
  -h, --help            show this help message and exit
  -p PACKAGES, --packages PACKAGES
                        Package List (file)
  -s, --setup           Prepare Analysis
  -a, --analysis        Perform Analysis
  -l {node,python}, --language {node,python}
                        Scripting Language/Framework to analyze
  -c, --clean           Delete downloaded packages and generated output files
  -t {complete,verification}, --type {complete,verification}
                        Analysis type
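For readers who want to script around the CLI, the help text above can be mirrored with Python's argparse. This is a minimal sketch reconstructed only from that help text, not CHARON's actual parser; the helper `has_operation` and its reading of "at least one operation" as one of -s, -a or -c are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild a parser matching the help text (a sketch, not CHARON's own code)."""
    parser = argparse.ArgumentParser(prog="python charon.py")
    parser.add_argument("-p", "--packages", help="Package List (file)")
    parser.add_argument("-s", "--setup", action="store_true", help="Prepare Analysis")
    parser.add_argument("-a", "--analysis", action="store_true", help="Perform Analysis")
    parser.add_argument("-l", "--language", choices=["node", "python"],
                        help="Scripting Language/Framework to analyze")
    parser.add_argument("-c", "--clean", action="store_true",
                        help="Delete downloaded packages and generated output files")
    parser.add_argument("-t", "--type", choices=["complete", "verification"],
                        help="Analysis type")
    return parser


def has_operation(args: argparse.Namespace) -> bool:
    # Assumption: an "operation" is any of setup, analysis or clean.
    return args.setup or args.analysis or args.clean


# Short boolean flags can be combined, so "-sa" enables setup and analysis.
args = build_parser().parse_args(
    ["-p", "node.csv", "-sa", "-l", "node", "-t", "complete"]
)
print(args.setup, args.analysis, args.language, args.type)
```

Restricting -l and -t with `choices` makes argparse reject unsupported values before any work starts.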
CLI argument: -p | --packages, expected value: file path
The list of packages to analyze should be passed as a file containing one package name per line.
Note: we provide the package lists we used in npm_ne.csv and pypi_ne.csv, for NPM and PyPI, respectively.
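The one-name-per-line format is easy to produce and consume from Python. The sketch below is illustrative only: `read_package_list` is a hypothetical helper, the package names are placeholders, and skipping blank lines is an assumption about how such a file might reasonably be read.

```python
import tempfile
from pathlib import Path


def read_package_list(path):
    """Return the package names from a file with one name per line."""
    names = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if name:  # assumption: blank lines are ignored
            names.append(name)
    return names


# Demo with a throwaway file; the names below are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "node.csv"
    demo.write_text("example-addon-a\nexample-addon-b\n\n", encoding="utf-8")
    packages = read_package_list(demo)

print(packages)
```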
Setup
CLI argument: -s | --setup, expected value: none
Providing this flag enables the preparation phase for the provided packages, i.e., package downloading, dataset filtering, metadata extraction and CPG construction for both native and script code.
Analysis
CLI argument: -a | --analysis, expected value: none
Providing this flag enables the analysis phase for the provided packages. The analysis phase depends on outputs from the preparation phase. Therefore, the preparation phase must be run on a set of packages p before the first analysis of p. It does not need to be re-run for subsequent analyses, provided that both the package set and the studied vulnerabilities remain identical.
It is possible to combine the preparation and analysis phases in a single run by passing both flags at the same time.
CLI argument: -l | --language, expected value: node | python
This argument specifies the language system used by the studied packages. It is used by the framework to load the correct plugin files.
CLI argument: -t | --type, expected value: complete | verification
This argument influences how the identified flows are visualized in the final report. In a complete analysis, all cross-language dataflows are displayed in the report. A verification analysis displays a single cross-language flow per sink node. This may greatly reduce the size of the final report, but information about other attacker-controlled data sources is dropped.
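The difference between the two report types can be illustrated with a small sketch. The flow records, their `source` and `sink` keys, and the choice to keep the first flow seen per sink are all assumptions made for illustration; the source does not state which flow CHARON keeps.

```python
def one_flow_per_sink(flows):
    """Keep the first flow seen for each sink, preserving input order."""
    kept = {}
    for flow in flows:
        kept.setdefault(flow["sink"], flow)
    return list(kept.values())


# Hypothetical flows: two sources reach the same sink, one reaches another.
flows = [
    {"source": "script_arg_0", "sink": "sink_a"},
    {"source": "script_arg_1", "sink": "sink_a"},
    {"source": "script_arg_0", "sink": "sink_b"},
]

complete_report = flows                         # every cross-language flow
verification_report = one_flow_per_sink(flows)  # one flow per sink node

print(len(complete_report), len(verification_report))
```

In this toy input the second source reaching `sink_a` no longer appears in the reduced report, which is the information loss described above.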
Running the full pipeline of CHARON on a set of Node.js native extensions node.csv, including preparation and complete analysis:
$ python charon.py -p node.csv -sa -l node -t complete &
CHARON supports a range of configuration options to tailor its analysis to your environment and system specs.
Configuration options are defined in $CHARON_DIR/core/common/config.py. Below, we list the most important options.
- Performance and Resource Management
  - PROCESSES (default: 10): the number of parallel instances used when downloading packages and verifying their cross-language nature and the presence of sinks.
  - PROCESSES_PREP (default: 1): the number of parallel instances during the graph construction phase. Depending on the size of the package codebase, CPG construction can be costly in both CPU and memory. We recommend increasing this value incrementally and carefully until the optimal value for your system is reached.
  - PROCESSES_ANALYSIS (default: 1): the number of parallel instances during the PPG analysis phase. Depending on the size of the package codebase, PPGs may occupy large amounts of memory. We recommend increasing this value incrementally and carefully until the optimal value for your system is reached.
  - TIMEOUT (default: 2500): a time limit for the processing of a package, in seconds. If an analysis or graph construction does not terminate before the timeout is hit, the current instance is killed and CHARON continues with the next package.
  - MEM_CAP (default: 16): the maximum amount of memory an individual JVM is allowed to allocate, in GB.
- Logging Verbosity
  - LEVEL (default: logging.DEBUG): increases or decreases the logging verbosity. The available levels are documented in Python's logging documentation.
- Environment
  - JOERN_DIR: the path to your top-level joern installation directory. This value is used as a fallback in case CHARON fails to find joern's executables, e.g., because symlinks are missing and joern is not included in $PATH.
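To make the options listed above more concrete, the sketch below shows one way such values could be consumed. It is not CHARON's implementation: the helper names, the placeholder JOERN_DIR path, the assumption that the executable is called `joern` and sits directly in JOERN_DIR, and the use of the standard JVM `-Xmx` option for MEM_CAP are all illustrative assumptions.

```python
import logging
import os
import shutil
import subprocess
import sys

# Values mirror the defaults listed above; JOERN_DIR is a placeholder path.
TIMEOUT = 2500            # seconds per package
MEM_CAP = 16              # GB per JVM
LEVEL = logging.DEBUG
JOERN_DIR = "/opt/joern"

logging.basicConfig(level=LEVEL)
log = logging.getLogger("charon-sketch")


def find_joern(which=shutil.which):
    """Prefer joern on $PATH; otherwise fall back to JOERN_DIR."""
    found = which("joern")
    # Assumption: the executable sits directly inside JOERN_DIR.
    return found if found else os.path.join(JOERN_DIR, "joern")


def jvm_heap_option(mem_cap_gb):
    """Standard JVM maximum-heap option for a cap given in GB."""
    return f"-Xmx{mem_cap_gb}G"


def run_with_timeout(cmd, timeout):
    """Run one job; return False if it was killed for exceeding the timeout."""
    try:
        subprocess.run(cmd, timeout=timeout, check=False)
        return True
    except subprocess.TimeoutExpired:
        log.warning("timed out after %s s: %s", timeout, cmd)
        return False


# Demo: the second job sleeps past a deliberately short timeout,
# and processing continues with the third job afterwards.
jobs = [
    [sys.executable, "-c", "pass"],
    [sys.executable, "-c", "import time; time.sleep(60)"],
    [sys.executable, "-c", "pass"],
]
results = [run_with_timeout(job, timeout=3) for job in jobs]
print(results)
```

`subprocess.run` kills the child process when its timeout expires before raising `TimeoutExpired`, which matches the "kill and continue with the next package" behaviour described for TIMEOUT.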
This folder contains data related to analyzed packages. Package source code is downloaded into the code folder. For local applications, copy the code here. After graph construction, Code Property Graphs (CPGs) for native and script code are stored in cpg/addon and cpg/script.
Contains CHARON’s Python wrapper.
- analysis: Manages analysis threads, starts the Joern engine, and collects results.
- common: Shared functionalities, including file management, configuration, and data structures.
- modules: Plugins for cross-language systems, managing language-specific configurations, result directories, and API lists for PPG linking.
- preparation: Scripts for package download, filtering, metadata extraction, and graph construction.
- fin.py: Finalizes analysis, moves results/logs to prevent overwriting.
- init.py: Initializes the pipeline, validates vulnerability plugins, and registers them.
Contains source code for native extension APIs (e.g., NAN, N-API) added to the native CPG during construction.
Contains CHARON’s PPG linking and cross-language analysis algorithm, executed by the Joern engine.
- import: Joern query language extensions for linking and mitigation filtering.
- main: Compiled CHARON versions for analysis, invoked by the Python wrapper.
- templates: Core cross-language analysis and intra-language components.
Contains plugins defining vulnerability-specific elements, with a template for adding new vulnerability types.
Contains CHARON’s log file and analysis results.
- $LANG/callmap, $LANG/native, $LANG/script: PPG edges and dataflow lists for each package.
- APIMAP.json: Metadata of native API presence for CPG and PPG linking.
- CHARON.log: Logs analysis progress, vulnerabilities, mitigations, and potential failures.
- SINKMAP.json: Sink presence information for query optimization.
- VLNMAP.json: Cross-language dataflows detected, comprising the final analysis report.
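Since the report files are JSON, they can be inspected with Python's standard library. The sketch below deliberately assumes nothing about the schema of VLNMAP.json, which is not documented here: it only reports the type and size of the top-level value. `summarize_report` is a hypothetical helper and the demo file content is a placeholder, not real CHARON output.

```python
import json
import tempfile
from pathlib import Path


def summarize_report(path):
    """Load a JSON file and describe its top level without assuming a schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    kind = type(data).__name__
    size = len(data) if isinstance(data, (dict, list)) else 1
    return kind, size


# Demo with placeholder content; a real report will look different.
with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "VLNMAP.json"
    report.write_text(json.dumps({"placeholder-package": []}), encoding="utf-8")
    summary = summarize_report(report)

print(summary)
```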
This project is available as open source under the terms of the GNU AFFERO GENERAL PUBLIC LICENSE V3.0. See LICENSE for more information.