BDIO Tool

Jeremy Gustie edited this page Feb 4, 2019 · 2 revisions

The BDIO Tool is a set of tools for working with BDIO documents. It is organized as a command line tool with subcommands to perform different operations.

In general, the most up-to-date help is available by running bdio help for general help or bdio <subcommand> --help for help on an individual subcommand.

Most subcommands support a --pretty option to control the rendering of output in a more friendly format.

All subcommands support four message levels: --quiet, default, --verbose and --debug. The "debug" level is the same as verbose but also includes Java stack traces on error.

Graph Based Subcommands

The following subcommands typically leverage an in-memory graph for processing BDIO data, so the amount of data they can handle is limited by the memory available to the tool. Some of the subcommands allow you to configure the TinkerPop provider, giving you the flexibility to leverage a persistent graph that may not have the same heap requirements (such subcommands are noted as "common graph tools" and inherit configuration options from the bdio graph tool).

Graph

The bdio graph subcommand is used to load BDIO data into a TinkerPop graph database. This tool supports arbitrary TinkerPop providers and configurations and is the basis for all of the common graph tools. When the subcommand is executed it can optionally perform the following operations: wipe the existing database clean, pre-initialize schema/indexes, load BDIO files, normalize BDIO data and perform post analysis on the graph.

The bdio graph subcommand, and all common graph tools, support the following command options to configure the graph provider:

  • The --graph=<factory> option controls the TinkerPop graph factory (the gremlin.graph configuration property); this can be a fully qualified Java class name or one of the predefined shortcuts (see bdio graph --help usage for available shortcuts)
  • The --config=<file> option loads a Java properties file into the initial configuration for the graph
  • The -D=<key>=<value> option overrides an individual property for the initial graph configuration

For example, the bdio graph subcommand can be pointed at a PostgreSQL database through Sqlg with the following arguments (note that the username and password are required for the connection pool):

--graph sqlg -D jdbc.url=jdbc:postgresql://localhost:5432/bdio -D jdbc.username=bdio -D jdbc.password=bdio
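The same connection settings can also be kept in a properties file and passed with --config. The following is a minimal sketch, assuming a hypothetical file location and assuming the property names from the example above are accepted unchanged in the file. The command is printed rather than executed so the sketch runs without the tool installed.

```shell
# Hypothetical location for the properties file
config_file="${TMPDIR:-/tmp}/bdio-sqlg.properties"

# Same keys and values as the -D arguments in the example above
cat > "$config_file" <<'EOF'
jdbc.url=jdbc:postgresql://localhost:5432/bdio
jdbc.username=bdio
jdbc.password=bdio
EOF

# Graph provider options only; any other arguments are omitted here
graph_args="--graph sqlg --config=$config_file"

# Print the command instead of running it
echo "bdio graph $graph_args"
```

Individual -D options can still be added on the command line to override a property for a single run.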

Because the BDIO reference implementation will often reference the graph itself as a source of configuration data, the loading process can be customized further by setting bdio.* properties: for example, -D bdio.metadataLabel=FooBar will make bdio graph store JSON-LD metadata in a vertex labeled FooBar. Available BDIO options can be found on the BlackDuckIoOptions class.
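As a sketch of how a bdio.* option combines with the provider settings, the snippet below appends -D overrides to a base set of arguments. The property values are the ones used in the examples above; the variable names are illustrative only, and the command is printed rather than run.

```shell
# Start from the provider shortcut
graph_args="--graph sqlg"

# Append one -D override per property, provider settings first, then the bdio.* option
for prop in \
  "jdbc.url=jdbc:postgresql://localhost:5432/bdio" \
  "jdbc.username=bdio" \
  "jdbc.password=bdio" \
  "bdio.metadataLabel=FooBar"
do
  graph_args="$graph_args -D $prop"
done

# Print the assembled command instead of running it
graph_command="bdio graph $graph_args"
echo "$graph_command"
```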

Visualizer

The bdio viz subcommand is used to visualize the contents of smaller BDIO documents using a modern web browser. When executed, this command will load the BDIO document into an in-memory graph and will listen for HTTP requests: the URL used to access the visualization is emitted to the console.

The visualizer is a common graph tool subcommand and can therefore be configured to use any existing graph instead of loading data from a BDIO document.

You should only attempt to use the visualizer on small data sets consisting of 5,000 or fewer vertices: larger data sets may cause your web browser to become unresponsive.

Linter

The bdio lint subcommand is used to check for common mistakes when producing BDIO data. The linter will both scan a BDIO document and load it into an in-memory graph to apply a series of rules. Errors emitted by the linter correspond to deviations from the specification and will likely cause compatibility issues; warnings are generally emitted as a matter of style or best-practice.

The linter does not support common graph tool options for configuration and can only be used with the in-memory database; it is therefore not suitable for use with large data sets. Ideally, the linter is used on smaller data to validate BDIO production while the producer is being developed.

Dependencies and Tree

The bdio dependencies and bdio tree subcommands are used to traverse the component and file hierarchies, respectively, found in a BDIO document. Both subcommands will load a BDIO document into an in-memory graph and perform BDIO normalization prior to traversal.

Stream Based Subcommands

The following subcommands typically leverage a reactive stream over the data and do not have the same memory profile as graph based subcommands. Stream based subcommands can safely be used on larger inputs.

Head

The bdio head subcommand is used to extract metadata from BDIO documents. It can also be used to collect metadata from the current environment. Metadata from multiple inputs is merged together, and the order of the inputs may impact the resulting metadata (for example, aggregated list data). The subcommand can produce output using a simple key/value format, as compacted JSON or as a full BDIO document.

In some cases, metadata extraction does not need to read the entire file; additionally, there is no in-memory database so the bdio head subcommand is suitable for use on inputs of any size.

Entries

The bdio entries subcommand is used to view individual BDIO "entries" within a document. Each BDIO document consists of one or more entries, each subject to a size constraint as prescribed by the BDIO specification. When rendering legacy documents, the data is "unconverted" and split into entries on the fly. For example, a large BDIO 1.x input will be split into multiple, smaller BDIO 2.x entries. Entries can be separated by new lines or a NUL byte; alternatively, a separate command can be forked to process each entry.

Concatenate

The bdio cat subcommand is used to concatenate multiple BDIO documents into a single document.

CAUTION: Concatenation may result in invalid output; future versions of the subcommand may better detect these scenarios. In its current form, bdio cat should only be used by expert users.

Filter

The bdio filter subcommand is used to filter a BDIO document using one or more configurable criteria. Filters may be applied to the same input with different parameters to effectively split a BDIO document into multiple documents. For example, the "page filter" can be used to split a document based on offset/limit boundaries. The following filters are available:

  • The --page-filter=<start>[,<count>] filter keeps only the nodes of a specific "page" identified using their absolute index within the document
  • The --subdirectory-filter=<directory> filter keeps files from the specified subdirectory, with the subdirectory becoming the new base directory; all non-file input is preserved

CAUTION: Filtering may result in invalid output; future versions of the subcommand may better detect these scenarios. In its current form, bdio filter should only be used by expert users.
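To illustrate how the page filter could split a document into fixed-size pages, the sketch below generates one --page-filter value per page. The node count and page size are hypothetical, and it assumes that <start> is a zero-based index, which is not stated here. How input and output documents are passed to bdio filter is not shown, so only the filter option is printed.

```shell
# Hypothetical values: total number of nodes in the document and nodes per page
total_nodes=2500
page_size=1000

# Assumption: <start> is a zero-based absolute node index
start=0
page_count=0
last_filter=""

while [ "$start" -lt "$total_nodes" ]; do
  # One filter invocation per page; input and output arguments are omitted
  last_filter="--page-filter=$start,$page_size"
  echo "bdio filter $last_filter"
  start=$((start + page_size))
  page_count=$((page_count + 1))
done
```

With these values the loop prints three filter options, starting at indexes 0, 1000 and 2000.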

Miscellaneous Subcommands

Other subcommands are available for performing specific BDIO related tasks.

Hierarchical Identifier

The bdio hid subcommand is useful for decomposing the "hierarchical identifier" value used to describe file paths. File paths are recorded as a URI; however, when describing files that are nested within an archive, the specification expects a very specific scheme that incorporates both the absolute URI of the archive and the archive entry name.

Documentation Subcommands

The BDIO Tool includes several subcommands that produce documentation. The documentation generated by these subcommands is published at https://blackducksoftware.github.io/bdio/

Context

The bdio context subcommand generates the default JSON-LD context for BDIO. This is useful for using stock JSON-LD tools on BDIO data.

Specification

The bdio spec subcommand prints out the BDIO specification as a Markdown document.

SPDX

The bdio spdx subcommand generates a BDIO document describing the licenses found in the SPDX specification.

Examples

The following examples illustrate how the BDIO Tool can be used in practice.

# Visualize a small BDIO data set stored in a Black Duck deployment
# (assumes the Postgres server allows password authentication to remote connections)
$ bdio viz --graph sqlg -D jdbc.url=jdbc:postgresql://localhost:55436/bdio -D jdbc.username=blackduck -D jdbc.password=password