BDIO Tool
The BDIO Tool is a set of tools for working with BDIO documents. It is organized as a command line tool with subcommands to perform different operations.
In general, the most up-to-date help is available by running bdio help for general help or bdio <subcommand> --help for help on an individual subcommand.
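As a quick way to explore the tool, the loop below asks for help on each subcommand named in this document. It is a minimal sketch: the subcommand list is copied from this page and may not match your installed version, and the loop only prints the command when bdio is not found on the PATH.

```shell
# Subcommands mentioned in this document; your version may differ.
subcommands="graph viz lint dependencies tree head entries cat filter hid context spec spdx"

for sub in $subcommands; do
  if command -v bdio >/dev/null 2>&1; then
    # Show the built-in help for this subcommand
    bdio "$sub" --help
  else
    # bdio is not installed here, so only show what would be run
    echo "bdio $sub --help"
  fi
done
```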
Most subcommands support a --pretty option to control the rendering of output in a more friendly format.
All subcommands support four message levels: --quiet, default, --verbose and --debug. The "debug" level is the same as verbose but also includes Java stack traces on error.
The following subcommands typically leverage an in-memory graph for processing BDIO data; these tools can only handle as much data as fits in the memory made available to the tool. Some of the subcommands allow you to configure the TinkerPop provider, giving you the flexibility to leverage a persistent graph that may not have the same heap requirements (such subcommands are noted as "common graph tools" and inherit the configuration options of the bdio graph tool).
The bdio graph subcommand is used to load BDIO data into a TinkerPop graph database. This tool supports arbitrary TinkerPop providers and configurations and is the basis for all of the common graph tools. When the subcommand is executed it can optionally perform the following operations: wipe the existing database clean, pre-initialize schema/indexes, load BDIO files, normalize BDIO data and perform post analysis on the graph.
The bdio graph subcommand, and all common graph tools, support the following command options to configure the graph provider:
- The --graph=<factory> option controls the TinkerPop graph factory (the gremlin.graph configuration property); this can be a fully qualified Java class name or one of the predefined shortcuts (see bdio graph --help usage for available shortcuts)
- The --config=<file> option loads a Java properties file into the initial configuration for the graph
- The -D=<key>=<value> option overrides an individual property for the initial graph configuration
For example, the bdio graph subcommand can be pointed at a PostgreSQL database through Sqlg with the following arguments (note that the username and password are required for the connection pool):
--graph sqlg -D jdbc.url=jdbc:postgresql://localhost:5432/bdio -D jdbc.username=bdio -D jdbc.password=bdio
Because the BDIO reference implementation will often reference the graph itself as a source of configuration data, additional customization of the loading process can be controlled by setting bdio.* properties: for example, -D bdio.metadataLabel=FooBar will make bdio graph store JSON-LD metadata in a vertex labeled FooBar. Available BDIO options can be found on the BlackDuckIoOptions class.
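The same settings can be kept in a properties file and loaded with --config rather than repeated as -D options. The following is a sketch under assumptions: the file name bdio-graph.properties is hypothetical, the property keys are the ones used in the examples above, placing a bdio.* property in the file assumes it is read from the same initial configuration as -D values, and the final command is only printed, not executed.

```shell
# Hypothetical file name; any path readable by the tool should do.
config_file="${TMPDIR:-/tmp}/bdio-graph.properties"

# Java properties file with the Sqlg connection settings and one bdio.* option
cat > "$config_file" <<'EOF'
jdbc.url=jdbc:postgresql://localhost:5432/bdio
jdbc.username=bdio
jdbc.password=bdio
bdio.metadataLabel=FooBar
EOF

# Assemble the invocation using the documented --graph and --config options
cmd="bdio graph --graph sqlg --config=$config_file"
echo "$cmd"
```

Keeping credentials in a file avoids exposing them on the command line, where they may be visible in shell history or process listings.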
The bdio viz subcommand is used to visualize the contents of smaller BDIO documents using a modern web browser. When executed, this command will load the BDIO document into an in-memory graph and will listen for HTTP requests: the URL used to access the visualization is emitted to the console.
The visualizer is a common graph tool subcommand and can therefore be configured to use any existing graph instead of loading data from a BDIO document.
You should only attempt to use the visualizer on small data sets consisting of 5,000 or fewer vertices: larger data sets may cause your web browser to become unresponsive.
The bdio lint subcommand is used to check for common mistakes when producing BDIO data. The linter will both scan a BDIO document and load it into an in-memory graph to apply a series of rules. Errors emitted by the linter correspond to deviations from the specification and will likely cause compatibility issues; warnings are generally emitted as a matter of style or best-practice.
The linter does not support common graph tool options for configuration and can only be used with the in-memory database; it is therefore not suitable for use with large data sets. Preferably, the linter should be used to validate BDIO production on smaller data during the development of the producer.
The bdio dependencies and bdio tree subcommands are used to traverse the component and file hierarchies, respectively, found in a BDIO document. Both subcommands will load a BDIO document into an in-memory graph and perform BDIO normalization prior to traversal.
The following subcommands typically leverage a reactive stream over the data and do not have the same memory profile as graph-based subcommands. Stream-based subcommands can safely be used on larger inputs.
The bdio head subcommand is used to extract metadata from BDIO documents. It can also be used to collect metadata from the current environment. Metadata from multiple inputs is merged together, and the order of the inputs may impact the resulting metadata (for example, aggregated list data). The subcommand can produce output using a simple key/value format, as compacted JSON or as a full BDIO document.
In some cases, metadata extraction does not need to read the entire file; additionally, there is no in-memory database so the bdio head subcommand is suitable for use on inputs of any size.
The bdio entries subcommand is used to view individual BDIO "entries" within a document. Each BDIO document consists of one or more entries, each subject to a size constraint as prescribed by the BDIO specification. When rendering legacy documents, the data is "unconverted" and split into entries on the fly. For example, a large BDIO 1.x input will be split into multiple, smaller BDIO 2.x entries. Entries can be separated by new lines or a NUL byte; alternatively, a separate command can be forked to process each entry.
The bdio cat subcommand is used to concatenate multiple BDIO documents into a single document.
CAUTION: Concatenation may result in invalid output; future versions of the subcommand may better detect these scenarios. In the current form, only expert users should leverage bdio cat.
The bdio filter subcommand is used to filter a BDIO document using one or more configurable criteria. Filters may be applied to the same input with different parameters to effectively split a BDIO document into multiple documents. For example, the "page filter" can be used to split a document based on offset/limit boundaries. The following filters are available:
- The --page-filter=<start>[,<count>] filter keeps only the nodes of a specific "page" identified using their absolute index within the document
- The --subdirectory-filter=<directory> filter keeps files from the specified subdirectory, with the subdirectory becoming the new base directory; all non-file input is preserved
CAUTION: Filtering may result in invalid output; future versions of the subcommand may better detect these scenarios. In the current form, only expert users should leverage bdio filter.
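To illustrate splitting on offset/limit boundaries, the sketch below computes one --page-filter value per page. The node total and page size are hypothetical, and treating the start index as zero-based is an assumption to check against bdio filter --help. Only the option values are printed, since the input and output arguments of the subcommand are not described here.

```shell
total=25000      # hypothetical number of nodes in the document
page_size=10000  # hypothetical number of nodes per output document
start=0          # assumption: absolute indexes start at zero
pages=0

while [ "$start" -lt "$total" ]; do
  # One option per page, in the documented <start>[,<count>] form
  echo "--page-filter=$start,$page_size"
  start=$((start + page_size))
  pages=$((pages + 1))
done
```

Each printed value would be passed to a separate bdio filter run against the same input, producing one output document per page.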
Other subcommands are available for performing specific BDIO related tasks.
The bdio hid subcommand is useful for decomposing the "hierarchical identifier" value used to describe file paths. File paths are recorded as URIs; however, when describing files that exist nested within an archive, the specification expects a very specific scheme that incorporates both the absolute URI of the archive and the archive entry name.
The BDIO Tool includes several subcommands that produce documentation. The documentation generated by these subcommands is published at https://blackducksoftware.github.io/bdio/
The bdio context subcommand generates the default JSON-LD context for BDIO. This is useful for using stock JSON-LD tools on BDIO data.
The bdio spec subcommand prints out the BDIO specification as a Markdown document.
The bdio spdx subcommand generates a BDIO document describing the licenses found in the SPDX specification.
The following are some practical examples of how the BDIO Tool can be used.
# Visualize a small BDIO data set stored in a Black Duck deployment
# (assumes the Postgres server allows password authentication to remote connections)
$ bdio viz --graph sqlg -D jdbc.url=jdbc:postgresql://localhost:55436/bdio -D jdbc.username=blackduck -D jdbc.password=password