P6 is a research project that develops a declarative language for specifying visual analytics processes that integrate machine learning methods with interactive visualization for data analysis and exploration. P6 uses P4 for GPU-accelerated data processing and rendering, and leverages scikit-learn and other Python libraries to support machine learning algorithms.
Demos of declarative specifications for clustering, dimension reduction, and regression:
- K-Means Clustering and PCA
- RandomForest Regressor
- Hierarchical Clustering and Multiple Views
- Brushing and Linking with Dimension Reductions
To run P6, first install the JavaScript and Python dependencies:
npm install
pip install -r python/requirements.txt
For development and for trying the example applications, start the server and the client with:
npm start
Alternatively, start the server and the client in two separate terminals:
npm run server
npm run client
The example applications can be accessed at http://localhost:8080/examples/
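// config
let app = p6()
  .data({url: 'data/babies.csv'}) // input data
  .analyze({
    // cluster the data using sklearn.cluster.KMeans and store the labels in a new variable 'clusters'
    clusters: {
      module: 'cluster',
      algorithm: 'KMeans',
      n_clusters: 3, // illustrative value
      features: ['BabyWeight', 'MotherWeight', 'MotherHeight', 'MotherWgtGain', 'MotherAge']
    },
    // analyze the data using sklearn.decomposition.PCA and store the result in a new variable 'PC'
    PC: {
      module: 'decomposition',
      algorithm: 'PCA',
      n_components: 2,
      features: ['BabyWeight', 'MotherWeight', 'MotherHeight', 'MotherWgtGain', 'MotherAge']
    }
  })

app.layout({
  container: "app", // id of the div
  viewport: [800, 400]
})
.visualize({
  chart: {
    mark: 'circle', size: 8,
    x: 'PC1', y: 'PC0',
    color: 'clusters', opacity: 0.5,
  }
})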
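The PCA entry in the example produces the fields PC0 and PC1 that the chart maps to y and x. As a rough illustration of what those fields contain, the plain-JavaScript sketch below centers the selected features, finds the leading principal directions, and projects each row onto them. It is not P6's code: P6 calls sklearn.decomposition.PCA, whereas this sketch swaps in power iteration on the covariance matrix, so component signs may differ from scikit-learn's. The function name `pcaSketch` and the row-object input format are assumptions.

```javascript
// Hypothetical sketch of what a PCA analysis step computes: project each row
// onto the top-k principal directions and emit them as PC0, PC1, ... fields.
// Uses power iteration (not the SVD that scikit-learn uses); not robust to
// every input, e.g. a start vector orthogonal to the top direction.
function pcaSketch(rows, features, k = 2, iters = 500) {
  const n = rows.length;
  const d = features.length;
  const X = rows.map(r => features.map(f => Number(r[f])));

  // Center every feature at zero mean.
  const mean = new Array(d).fill(0);
  for (const x of X) x.forEach((v, j) => { mean[j] += v; });
  for (let j = 0; j < d; j++) mean[j] /= n;
  const Xc = X.map(x => x.map((v, j) => v - mean[j]));

  // Sample covariance matrix C = Xc^T Xc / (n - 1).
  const C = Array.from({ length: d }, () => new Array(d).fill(0));
  for (const x of Xc)
    for (let i = 0; i < d; i++)
      for (let j = 0; j < d; j++) C[i][j] += (x[i] * x[j]) / (n - 1);

  const dot = (a, b) => a.reduce((s, ai, i) => s + ai * b[i], 0);
  // Remove the parts of v along previously found components, then normalize.
  // Returns null when nothing (numerically) remains.
  const orthonormalize = (v, basis) => {
    for (const u of basis) {
      const p = dot(v, u);
      v = v.map((vi, i) => vi - p * u[i]);
    }
    const norm = Math.hypot(...v);
    return norm > 1e-12 ? v.map(vi => vi / norm) : null;
  };

  const components = [];
  for (let c = 0; c < Math.min(k, d); c++) {
    let v = orthonormalize(new Array(d).fill(1), components);
    // Power iteration: repeatedly apply C, staying orthogonal to earlier components.
    for (let t = 0; v && t < iters; t++) {
      v = orthonormalize(C.map(row => dot(row, v)), components);
    }
    if (!v) break; // no variance left in the remaining directions
    components.push(v);
  }

  // Project each centered row onto the components: PC0, PC1, ...
  return Xc.map(x =>
    Object.fromEntries(components.map((u, c) => ['PC' + c, dot(x, u)]))
  );
}
```

For example, `pcaSketch(rows, ['BabyWeight', 'MotherWeight', 'MotherAge'], 2)` returns one object with PC0 and PC1 per input row, assuming `rows` is an array of plain objects keyed by column name.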
P6 provides a JavaScript API with a declarative language for specifying the operations in visual analytics processes, including data processing, machine learning, visualization, and interaction.
data({source, select, preprocess, transform})
- source: source of the dataset. Example: {url: './data/babies.csv'}
- select: select a data subset by rows, columns, or data types. Example: {select: {nrows: 10000, columns: ['BabyWeight', 'BabyGender']}}
  - nrows: number of rows
  - columns: which data columns to select
  - dtype: select categorical or numerical data
- preprocess: preprocess the data by data type.
  - Example of one-hot encoding categorical data: {preprocess: {categorical: 'OneHot'}}
  - Example of dropping null values: {preprocess: {null: 'drop'}}
  - Example of filling null values by column: {preprocess: {null: {fill: {BabyWeight: 8}}}}
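The snippet below is not P6's implementation; it only illustrates, in plain JavaScript, what each preprocess option above is expected to do to a table of rows. The row-object format, treating both null and undefined as missing, and the `<column>_<value>` naming of one-hot columns are assumptions, and the function names are hypothetical.

```javascript
// Hypothetical helpers mirroring the three preprocess options.
// Rows are plain objects; null and undefined are treated as missing (assumption).

// {preprocess: {null: 'drop'}}: keep only rows without missing values.
function dropNulls(rows) {
  return rows.filter(r =>
    Object.values(r).every(v => v !== null && v !== undefined)
  );
}

// {preprocess: {null: {fill: {BabyWeight: 8}}}}: fill missing values per column.
// Returns new row objects; the input rows are not modified.
function fillNulls(rows, fill) {
  return rows.map(r => {
    const out = { ...r };
    for (const [col, value] of Object.entries(fill)) {
      if (out[col] === null || out[col] === undefined) out[col] = value;
    }
    return out;
  });
}

// {preprocess: {categorical: 'OneHot'}}: replace a categorical column with one
// 0/1 indicator column per distinct value, named "<column>_<value>" (assumption).
function oneHot(rows, column) {
  const values = [...new Set(rows.map(r => r[column]))];
  return rows.map(r => {
    const { [column]: v, ...rest } = r; // drop the original column
    for (const value of values) rest[`${column}_${value}`] = v === value ? 1 : 0;
    return rest;
  });
}
```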
analyze({module, algorithm, features, scaling, [parameters]})
- module: the scikit-learn module that provides the algorithm, e.g. 'decomposition' for sklearn.decomposition, as in the example above.
- algorithm: the algorithm or method to apply. Supported categories include clustering, dimension reduction, and manifold learning.
- features: data fields used as input to the specified algorithm.
- scaling: use StandardScaler, LabelEncoder, minmax_scale, or other preprocessors to scale the input data.
- [parameters]: use the same parameter names as the functions in the Python libraries. In the example above, n_components is passed directly to sklearn.decomposition.PCA. Other parameters can be set in the same way.
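To make the parameter pass-through concrete, the hypothetical helper below splits an analyze entry into P6-level keys and keyword arguments that are forwarded unchanged, and prints the scikit-learn constructor call they would correspond to. The set of reserved keys is an assumption inferred from the examples, and P6 does not necessarily build a call string this way; the sketch only shows the naming rule.

```javascript
// Keys interpreted by P6 itself (assumption based on the examples);
// everything else is treated as a keyword argument for scikit-learn.
const RESERVED_KEYS = new Set(['module', 'algorithm', 'features', 'scaling']);

// Hypothetical: render an analyze() entry as the equivalent Python call.
function toPythonCall(spec) {
  const kwargs = Object.entries(spec)
    .filter(([key]) => !RESERVED_KEYS.has(key))
    .map(([key, value]) => `${key}=${formatValue(value)}`);
  return `sklearn.${spec.module}.${spec.algorithm}(${kwargs.join(', ')})`;
}

// Format a JavaScript value as a Python literal.
function formatValue(value) {
  if (typeof value === 'string') return JSON.stringify(value); // double-quoted, escaped
  if (typeof value === 'boolean') return value ? 'True' : 'False';
  if (value === null || value === undefined) return 'None';
  if (Array.isArray(value)) return `[${value.map(formatValue).join(', ')}]`;
  if (typeof value === 'object') {
    const items = Object.entries(value)
      .map(([k, v]) => `${JSON.stringify(k)}: ${formatValue(v)}`);
    return `{${items.join(', ')}}`;
  }
  return String(value); // numbers
}
```

With the PC entry from the example, `toPythonCall` yields `sklearn.decomposition.PCA(n_components=2)`; the features list is consumed by P6 rather than forwarded.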
model({module, method, trainingData, features, target, [parameters]})
- module: the Python library and module containing the method used to fit the model. Example: sklearn.linear_model
- method: the function called to fit the model. Example: LinearRegression
- trainingData: data for training the model
- features: input features of the model
- target: the data field to predict
- [parameters]: hyperparameters of the model
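As an illustration of how trainingData, features, and target relate, the sketch below fits a one-feature ordinary least-squares line, a plain-JavaScript stand-in for sklearn.linear_model.LinearRegression restricted to a single feature. P6 itself fits models with the Python library; the function name `fitLinearModel` and its returned `predict` function are hypothetical.

```javascript
// Hypothetical single-feature OLS fit: target ≈ intercept + slope * feature.
// Takes the same named arguments as a model() spec to show their roles.
function fitLinearModel({ trainingData, features, target }) {
  if (features.length !== 1) throw new Error('this sketch supports one feature');
  const [feature] = features;
  const xs = trainingData.map(r => r[feature]);
  const ys = trainingData.map(r => r[target]);
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;

  // slope = cov(x, y) / var(x); the (n - 1) factors cancel.
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  return { slope, intercept, predict: row => intercept + slope * row[feature] };
}
```

A multi-feature model would solve the normal equations instead; the single-feature case keeps the sketch short.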
To organize the views for visualization, use the layout function to configure the views and their arrangement.
layout({id, width, height, padding, [options]})
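P6's exact handling of padding is not described here. As an assumption, the hypothetical helper below follows the common margin convention, in which the drawable area of a view is its width and height minus the padding on each side, with padding given as an object of left, right, top, and bottom values.

```javascript
// Hypothetical: compute the drawable area of a view under the margin convention.
// padding is assumed to be {left, right, top, bottom}; missing sides default to 0.
function plotArea({ width, height, padding = {} }) {
  const { left = 0, right = 0, top = 0, bottom = 0 } = padding;
  return {
    x: left, // offset of the drawable area inside the view
    y: top,
    width: Math.max(0, width - left - right),
    height: Math.max(0, height - top - bottom),
  };
}
```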
To visualize data or analysis results, call `visualize` to transform the data (optional), choose a visual mark, and specify the visual encoding that maps data to visual marks.
visualize({transform, visualMark, [encoding]})
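A positional encoding such as x: 'PC1' in the example implies a mapping from data values to screen coordinates. The generic sketch below uses linear scales from each field's data extent to the pixel range of the view; it is an illustration, not P6's GPU-based rendering code, and the function names are hypothetical.

```javascript
// Hypothetical linear scale: maps [min(values), max(values)] onto [r0, r1].
function linearScale(values, [r0, r1]) {
  const d0 = values.reduce((m, v) => Math.min(m, v), Infinity);
  const d1 = values.reduce((m, v) => Math.max(m, v), -Infinity);
  const span = d1 - d0 || 1; // avoid dividing by zero for constant fields
  return v => r0 + ((v - d0) / span) * (r1 - r0);
}

// Map the fields named by the x and y encodings to pixel positions.
function encodePositions(rows, { x, y }, [width, height]) {
  const sx = linearScale(rows.map(r => r[x]), [0, width]);
  // Screen y grows downward, so the data minimum goes to the bottom edge.
  const sy = linearScale(rows.map(r => r[y]), [height, 0]);
  return rows.map(r => ({ x: sx(r[x]), y: sy(r[y]) }));
}
```

For instance, with the example's encoding and viewport, `encodePositions(rows, {x: 'PC1', y: 'PC0'}, [800, 400])` places the smallest PC1 value at the left edge and the largest PC0 value at the top.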
Jianping Kelvin Li and Kwan-Liu Ma. P6: A Declarative Language for Integrating Machine Learning in Visual Analytics. IEEE Transactions on Visualization and Computer Graphics (Proc: VAST), 2020
This research was sponsored in part by the U.S. National Science Foundation through grant NSF IIS-1528203 and U.S. Department of Energy through grant DE-SC0014917.