Define Binning Schemes
In order to visualize data, imMens needs your instructions on how to aggregate raw data points. Let's assume that you have a hypothetical table inside your PostgreSQL database that records information about tweets:
Table tweets (
user_id character varying(20)
language character varying(2)
lat double precision
lon double precision
date_time timestamp
retweets integer
favorites integer
)
We first need to define bins for each of the dimensions involved. imMens supports binning over five types of dimensions: numeric, nominal, lat, lon and temporal. Bins over numeric dimensions are defined by a choice of bin width. For example, the values of "retweets" may range from 0 to 1000 with a pre-defined bin width of 100, resulting in 10 bins. The current version of imMens supports uniform-width bins only. Multiple levels of resolution can be defined. For example, a bin width of 100 for "retweets" may be zoom level 0, and a smaller bin width of 50 may define a finer granularity at zoom level 1.
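As a minimal sketch of uniform-width binning, the Python snippet below maps a value to a bin index at a given zoom level. The function names, the `RETWEET_BIN_WIDTHS` table and the assumption that bins start at 0 are illustrative only; they are not taken from the imMens code.

```python
import math

def numeric_bin_index(value, bin_width, minimum=0):
    """Return the index of the uniform-width bin that contains value.

    Assumes bins start at `minimum` and each bin covers
    [minimum + i * bin_width, minimum + (i + 1) * bin_width).
    """
    return math.floor((value - minimum) / bin_width)

# Hypothetical zoom levels for "retweets", using the widths from the text.
RETWEET_BIN_WIDTHS = {0: 100, 1: 50}

def retweet_bin(value, zoom_level):
    """Look up the bin width for the zoom level and bin the value."""
    return numeric_bin_index(value, RETWEET_BIN_WIDTHS[zoom_level])

print(retweet_bin(230, 0))  # 2, i.e. the bin covering [200, 300)
print(retweet_bin(230, 1))  # 4, i.e. the bin covering [200, 250)
```

Halving the bin width doubles the number of bins, which is why the same value lands in bin 2 at zoom level 0 and in bin 4 at zoom level 1.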
For nominal dimensions such as "user_id" and "language", each unique value is a bin by default, and you do not need to specify bins. For geographic latitude and longitude dimensions (lat and lon coordinates), two parameters are needed: the width of a bin expressed in the number of pixels in the projected screen space, and the zoom level. Finally, we can define bins at multiple levels of abstraction for temporal dimensions: year, month, day, hour, minute, second, etc.
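To illustrate how a pixel-based bin width and a zoom level could interact, here is a rough Python sketch. It assumes a Web Mercator projection with a 256-pixel world map at zoom level 0; the text above does not state which projection imMens uses, so treat `TILE_SIZE`, `mercator_pixel` and `geo_bin` as hypothetical.

```python
import math

TILE_SIZE = 256  # assumed pixel width of the whole world at zoom level 0

def mercator_pixel(lat, lon, zoom):
    """Project lat/lon (degrees) to pixel coordinates in Web Mercator."""
    world = TILE_SIZE * (2 ** zoom)  # world width in pixels at this zoom
    x = (lon + 180.0) / 360.0 * world
    lat_rad = math.radians(lat)
    merc = math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad))
    y = (1.0 - merc / math.pi) / 2.0 * world
    return x, y

def geo_bin(lat, lon, bin_width, zoom):
    """Return (lon_bin, lat_bin) for a bin width given in screen pixels."""
    x, y = mercator_pixel(lat, lon, zoom)
    return math.floor(x / bin_width), math.floor(y / bin_width)

# A point at latitude 0, longitude 0 with 2-pixel bins at zoom level 4.
print(geo_bin(0.0, 0.0, 2, 4))  # (1024, 1024)
```

Under these assumptions the projected world is 4096 pixels wide at zoom level 4, so a 2-pixel bin width yields 2048 bins along each axis.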
The bin definitions are expressed using the JSON format:
{
dimension_name: {"type": _type_,
"levels":{
level0: {parameter: },
level1: {parameter: },
...
}
}
}
lat/lon/numeric: {type, binWidth, level}
nominal: {type}
temporal: {type, granularity}
A data tile class is a specification of bins for multiple dimensions at particular zoom levels. For example, here is a class specification for the dimensions lat, lon, user_id and retweets:
{
"table": "tweets",
"dimensions": {
"lat": {"type": "lat", "binWidth": 2, "level": 4},
"lon": {"type": "lon", "binWidth": 2, "level": 4},
"user_id": {"type": "nominal"},
"retweets": {"type": "numeric", "binWidth": 10, "level": 3}
}
}
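Before running the generator, it can help to check that each dimension carries the parameters listed for its type. The Python sketch below does that under the assumption that the flat per-dimension layout used in the example above is the expected one; `REQUIRED_KEYS` and `check_tile_class` are hypothetical helpers, not part of imMens.

```python
# Parameters expected for each dimension type, following the list above.
REQUIRED_KEYS = {
    "lat": {"type", "binWidth", "level"},
    "lon": {"type", "binWidth", "level"},
    "numeric": {"type", "binWidth", "level"},
    "nominal": {"type"},
    "temporal": {"type", "granularity"},
}

def check_tile_class(spec):
    """Return a list of problems found in one data tile class specification."""
    problems = []
    if "table" not in spec:
        problems.append("missing 'table'")
    for name, dim in spec.get("dimensions", {}).items():
        dim_type = dim.get("type")
        if dim_type not in REQUIRED_KEYS:
            problems.append(f"{name}: unknown type {dim_type!r}")
            continue
        missing = REQUIRED_KEYS[dim_type] - set(dim)
        if missing:
            problems.append(f"{name}: missing {sorted(missing)}")
    return problems

example_spec = {
    "table": "tweets",
    "dimensions": {
        "lat": {"type": "lat", "binWidth": 2, "level": 4},
        "lon": {"type": "lon", "binWidth": 2, "level": 4},
        "user_id": {"type": "nominal"},
        "retweets": {"type": "numeric", "binWidth": 10, "level": 3},
    },
}

print(check_tile_class(example_spec))  # [] when nothing is missing
```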
Below is another data tile class specification and a sample data tile in a sparse table representation. Note that the current version of imMens supports only one type of aggregation: row count.
specification:
{
"table": "tweets",
"dimensions": {
"user_id": {"type": "nominal"},
"language": {"type": "nominal"},
"date_time": {"type": "temporal", "granularity": "month"}
}
}
sample data tile:
user_id   language   date_time   count
-------------------------------------------
u3355     EN         May 2008    25
u90       ZH         Jan 2010    31
u88       EN         Dec 2009    9
...
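The sparse representation stores only the bin combinations that actually occur, each with its row count. The Python sketch below shows one way such a tile could be computed for the specification above; the input rows are made up for illustration and `build_tile` is a hypothetical helper, not the tile generator's actual implementation.

```python
from collections import Counter
from datetime import datetime

# Made-up raw rows: (user_id, language, date_time).
rows = [
    ("u90", "ZH", datetime(2010, 1, 3, 8, 15)),
    ("u90", "ZH", datetime(2010, 1, 20, 22, 5)),
    ("u88", "EN", datetime(2009, 12, 31, 23, 59)),
]

def build_tile(rows):
    """Count rows per (user_id, language, year, month) bin combination."""
    counts = Counter()
    for user_id, language, ts in rows:
        # "month" granularity: keep only the year and month of the timestamp.
        counts[(user_id, language, ts.year, ts.month)] += 1
    return counts

tile = build_tile(rows)
for (user_id, language, year, month), count in sorted(tile.items()):
    print(user_id, language, f"{year}-{month:02d}", count)
```

Bin combinations with no rows never appear in the `Counter`, which is what keeps the table sparse.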
Save the specifications of your data tile classes in a file; if you have multiple class specifications, put them in an array. Then run the following command to launch the tile generator with five arguments: the path to the specification file, the path of the output directory in which to store the generated tiles, the PostgreSQL connection URL (which includes a host, a port and a database name, e.g. "//localhost:5432/testdb"), the PostgreSQL user name, and the password.
java -jar PostgresTileGenerator.jar <specification_file> <output_directory> <postgres_url> <postgres_user> <postgres_password>
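As a sketch of how the specification file with multiple classes might be prepared, the Python snippet below writes the two example class specifications from this page as a JSON array. The helper `write_spec_file` and any file name you pass to it are hypothetical; only the array layout follows the description above.

```python
import json

# The two data tile class specifications shown earlier on this page.
specs = [
    {
        "table": "tweets",
        "dimensions": {
            "lat": {"type": "lat", "binWidth": 2, "level": 4},
            "lon": {"type": "lon", "binWidth": 2, "level": 4},
            "user_id": {"type": "nominal"},
            "retweets": {"type": "numeric", "binWidth": 10, "level": 3},
        },
    },
    {
        "table": "tweets",
        "dimensions": {
            "user_id": {"type": "nominal"},
            "language": {"type": "nominal"},
            "date_time": {"type": "temporal", "granularity": "month"},
        },
    },
]

def write_spec_file(path, specs):
    """Write the class specifications to `path` as one JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(specs, f, indent=2)
```

Calling `write_spec_file("tile_specs.json", specs)` would produce a file whose path can be passed as the first argument of the command above. Writing the file with `json.dump` also guarantees strictly valid JSON, for example with no trailing commas.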