
Precompute Data Tiles

Zhicheng Liu edited this page Oct 22, 2013 · 19 revisions

Data tiles are multivariate aggregations of data tables. For example, suppose your PostgreSQL database contains a hypothetical table that records information about tweets:

CREATE TABLE tweets (
    user_id       character varying(20),
    language      character varying(2),
    lat           double precision,
    lon           double precision,
    date_time     timestamp,
    retweets      integer,
    favorites     integer
);

To compute data tiles, we first need to define bins for each of the dimensions involved. imMens supports binning over five types of dimensions: numeric, nominal, lat, lon and temporal. Bins over a numeric dimension are defined by a bin width. For example, the values of "retweets" may range from 0 to 1000; a bin width of 100 then yields 10 bins. The current version of imMens supports uniform-width bins only. A numeric dimension can be binned at multiple levels of resolution, each with its own bin width: a bin width of 100 for "retweets" may be zoom level 0, and a finer bin width of 50 may be zoom level 1.
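A minimal sketch of uniform-width numeric binning, assuming bins are half-open intervals that start at a chosen minimum value (0 for "retweets"). The function names and the `RETWEET_LEVELS` mapping are hypothetical illustrations, not part of imMens:

```python
import math


def numeric_bin(value, bin_width, min_value=0.0):
    """Return the index of the uniform-width bin containing value.

    Bin i covers the half-open interval
    [min_value + i * bin_width, min_value + (i + 1) * bin_width).
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    return math.floor((value - min_value) / bin_width)


def bin_count(min_value, max_value, bin_width):
    """Number of uniform-width bins needed to cover [min_value, max_value)."""
    return math.ceil((max_value - min_value) / bin_width)


# Hypothetical zoom levels for "retweets": coarse at level 0, finer at level 1.
RETWEET_LEVELS = {0: 100, 1: 50}

for level, width in RETWEET_LEVELS.items():
    # A retweet count of 275 falls in bin 2 at level 0 and in bin 5 at level 1.
    print(level, bin_count(0, 1000, width), numeric_bin(275, width))
```

Because 100 is a multiple of 50, level-0 bin `i` splits exactly into level-1 bins `2i` and `2i + 1`; choosing widths that divide each other keeps the levels nested.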

For nominal dimensions such as "user_id" and "language", each unique value is a bin by default. Geographic latitude and longitude dimensions (lat and lon) need two parameters: the bin width, expressed in pixels in the projected screen space, and the zoom level. Finally, temporal dimensions can be binned at multiple levels of abstraction, chosen by granularity: year, month, day, hour, minute, second, etc.
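A sketch of lat/lon and temporal binning. It assumes a Web Mercator projection with 256-pixel tiles (the convention of most web map libraries); the projection imMens actually uses is an assumption here. A point is projected to global pixel coordinates at the given zoom level, and each bin covers a `bin_width` by `bin_width` pixel square. Temporal bins truncate a timestamp to the start of its granularity. All names are hypothetical:

```python
import math
from datetime import datetime

TILE_SIZE = 256  # assumption: 256-pixel map tiles, as in common web map libraries
MAX_LAT = 85.05112878  # Web Mercator is undefined at the poles, so clamp latitude


def lon_to_pixel_x(lon, zoom):
    """Project a longitude to a global pixel x coordinate (Web Mercator)."""
    world = TILE_SIZE * 2 ** zoom  # width of the whole map in pixels at this zoom
    return (lon + 180.0) / 360.0 * world


def lat_to_pixel_y(lat, zoom):
    """Project a latitude to a global pixel y coordinate (y grows southward)."""
    world = TILE_SIZE * 2 ** zoom
    phi = math.radians(max(min(lat, MAX_LAT), -MAX_LAT))
    return (1.0 - math.log(math.tan(phi) + 1.0 / math.cos(phi)) / math.pi) / 2.0 * world


def geo_bin(lat, lon, bin_width, zoom):
    """Return the (x_bin, y_bin) indices of a point, with bins bin_width pixels wide."""
    x = lon_to_pixel_x(lon, zoom)
    y = lat_to_pixel_y(lat, zoom)
    return int(x // bin_width), int(y // bin_width)


# Fields reset to their minimum when truncating to each granularity.
_TRUNCATE = {
    "year": dict(month=1, day=1, hour=0, minute=0, second=0, microsecond=0),
    "month": dict(day=1, hour=0, minute=0, second=0, microsecond=0),
    "day": dict(hour=0, minute=0, second=0, microsecond=0),
    "hour": dict(minute=0, second=0, microsecond=0),
    "minute": dict(second=0, microsecond=0),
    "second": dict(microsecond=0),
}


def temporal_bin(ts: datetime, granularity: str) -> datetime:
    """Truncate a timestamp to the start of its bin at the given granularity."""
    return ts.replace(**_TRUNCATE[granularity])
```

With `binWidth` 2 at zoom level 4, the map is 4096 pixels wide, so each axis has 2048 bins.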

Bin definitions are expressed in JSON. Each dimension type takes the following fields:

lat/lon/numeric: {type, binWidth, level}
nominal: {type}
temporal: {type, granularity}
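Before running the generator, it can help to check that each bin definition carries the fields listed above for its type. A minimal validator that encodes those rules; it is a hypothetical helper, not part of imMens:

```python
# Required fields per dimension type, taken from the bin definition formats above.
REQUIRED_KEYS = {
    "lat": {"type", "binWidth", "level"},
    "lon": {"type", "binWidth", "level"},
    "numeric": {"type", "binWidth", "level"},
    "nominal": {"type"},
    "temporal": {"type", "granularity"},
}


def validate_bin_definition(name, definition):
    """Raise ValueError if a bin definition has an unknown type or lacks a required field."""
    kind = definition.get("type")
    if kind not in REQUIRED_KEYS:
        raise ValueError(f"{name}: unknown dimension type {kind!r}")
    missing = REQUIRED_KEYS[kind] - set(definition)
    if missing:
        raise ValueError(f"{name}: type {kind!r} requires {sorted(missing)}")


def validate_spec(spec):
    """Validate every dimension of a data tile class specification."""
    for name, definition in spec["dimensions"].items():
        validate_bin_definition(name, definition)
```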

A data tile class specifies bins for multiple dimensions at particular zoom levels. For example, here is a class specification over the dimensions lat, lon, user_id and retweets:

{
  "table": "tweets",
  "dimensions": {
    "lat": {"type": "lat", "binWidth": 2, "level": 4},
    "lon": {"type": "lon", "binWidth": 2, "level": 4},
    "user_id": {"type": "nominal"},
    "retweets": {"type": "numeric", "binWidth": 10, "level": 3}
  }
}
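Conceptually, computing the tile for a class specification is a row-count GROUP BY over the binned columns. The sketch below builds such a PostgreSQL query from a specification. It is an illustration only, not how `PostgresTileGenerator.jar` is implemented; it assumes numeric bins start at 0 and that lat/lon use Web Mercator with 256-pixel tiles. Table and column names are interpolated unescaped, so the specification must be trusted:

```python
TILE_SIZE = 256  # assumption: 256-pixel map tiles (Web Mercator)


def bin_expression(column, definition):
    """Return a PostgreSQL expression mapping a column value to its bin."""
    kind = definition["type"]
    if kind == "nominal":
        return column  # each distinct value is its own bin
    if kind == "numeric":
        # Assumes bins start at 0; the cast avoids integer division.
        return f"floor({column}::double precision / {definition['binWidth']})"
    if kind == "temporal":
        return f"date_trunc('{definition['granularity']}', {column})"
    world = f"({TILE_SIZE} * power(2, {definition['level']}))"  # map width in pixels
    if kind == "lon":
        return f"floor((({column} + 180) / 360 * {world}) / {definition['binWidth']})"
    if kind == "lat":
        rad = f"radians({column})"
        y = f"(1 - ln(tan({rad}) + 1 / cos({rad})) / pi()) / 2"
        return f"floor(({y} * {world}) / {definition['binWidth']})"
    raise ValueError(f"unknown dimension type {kind!r}")


def tile_query(spec):
    """Build a query returning the row count of every non-empty bin combination."""
    dims = spec["dimensions"]
    selects = [f"{bin_expression(col, d)} AS {col}_bin" for col, d in dims.items()]
    positions = ", ".join(str(i) for i in range(1, len(dims) + 1))
    return (f"SELECT {', '.join(selects)}, COUNT(*) AS count "
            f"FROM {spec['table']} GROUP BY {positions}")
```

For the class above, the `retweets` dimension becomes `floor(retweets::double precision / 10) AS retweets_bin`, and the query groups by all four bin columns.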

Below is another data tile class specification, followed by a sample data tile in a sparse table representation, in which only non-empty bin combinations are listed. Note that the current version of imMens supports only one type of aggregation: row count.

specification:
{
  "table": "tweets",
  "dimensions": {
    "user_id": {"type": "nominal"},
    "language": {"type": "nominal"},
    "date_time": {"type": "temporal", "granularity": "month"}
  }
}

sample data tile:
user_id    language    date_time     count
-------------------------------------------
u3355      EN          May 2008      25
u90        ZH          Jan 2010      31
u88        EN          Dec 2009      9
... 
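The same sparse tile can be sketched in memory: count rows per combination of bins and store only the combinations that occur. This hypothetical helper (not part of imMens) uses `collections.Counter` and keys months as `(year, month)` pairs:

```python
from collections import Counter


def sparse_tile(rows):
    """Count rows per (user_id, language, month) bin combination.

    Each row is a dict with "user_id", "language" and "date_time" (a datetime).
    Months are keyed as (year, month) pairs. Only combinations that occur get
    an entry, which is what makes the tile sparse.
    """
    return Counter(
        (row["user_id"], row["language"], (row["date_time"].year, row["date_time"].month))
        for row in rows
    )
```

Looking up a combination that never occurs, such as `tile[("u90", "EN", (2010, 1))]`, returns 0 without adding an entry, so absent bins cost no storage.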

Once you have defined the data tile class specifications in a file (if you have multiple class specifications, put them in a JSON array), run the following command to launch the tile generator. It takes five arguments: the path to the specification file, the path of the output directory for the generated tiles, the PostgreSQL connection URL (a host, a port and a database name, e.g. "//localhost:5432/testdb"), and the PostgreSQL user name and password.

java -jar PostgresTileGenerator.jar <specification_file> <output_directory> <postgres_url> <postgres_user> <postgres_password>
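If you generate specifications programmatically, a small helper can write them as a JSON array and assemble the command-line arguments in the order described above. The helper names are hypothetical; only the jar name and the argument order come from the command shown above:

```python
import json
from pathlib import Path


def write_spec_file(path, specs):
    """Write one or more data tile class specifications to path as a JSON array."""
    Path(path).write_text(json.dumps(list(specs), indent=2))


def generator_command(spec_file, output_dir, postgres_url, user, password,
                      jar="PostgresTileGenerator.jar"):
    """Return the argument list for the tile generator, in the documented order."""
    return ["java", "-jar", jar, str(spec_file), str(output_dir),
            postgres_url, user, password]
```

The resulting list can be passed to `subprocess.run(..., check=True)`, which raises `CalledProcessError` if the generator exits with a non-zero status.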