Data compression over the wire #37

@kylebarron

Right now data is transferred from Python to JS fully uncompressed:

```python
import pyarrow.feather as feather

feather.write_feather(table, bio, compression="uncompressed")
```

Uncompressed data is fine for local kernels, where Python and the browser are on the same machine, but not ideal for remote kernels, like JupyterHub or Colab, where Python is on a remote server and data has to be downloaded before it can be rendered on a map.

Data compression options

There are a few options for data compression:

  • Uncompressed.
  • Apply a simple compression like gzip to the entire table buffer. This is simple to implement on both the Python and JS sides, but quite slow.
  • Apply compression within the Arrow IPC format. The IPC file format supports only "light" compression (LZ4 or ZSTD) and doesn't apply any other encodings, such as delta encoding, for smaller file size. The downside is that reading compressed IPC files is not currently supported by Arrow JS. (This option and the gzip option are sketched in code after this list.)
  • Use Parquet. This has the most efficient compression, but with the downside of requiring a WebAssembly-based parser on the JS side; adding the Wasm could make the build setup more difficult.
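
For concreteness, here's a minimal sketch of options 2 and 3 from the Python side. The example table is a stand-in for the real data; `feather.write_feather` with Feather v2 writes the Arrow IPC file format.

```python
import gzip
import io

import pyarrow as pa
import pyarrow.feather as feather

# Stand-in table; the real payload is the table being sent to JS.
table = pa.table({"x": pa.array(range(1_000_000), type=pa.float64())})

# Option 2: gzip the entire uncompressed IPC buffer.
bio = io.BytesIO()
feather.write_feather(table, bio, compression="uncompressed")
gzipped = gzip.compress(bio.getvalue())

# Option 3: let the IPC format compress each buffer internally.
# "zstd" and "lz4" are the only codecs the IPC spec allows.
bio = io.BytesIO()
feather.write_feather(table, bio, compression="zstd")
```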

Different settings for local/remote?

Another question is whether it's possible to have different compression defaults based on whether the Python session is local or remote. Ideally a local Python kernel could use no compression while a remote Python kernel could use the most efficient compression.

The problem is that because Jupyter follows a server-client model, I don't know of a good way to tell from Python whether the attached client is running locally or remotely. There could be heuristics, like checking whether "google.colab" is in sys.modules, but that's only valid in the Colab case.
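
To illustrate, such a heuristic might look like the sketch below. The function name is hypothetical; only the Colab check is grounded in anything reliable.

```python
import sys

def is_probably_remote() -> bool:
    """Hypothetical best-effort guess at whether the kernel is remote."""
    # Colab is detectable because it injects its own module.
    if "google.colab" in sys.modules:
        return True
    # A local JupyterLab kernel and a remote JupyterHub kernel look
    # identical from the Python process, so default to "local".
    return False
```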

So it seems like the best default would be fast, moderate-size compression, with a parameter that lets the user choose either no compression or slower, smallest-file-size compression.
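
In other words, the user-facing API could look something like the sketch below, where `serialize_table` and its defaults are hypothetical, not an existing function:

```python
import io
from typing import Literal

import pyarrow as pa
import pyarrow.feather as feather

def serialize_table(
    table: pa.Table,
    compression: Literal["uncompressed", "lz4", "zstd"] = "zstd",
) -> bytes:
    """Hypothetical serializer: fast, moderate compression by default,
    with escape hatches to no compression or a denser codec."""
    bio = io.BytesIO()
    feather.write_feather(table, bio, compression=compression)
    return bio.getvalue()
```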

Unscientific benchmarks

Unscientific benchmarks using the Utah dataset of 1 million buildings (7M coordinates):

| Compression type | File size | Write time |
| --- | --- | --- |
| Feather (uncompressed) | 144 MB | 17 ms |
| gzip full-buffer compression | 64 MB | 13 s |
| Feather (ZSTD) | 80 MB | 200 ms |
| Feather (LZ4) | 97 MB | 147 ms |
| Parquet (Snappy) | 82 MB | 444 ms |
| Parquet (gzip) | 60 MB | 4.5 s |
| Parquet (brotli) | 45 MB | 3.7 s |
| Parquet (ZSTD) | 74 MB | 466 ms |
| Parquet (ZSTD level 22) | 41.6 MB | 11 s |
| Parquet (ZSTD level 18) | 41.6 MB | 9.8 s |
| Parquet (ZSTD level 16) | 48.3 MB | 5.7 s |
| Parquet (ZSTD level 14) | 49.8 MB | 2.7 s |
| Parquet (ZSTD level 12) | 49.8 MB | 1.9 s |
| Parquet (ZSTD level 10) | 49.8 MB | 1.7 s |
| Parquet (ZSTD level 8) | 50.3 MB | 1.4 s |
| Parquet (ZSTD level 7) | 50.3 MB | 1.25 s |
| Parquet (ZSTD level 6) | 51.4 MB | 1.2 s |
| Parquet (ZSTD level 4) | 57.8 MB | 800 ms |
| Parquet (ZSTD level 2) | 69.1 MB | 560 ms |

Given this, ZSTD at around level 7 seems to offer a very good combination of write speed and file size, and likely makes sense as a default.
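
As a sketch, that default would look like the following with pyarrow's Parquet writer (the table is again a stand-in for the real dataset):

```python
import io

import pyarrow as pa
import pyarrow.parquet as pq

# Stand-in table; the benchmark above used ~7M coordinates.
table = pa.table({"x": pa.array(range(1_000_000), type=pa.float64())})

bio = io.BytesIO()
# ZSTD level 7: ~50 MB and ~1.25 s in the benchmark above.
pq.write_table(table, bio, compression="zstd", compression_level=7)
```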
