API reference

Everything below is importable from the top level (from tessera_eval import …) unless noted. Array shapes use N = number of pixels/samples, dim = 128.

`tessera_eval.data`

`dequantize_uint8(quantized, dim_min, dim_max) -> float32`

Per-dim dequantization (TEE format). quantized is uint8 (N, 128) or (H, W, 128); dim_min/dim_max are (128,). Returns same shape as input.
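To make the broadcasting concrete, here is a minimal NumPy sketch of per-dimension dequantization. It assumes a linear mapping in which 0 maps to dim_min and 255 maps to dim_max; the actual TEE formula is not stated in this reference, so both the mapping and the name `dequantize_uint8_sketch` are illustrative assumptions.

```python
import numpy as np

def dequantize_uint8_sketch(quantized, dim_min, dim_max):
    """Hypothetical linear per-dimension dequantization (assumed formula)."""
    q = np.asarray(quantized, dtype=np.float32)
    lo = np.asarray(dim_min, dtype=np.float32)
    hi = np.asarray(dim_max, dtype=np.float32)
    # dim_min/dim_max are 1-D and broadcast over the leading axes,
    # so (N, dim) and (H, W, dim) inputs both keep their shape.
    return lo + (q / 255.0) * (hi - lo)

# Tiny example with dim = 2 instead of 128 for readability.
quantized = np.array([[0, 255], [51, 102]], dtype=np.uint8)
dim_min = np.array([-1.0, 0.0], dtype=np.float32)
dim_max = np.array([1.0, 10.0], dtype=np.float32)
restored = dequantize_uint8_sketch(quantized, dim_min, dim_max)
```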

`dequantize_int8(quantized, scales) -> float32 (H, W, 128)`

Per-pixel dequantization (GeoTessera format). quantized is int8 (H, W, 128); scales is (H, W) or (H, W, 128). Returns quantized * scales as float32; a 2-D scales array is broadcast across the 128 dimensions.
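The two accepted scales shapes differ only in how they broadcast. The sketch below shows one way to handle both in NumPy; the function name is hypothetical and this is not the library's own code.

```python
import numpy as np

def dequantize_int8_sketch(quantized, scales):
    """Multiply int8 embeddings by per-pixel (or per-element) scales."""
    q = np.asarray(quantized, dtype=np.float32)
    s = np.asarray(scales, dtype=np.float32)
    if s.ndim == 2:
        # One scale per pixel: add a trailing axis so it broadcasts
        # across the embedding dimension.
        s = s[..., np.newaxis]
    return q * s

H, W, D = 2, 3, 4  # small stand-in for (H, W, 128)
quantized = np.full((H, W, D), -2, dtype=np.int8)
scales = np.arange(1, H * W + 1, dtype=np.float32).reshape(H, W) * 0.5
tile = dequantize_int8_sketch(quantized, scales)
```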

`load_tee_vectors(vector_dir) -> (vectors, coords, metadata)`

Read a TEE vector directory. Returns vectors: float32 (N, 128), coords: int32 (N, 2) pixel (x, y), metadata: dict. Raises FileNotFoundError if files are missing.

`load_geotessera_tile(embedding_path, scales_path) -> float32 (H, W, 128)`

Load + dequantize one GeoTessera tile from its .npy + _scales.npy pair.

`load_embeddings_for_shapefile(gdf, field, year, gt_instance, callback=None) -> (vectors, labels, class_names, stats)`

Stream embeddings for all pixels under a labelled GeoDataFrame, one GeoTessera tile at a time (memory-bounded). gdf is reprojected per tile; field is the label column. Returns vectors: float32 (N, 128), labels: int (N,) (0-indexed), class_names: list[str], stats: dict (tile_count, tiles_with_data, total_pixels, n_classes). callback(current, total) reports tile progress. Raises ValueError if no labelled pixels are found.
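As a small illustration of the return contract, the sketch below uses stand-in values (not real data) to show a callback with the documented callback(current, total) signature and how 0-indexed labels map back onto class_names. The class names here are invented for the example.

```python
import numpy as np

def print_progress(current, total):
    """Matches the documented callback(current, total) signature."""
    print(f"tile {current}/{total}")

# Stand-in values shaped like the documented outputs (invented for illustration).
labels = np.array([0, 2, 1, 2, 2])
class_names = ["crop", "forest", "water"]

# labels are 0-indexed, so they index class_names directly.
counts = np.bincount(labels, minlength=len(class_names))
pixels_per_class = dict(zip(class_names, counts.tolist()))

print_progress(1, 4)
```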

`load_embeddings_for_shapefile_vq(gdf, field, year, client, *, max_km=10.0, target_crs="EPSG:4326", callback=None) -> (vectors, labels, class_names, stats)`

The VQ data path: same output contract, but it pulls reconstructed embeddings from client.fetch_mosaic_for_region(bbox, year, target_crs) -> (mosaic, transform, crs) instead of raw GeoTessera tiles. client is duck-typed: pass a tessera_vq.VQTessera (the VQ bolt-on) for the VQ path, or a geotessera.GeoTessera for raw region reads. tessera_vq is not imported here, so there is no tessera-vq dependency. The shapefile bbox is split into chunks no larger than max_km (the bolt-on caps bbox size); chunks that no polygon touches are skipped without a fetch, and chunks with no coverage are skipped with a warning. stats has chunk_count, chunks_with_data, total_pixels, n_classes. Use this function to compare downstream accuracy on VQ-reconstructed vs. raw embeddings.
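The chunking scheme itself is not described here. As a rough sketch of the idea, the code below splits a lon/lat bounding box into a uniform grid whose cells are at most max_km on each side. It assumes EPSG:4326 coordinates and an approximate 111.32 km per degree of latitude; `split_bbox` is a hypothetical helper, not part of the package.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximation, assumed for this sketch

def split_bbox(bbox, max_km=10.0):
    """Split (min_lon, min_lat, max_lon, max_lat) into cells <= max_km per side."""
    min_lon, min_lat, max_lon, max_lat = bbox
    mid_lat = math.radians((min_lat + max_lat) / 2.0)
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(mid_lat)
    width_km = (max_lon - min_lon) * km_per_deg_lon
    height_km = (max_lat - min_lat) * KM_PER_DEG_LAT
    # Number of cells along each axis, at least one.
    nx = max(1, math.ceil(width_km / max_km))
    ny = max(1, math.ceil(height_km / max_km))
    dx = (max_lon - min_lon) / nx
    dy = (max_lat - min_lat) / ny
    return [
        (min_lon + i * dx, min_lat + j * dy,
         min_lon + (i + 1) * dx, min_lat + (j + 1) * dy)
        for j in range(ny)
        for i in range(nx)
    ]

# Roughly 22 km x 11 km near the equator -> 3 x 2 cells.
chunks = split_bbox((0.0, 0.0, 0.2, 0.1), max_km=10.0)
```

A caller could then test each cell against the polygons and skip the fetch for cells that no polygon touches, as described above.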

`tessera_eval.rasterize`

`rasterize_shapefile(gdf, field, transform, width, height, label_encoder=None) -> int32 (height, width)`

Burn polygons onto a pixel grid using field as the class label. Output is 1-based (0 = nodata, 1..K = classes). Pass a pre-fitted sklearn.preprocessing.LabelEncoder to keep class IDs consistent across tiles. Rasterization uses all_touched=True so thin polygons aren't dropped.
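Because the raster is 1-based while the loaders return 0-indexed labels, a conversion step sits between the two. The sketch below shows one way to do it in NumPy, assuming the class raster and the embedding tile share the same grid; the arrays are synthetic.

```python
import numpy as np

# 0 = nodata, 1..K = classes, as documented for rasterize_shapefile.
class_raster = np.array([[0, 1, 1],
                         [2, 0, 3]], dtype=np.int32)
# Synthetic (H, W, dim) tile with dim = 4 for brevity.
tile_emb = np.random.default_rng(0).normal(size=(2, 3, 4)).astype(np.float32)

labelled = class_raster > 0            # boolean mask of labelled pixels
vectors = tile_emb[labelled]           # (N, dim), row-major pixel order
labels = class_raster[labelled] - 1    # (N,), shifted to 0-indexed
```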

`tessera_eval.classify`

`available_classifiers() -> list[str]`

["nn", "rf", "mlp", "spatial_mlp", "spatial_mlp_5x5"], plus "xgboost" if installed.

`make_classifier(name, params=None) -> estimator`

Returns a scikit-learn-compatible classifier by name. Names may carry a variant suffix (mlp_v2 → mlp). Recognized: nn (k-NN), rf (random forest), xgboost, mlp, spatial_mlp, spatial_mlp_5x5. params overrides hyperparameters (e.g. {"n_estimators": 300}, {"hidden_layers": "256,128"}). Estimators use random_state=42. Raises ValueError for an unknown name and ImportError if xgboost is requested but not installed.
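The exact suffix rule is not spelled out beyond the mlp_v2 → mlp example. One plausible reading, sketched below, is a longest-prefix match against the recognized names, plus a parser for the comma-separated hidden_layers string. Both helpers are hypothetical and only illustrate the naming convention.

```python
KNOWN = ("nn", "rf", "xgboost", "mlp", "spatial_mlp", "spatial_mlp_5x5")

def resolve_base_name(name):
    """Map a name with an optional variant suffix to its base classifier."""
    # Try longer names first so "spatial_mlp_5x5_v2" is not
    # mistaken for "spatial_mlp".
    for base in sorted(KNOWN, key=len, reverse=True):
        if name == base or name.startswith(base + "_"):
            return base
    raise ValueError(f"unknown classifier: {name!r}")

def parse_hidden_layers(spec):
    """Turn a string such as "256,128" into a tuple of layer widths."""
    return tuple(int(part) for part in spec.split(","))

base = resolve_base_name("mlp_v2")
layers = parse_hidden_layers("256,128")
```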

`available_regressors() -> list[str]` / `make_regressor(name, params=None)`

As above for regression: nn_reg, rf_reg, mlp_reg, xgboost_reg.

`gather_spatial_features(vectors, coords, width, height, radius=1, subset_mask=None) -> float32 (M, w·w·dim)`

For each pixel, concatenate the embeddings of the w × w window centred on it, where w = 2·radius+1 (the pixel itself plus its grid neighbours; missing neighbours are zero-filled). radius=1→3×3, 2→5×5. subset_mask restricts output to the selected pixels (the M rows of the result). Operates on a sparse (coords) grid.
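A minimal NumPy sketch of the sparse gather follows. It assumes coords holds (x, y) pairs, that window slots are ordered row by row from top-left to bottom-right, and it leaves out subset_mask; the real function's slot ordering may differ.

```python
import numpy as np

def gather_sparse_sketch(vectors, coords, width, height, radius=1):
    """Concatenate each pixel's (2*radius+1)^2 window, zero-filling gaps."""
    n, dim = vectors.shape
    w = 2 * radius + 1
    # index_grid[y, x] is the row index of the pixel at (x, y), or -1 if absent.
    index_grid = np.full((height, width), -1, dtype=np.int64)
    index_grid[coords[:, 1], coords[:, 0]] = np.arange(n)
    out = np.zeros((n, w * w * dim), dtype=np.float32)
    slot = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x = coords[:, 0] + dx
            y = coords[:, 1] + dy
            inside = (x >= 0) & (x < width) & (y >= 0) & (y < height)
            src = np.full(n, -1, dtype=np.int64)
            src[inside] = index_grid[y[inside], x[inside]]
            found = src >= 0  # neighbours that exist in the sparse set
            out[found, slot * dim:(slot + 1) * dim] = vectors[src[found]]
            slot += 1
    return out

# Two adjacent pixels on a 2 x 1 grid, dim = 1.
vectors = np.array([[1.0], [2.0]], dtype=np.float32)
coords = np.array([[0, 0], [1, 0]], dtype=np.int32)
features = gather_sparse_sketch(vectors, coords, width=2, height=1, radius=1)
```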

`gather_spatial_features_2d(tile_emb, radius=1, mask=None) -> float32`

Same idea for a contiguous (H, W, dim) tile (edge-padded). Returns (H, W, w·w·dim), or (M, w·w·dim) when a (H, W) boolean mask is given.
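For a dense tile the same gather can be written with padding and shifted slices. The sketch below reads "edge-padded" as replicating border pixels (np.pad with mode="edge"), which is an assumption, and omits the optional mask.

```python
import numpy as np

def gather_2d_sketch(tile_emb, radius=1):
    """Stack each pixel's window along the last axis of an (H, W, dim) tile."""
    h, w_px, dim = tile_emb.shape
    padded = np.pad(
        tile_emb,
        ((radius, radius), (radius, radius), (0, 0)),
        mode="edge",  # replicate border pixels (assumed meaning of edge-padded)
    )
    # One shifted view per window slot, ordered top-left to bottom-right.
    windows = [
        padded[dy:dy + h, dx:dx + w_px]
        for dy in range(2 * radius + 1)
        for dx in range(2 * radius + 1)
    ]
    return np.concatenate(windows, axis=-1)

tile = np.arange(6, dtype=np.float32).reshape(2, 3, 1)  # values 0..5, dim = 1
features = gather_2d_sketch(tile, radius=1)
```

With a boolean (H, W) mask, indexing the result as `features[mask]` would give the (M, w·w·dim) form.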

`augment_spatial(X, y, window, dim) -> (X_aug, y_aug)`

4× augmentation of spatial patches via horizontal/vertical flips.
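One way to get a 4× set from flips is to keep the original patch and add the horizontal, vertical, and combined flips. The sketch below assumes each row of X is a flattened (window, window, dim) patch in row-major order; whether the library uses exactly these four variants is an assumption.

```python
import numpy as np

def augment_flips_sketch(X, y, window, dim):
    """Return the original patches plus three flipped copies."""
    patches = X.reshape(-1, window, window, dim)
    variants = [
        patches,                 # original
        patches[:, :, ::-1],     # horizontal flip (reverse columns)
        patches[:, ::-1, :],     # vertical flip (reverse rows)
        patches[:, ::-1, ::-1],  # both flips
    ]
    X_aug = np.concatenate([v.reshape(len(X), -1) for v in variants], axis=0)
    y_aug = np.tile(y, len(variants))
    return X_aug, y_aug

X = np.arange(9, dtype=np.float32).reshape(1, 9)  # one 3x3 patch, dim = 1
y = np.array([7])
X_aug, y_aug = augment_flips_sketch(X, y, window=3, dim=1)
```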

`tessera_eval.evaluate`

`run_learning_curve(vectors, labels, classifier_names, training_pcts, repeats=5, ...) -> generator`

Yields events as it sweeps training_pcts (percentages), with stratified sampling and repeats random restarts:

{"type": "classifier_status", "message": str}
{"type": "progress", "pct": float, "classifiers": {name: {mean_f1, std_f1, mean_f1w, std_f1w}}, "pixel_train_count", "total_pixels", ...}
{"type": "confusion_matrices", "confusion_matrices": {name: [[int]]}} (at the largest pct)

Spatial hold-out: pass test_vectors + test_labels (a fixed, separate test set; vectors/labels become the train-only pool). Spatial-MLP variants take spatial_vectors / spatial_vectors_5x5; U-Net takes unet_patches.
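Since run_learning_curve is a generator, callers dispatch on each event's "type". The sketch below uses a stand-in generator that mimics the documented event shapes with placeholder values, so it runs without the package; `fake_learning_curve` and its numbers are invented for illustration.

```python
def fake_learning_curve():
    """Stand-in generator mimicking the documented events (placeholder values)."""
    yield {"type": "classifier_status", "message": "training rf"}
    yield {"type": "progress", "pct": 1.0, "classifiers": {
        "rf": {"mean_f1": 0.5, "std_f1": 0.1, "mean_f1w": 0.6, "std_f1w": 0.1}}}
    yield {"type": "progress", "pct": 10.0, "classifiers": {
        "rf": {"mean_f1": 0.7, "std_f1": 0.05, "mean_f1w": 0.8, "std_f1w": 0.04}}}
    yield {"type": "confusion_matrices",
           "confusion_matrices": {"rf": [[5, 1], [2, 4]]}}

curve = {}     # classifier name -> list of (pct, mean_f1)
matrices = {}  # classifier name -> confusion matrix
for event in fake_learning_curve():
    if event["type"] == "progress":
        for name, scores in event["classifiers"].items():
            curve.setdefault(name, []).append((event["pct"], scores["mean_f1"]))
    elif event["type"] == "confusion_matrices":
        matrices = event["confusion_matrices"]
    # classifier_status events carry only a message and are ignored here.
```

Swapping the stand-in for the real generator should leave the loop body unchanged, provided the events match the shapes listed above.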

`evaluate(vectors, labels, classifiers=None, training_sizes=None, max_train=10000, repeats=5, ...) -> Results`

Non-streaming convenience wrapper. Results has .summary() (formatted string), .to_dict(), .confusion_matrices, .training_sizes, .progress.

`run_kfold_cv(vectors, labels, model_names, k=5, task="classification", model_params=None, max_training_samples=None, seed=42) -> generator`

Stratified k-fold (classification) or k-fold (regression). Yields:

{"type": "fold_result", "fold": int, "models": {name: metrics}}
{"type": "aggregate", "models": {name: {mean_f1, std_f1, mean_f1w, std_f1w}}} (regression: mean_r2/std_r2/mean_rmse/.../mean_mae/...)
{"type": "confusion_matrices", ...} (classification only)

`regression_metrics(y_true, y_pred) -> {"r2", "rmse", "mae"}`
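The three keys correspond to standard definitions. A minimal NumPy sketch of those formulas, under the assumption that the library uses the usual ones:

```python
import numpy as np

def regression_metrics_sketch(y_true, y_pred):
    """Coefficient of determination, root-mean-square error, mean absolute error."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    residual = y_true - y_pred
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "r2": 1.0 - ss_res / ss_tot,  # undefined when y_true is constant
        "rmse": float(np.sqrt(np.mean(residual ** 2))),
        "mae": float(np.mean(np.abs(residual))),
    }

metrics = regression_metrics_sketch([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```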

`detect_field_type(gdf, field_name, threshold=20) -> "classification" | "regression"`

A numeric field with more than threshold unique values is treated as regression; anything else is classification.
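The rule can be sketched on a plain list of values rather than a GeoDataFrame column. How the real function treats missing values and booleans is not documented, so the handling below is an assumption.

```python
from numbers import Number

def detect_field_type_sketch(values, threshold=20):
    """Classify a field as regression or classification from its values."""
    present = [v for v in values if v is not None]  # assumed: ignore missing
    numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in present
    )
    if numeric and len(set(present)) > threshold:
        return "regression"
    return "classification"

continuous = detect_field_type_sketch([i * 0.5 for i in range(50)])
few_codes = detect_field_type_sketch(list(range(20)))
text = detect_field_type_sketch(["wheat", "maize", "wheat"])
```

Note that exactly threshold unique values still counts as classification, because the comparison is strict.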

`tessera_eval.unet` (requires `torch`)

`extract_labelled_patches(tile_emb, class_raster, patch_size=256, min_labelled=10) -> list[(emb_patch, label_patch)]`

Connected-component patch extraction centred on label clusters (edge-zero-padded).

`train_unet_on_patches(patches, n_classes, params=None, progress_callback=None) -> model`

Train a TinyUNet on (emb_patch, label_patch) pairs. params e.g. {"epochs": 15}.

`predict_unet_tile(model, tile_emb, patch_size=256, overlap=32) -> int (H, W)`

Sliding-window prediction over a tile; output classes are 1-based (0 = ignore).
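How windows are placed and merged is not described here. A common scheme, sketched below as an assumption, steps by patch_size - overlap and adds a final window flush with the far edge so the whole axis is covered. `window_origins` is a hypothetical helper.

```python
def window_origins(length, patch_size=256, overlap=32):
    """Start offsets of overlapping windows covering [0, length) along one axis."""
    stride = patch_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than patch_size")
    if length <= patch_size:
        return [0]  # a single (padded) window is enough
    origins = list(range(0, length - patch_size, stride))
    origins.append(length - patch_size)  # last window ends exactly at the edge
    return origins

rows = window_origins(600)  # stride is 256 - 32 = 224
cols = window_origins(200)
```

Applying the helper to both axes gives the grid of window corners for an (H, W) tile.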

TinyUNet (class) and _HAS_TORCH (bool) are also exposed. If torch is missing, the training/predict functions raise RuntimeError.
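The optional-dependency pattern described above can be sketched as follows. The flag and helper names are hypothetical stand-ins for the module's own _HAS_TORCH check, and the error message is invented.

```python
import importlib.util

# True when the torch package can be found, without importing it.
HAS_TORCH = importlib.util.find_spec("torch") is not None

def require_torch():
    """Raise RuntimeError when torch is unavailable."""
    if not HAS_TORCH:
        raise RuntimeError("torch is required for the U-Net functions")
```

Calling such a guard at the top of each training or prediction function keeps the rest of the package importable without torch.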

GeoTessera zarr access — moved out

The cached zarr handle and region reads (get_zarr, probe_zarr_coverage, read_region_chunked) now live in the standalone tessera-zarr-utils package (pip install "tessera-zarr-utils[geotessera]"), so they can be used without the eval/ML stack. The tee-compute server depends on it; see that package's README.

`tessera_eval.server`

tee-compute console entry point. See compute-server.md.
