alpha-bhu is a Python toolkit for clustering and classifying AlphaEarth Foundation embedding rasters for land-use analysis.
The repository is built with nbdev and jupytext: the checked-in source of truth lives in the Jupytext Markdown files under `nbs/`, notebook files are paired/generated on demand, and the importable package is generated into `alpha_bhu/`.
- Loads 64-band AlphaEarth Foundation embedding rasters from GeoTIFF/COG files.
- Reshapes and validates embeddings for clustering workflows.
- Runs FAISS-based clustering across multiple `k` values.
- Evaluates segmentations with nesting and spatial quality metrics.
- Organizes multiple segmentations with the `SegSet` abstraction.
- Assigns colors and exports cluster rasters plus legends for downstream mapping.
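The load-and-reshape step above can be sketched in plain NumPy. This is an illustrative stand-in that uses random data instead of a real raster: the actual loader in `alpha_bhu.data` reads GeoTIFF bands and also returns georeferencing metadata, and the window shape here is an assumption.

```python
import numpy as np

# Stand-in for a loaded AlphaEarth cube: 64 embedding bands over a 20x30 window.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(64, 20, 30)).astype(np.float32)

# Flatten (bands, height, width) -> (pixels, bands) for clustering,
# and validate that no NaN/Inf values slipped through.
flat = embeddings.reshape(64, -1).T
assert flat.shape == (20 * 30, 64)
assert np.isfinite(flat).all()
```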
Repository layout:

- `alpha_bhu/`: generated Python package.
- `nbs/`: checked-in Jupytext Markdown notebook sources for nbdev.
- `data/`: local example data, including `aef_3.5k_roi_cog.tif`.
- `pyproject.toml`: package metadata and tool configuration.
- `settings.ini`: nbdev project settings.
- Python >= 3.12
- `uv` recommended for environment management
Core dependencies include `faiss-cpu`, `rasterio`, `geopandas`, `numpy`, `polars`, `scikit-learn`, `altair`, and `ipyleaflet`.
Create the environment and install the package with development dependencies:
```sh
uv sync --extra dev
```

If you only want the runtime package:

```sh
uv sync
```

This is the highest-level workflow currently exposed by the package:

```python
from pathlib import Path

from alpha_bhu.segset_workflow import SegSetWorkflow

cog_path = Path("data/aef_3.5k_roi_cog.tif")
workflow = SegSetWorkflow.from_cog(cog_path)
best_k = workflow.select_optimal_k(
    low_k_range=[8, 10, 12, 15, 18, 20],
    high_k_range=[40, 50, 60, 70, 80, 90],
)
results = workflow.export_results(Path("outputs"))
print("Best k:", best_k)
print("Exported files:", results["exported_files"])
```

For lower-level usage:
```python
from pathlib import Path

from alpha_bhu.data import load_aef_embeddings, reshape_for_clustering
from alpha_bhu.segset import SegSet

embeddings, metadata = load_aef_embeddings(Path("data/aef_3.5k_roi_cog.tif"))
embeddings_flat = reshape_for_clustering(embeddings)
segset = SegSet.from_embeddings(embeddings_flat, metadata["shape"])
segset = segset.with_kmeans_range([8, 10, 12], random_state=42, verbose=True)
quality = segset.spatial_quality("k10_s42")
print(quality)
```

Module overview:

- `alpha_bhu/data.py`: raster loading, reshaping, and embedding validation.
- `alpha_bhu/clustering.py`: FAISS clustering and nesting-analysis helpers.
- `alpha_bhu/cluster_qa.py`: spatial quality checks for cluster rasters.
- `alpha_bhu/segset.py`: immutable segmentation collection and workflow helpers.
- `alpha_bhu/segset_workflow.py`: end-to-end orchestration for k-selection and export.
- `alpha_bhu/export.py`: GeoTIFF and legend export utilities.
- `alpha_bhu/land_cover.py`: land-cover core extraction for manual labeling workflows.
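To give a feel for what a spatial quality check can look like, here is a toy neighbor-agreement score: the fraction of 4-connected pixel pairs that share a cluster label. This is an illustrative metric sketched for this README, not the actual implementation in `alpha_bhu/cluster_qa.py`.

```python
import numpy as np


def neighbor_agreement(labels_2d: np.ndarray) -> float:
    """Fraction of horizontally/vertically adjacent pixel pairs with equal labels.

    Illustrative spatial-coherence score, not the package's actual metric.
    """
    h = labels_2d[:, :-1] == labels_2d[:, 1:]  # horizontal neighbor pairs
    v = labels_2d[:-1, :] == labels_2d[1:, :]  # vertical neighbor pairs
    return float(np.concatenate([h.ravel(), v.ravel()]).mean())


checker = np.indices((4, 4)).sum(axis=0) % 2  # checkerboard: worst case
print(neighbor_agreement(checker))  # prints 0.0

uniform = np.zeros((4, 4), dtype=int)  # single cluster: best case
print(neighbor_agreement(uniform))  # prints 1.0
```

Spatially coherent segmentations score closer to 1.0; salt-and-pepper labelings score closer to 0.0.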
The repository currently includes local sample assets under `data/`:

- `data/aef_3.5k_roi_cog.tif`
- `data/cluster_animation/`
Because this project uses nbdev with Jupytext pairing (`ipynb,md`), edit the Markdown notebook sources in `nbs/`, not the generated files in `alpha_bhu/`.
In practice, the workflow is:
- keep the Jupytext Markdown files in `nbs/` under version control
- generate or sync notebook `.ipynb` files when needed for notebook work
- export Python modules from the notebook sources with nbdev
- run `blacken-docs` on the checked-in Markdown sources, not on `.ipynb` notebooks
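For reference, Jupytext pairing can be declared in each Markdown notebook's YAML header (or centrally in a jupytext configuration file). A minimal header might look like the sketch below; this is an illustration of the general form, not necessarily this repository's exact configuration:

```yaml
jupytext:
  formats: ipynb,md
kernelspec:
  display_name: Python 3
  name: python3
```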
Typical development loop:
```sh
uv sync --extra dev
uv run nbdev_export
uv run nbdev_test
uv run blacken-docs .
uv run ruff check .
uv run mypy alpha_bhu
```

Useful commands:

```sh
uv run jupytext --sync nbs/*.md         # sync paired notebook files on demand
uv run nbdev_export                     # regenerate package code from notebooks
uv run nbdev_docs                       # build docs into _docs/
uv run blacken-docs README.md nbs/*.md  # format code examples in checked-in Markdown sources
uv run jupyter lab                      # work directly in notebooks
```

This repository is still in an early stage:
- package metadata is alpha-quality
- tests are not yet set up as a first-class workflow
- some features are notebook-oriented and assume local data availability
Apache 2.0. See LICENSE.