cyberchitta/alpha-bhu

alpha-bhu

alpha-bhu is a Python toolkit for clustering and classifying AlphaEarth Foundation embedding rasters for land-use analysis.

The repository is built with nbdev and Jupytext. The checked-in source of truth is the set of Jupytext Markdown files under nbs/; paired .ipynb notebooks are generated on demand, and the importable package is generated into alpha_bhu/.

What it does

  • Loads 64-band AlphaEarth Foundation embedding rasters from GeoTIFF/COG files.
  • Reshapes and validates embeddings for clustering workflows.
  • Runs FAISS-based clustering across multiple k values.
  • Evaluates segmentations with nesting and spatial quality metrics.
  • Organizes multiple segmentations with the SegSet abstraction.
  • Assigns colors and exports cluster rasters plus legends for downstream mapping.
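As an illustration of the reshape step listed above, here is a minimal sketch using only NumPy. It assumes the embedding cube is band-first, (bands, height, width), which is the layout rasterio returns when reading a raster. The helper names flatten_embeddings and labels_to_raster are hypothetical and are not part of alpha_bhu; the package's own reshape_for_clustering may behave differently.

```python
import numpy as np


def flatten_embeddings(cube: np.ndarray) -> np.ndarray:
    """Turn a (bands, height, width) cube into (height * width, bands) rows.

    Hypothetical helper for illustration; not the alpha_bhu implementation.
    """
    if cube.ndim != 3:
        raise ValueError(f"expected 3 dimensions, got {cube.ndim}")
    bands, height, width = cube.shape
    # Move bands last so each pixel becomes one row vector, then flatten pixels.
    rows = cube.transpose(1, 2, 0).reshape(height * width, bands)
    # FAISS expects C-contiguous float32 input.
    return np.ascontiguousarray(rows, dtype=np.float32)


def labels_to_raster(labels: np.ndarray, shape: tuple[int, int]) -> np.ndarray:
    """Fold a flat vector of per-pixel labels back into a (height, width) raster."""
    height, width = shape
    return labels.reshape(height, width)


# Synthetic stand-in for a 64-band embedding raster of 4 x 5 pixels.
cube = np.random.default_rng(0).normal(size=(64, 4, 5)).astype(np.float32)
flat = flatten_embeddings(cube)

# Fake cluster labels, one per pixel, folded back into raster shape.
labels = np.arange(flat.shape[0]) % 3
raster = labels_to_raster(labels, cube.shape[1:])
```

Row i of the flat array corresponds to pixel (i // width, i % width), so cluster labels computed on the flat array can be folded back into a raster with a plain reshape.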

Repository layout

  • alpha_bhu/: generated Python package.
  • nbs/: checked-in Jupytext Markdown notebook sources for nbdev.
  • data/: local example data, including aef_3.5k_roi_cog.tif.
  • pyproject.toml: package metadata and tool configuration.
  • settings.ini: nbdev project settings.

Requirements

  • Python >=3.12
  • uv recommended for environment management

Core dependencies include faiss-cpu, rasterio, geopandas, numpy, polars, scikit-learn, altair, and ipyleaflet.

Installation

Create the environment and install the package with development dependencies:

uv sync --extra dev

If you only want the runtime package:

uv sync

Quick start

This is the highest-level workflow currently exposed by the package:

from pathlib import Path

from alpha_bhu.segset_workflow import SegSetWorkflow

cog_path = Path("data/aef_3.5k_roi_cog.tif")

workflow = SegSetWorkflow.from_cog(cog_path)
best_k = workflow.select_optimal_k(
    low_k_range=[8, 10, 12, 15, 18, 20],
    high_k_range=[40, 50, 60, 70, 80, 90],
)

results = workflow.export_results(Path("outputs"))
print("Best k:", best_k)
print("Exported files:", results["exported_files"])

For lower-level usage:

from pathlib import Path

from alpha_bhu.data import load_aef_embeddings, reshape_for_clustering
from alpha_bhu.segset import SegSet

embeddings, metadata = load_aef_embeddings(Path("data/aef_3.5k_roi_cog.tif"))
embeddings_flat = reshape_for_clustering(embeddings)

segset = SegSet.from_embeddings(embeddings_flat, metadata["shape"])
segset = segset.with_kmeans_range([8, 10, 12], random_state=42, verbose=True)

quality = segset.spatial_quality("k10_s42")
print(quality)
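To give a rough idea of what a spatial quality score can measure, the sketch below computes the fraction of 4-connected neighboring pixel pairs that share a cluster label. This is an assumed, simplified stand-in for illustration only: the function name neighbor_agreement is hypothetical, and the metric that SegSet.spatial_quality actually reports may be defined differently.

```python
import numpy as np


def neighbor_agreement(label_raster: np.ndarray) -> float:
    """Fraction of 4-connected neighbor pairs that share a label.

    Hypothetical illustration; not the metric used by alpha_bhu.
    """
    # Compare each pixel with its right-hand neighbor and with the pixel below it.
    horizontal = label_raster[:, 1:] == label_raster[:, :-1]
    vertical = label_raster[1:, :] == label_raster[:-1, :]
    total = horizontal.size + vertical.size
    if total == 0:
        raise ValueError("raster needs at least two pixels")
    return float((horizontal.sum() + vertical.sum()) / total)


# Two toy segmentations of the same 2 x 4 area.
blocky = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])  # two compact regions
noisy = np.array([[0, 1, 0, 1], [1, 0, 1, 0]])   # checkerboard

blocky_score = neighbor_agreement(blocky)
noisy_score = neighbor_agreement(noisy)
```

Under this toy definition, a segmentation with compact regions scores higher than a speckled one, which is the kind of property a spatial quality metric is meant to capture.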

Main modules

Data notes

The repository currently includes local sample assets under data/, including:

  • data/aef_3.5k_roi_cog.tif
  • data/cluster_animation/

Development workflow

Because this project uses nbdev with Jupytext pairing (ipynb,md), edit the Markdown notebook sources in nbs/, not the generated files in alpha_bhu/.

In practice, the workflow is:

  • keep the Jupytext Markdown files in nbs/ under version control
  • generate or sync notebook .ipynb files when needed for notebook work
  • export Python modules from the notebook sources with nbdev
  • run blacken-docs on the checked-in Markdown sources, not on .ipynb notebooks

Typical development loop:

uv sync --extra dev
uv run nbdev_export
uv run nbdev_test
uv run blacken-docs README.md nbs/*.md
uv run ruff check .
uv run mypy alpha_bhu

Useful commands:

uv run jupytext --sync nbs/*.md   # sync paired notebook files on demand
uv run nbdev_export          # regenerate package code from notebooks
uv run nbdev_docs            # build docs into _docs/
uv run blacken-docs README.md nbs/*.md   # format code examples in checked-in Markdown sources
uv run jupyter lab           # work directly in notebooks

Current status

This repository is still in an early stage:

  • package metadata is alpha-quality
  • tests are not yet set up as a first-class workflow
  • some features are notebook-oriented and assume local data availability

License

Apache 2.0. See LICENSE.
