fix: cache album ID in Firestore — prevent duplicate Family Faces albums#20
fix: cache album ID in Firestore — prevent duplicate Family Faces albums#20
Conversation
…lbums Root cause: appendonly scope cannot list albums (GET /v1/albums returns 403). The old ensure_album silently failed the list, then created a new album on every Cloud Run invocation — one new 'Family Faces' album per hour. Fix: - ensure_album: remove broken list logic, create-only with clear docstring - get_or_create_album_id: new function that checks Firestore cache first (dmaf_config/google_photos_album), creates once, caches the ID - __main__.py: use get_or_create_album_id instead of ensure_album - Tests: replace list-based tests with create-only + cache hit/miss tests
There was a problem hiding this comment.
Pull request overview
This PR fixes repeated creation of duplicate Google Photos albums under the photoslibrary.appendonly scope by caching the created album ID in Firestore and reusing it across Cloud Run invocations.
Changes:
- Make
ensure_albumcreate-only (remove broken album listing logic under appendonly scope) and document the limitation. - Add
get_or_create_album_idto read/write a cached album ID in Firestore, creating the album only on first run (or when the configured name changes). - Update the entrypoint and tests to use and validate the Firestore-cached album behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
src/dmaf/google_photos/api.py |
Removes album listing from ensure_album and introduces Firestore-backed get_or_create_album_id. |
src/dmaf/__main__.py |
Switches album selection to get_or_create_album_id during startup. |
src/dmaf/google_photos/__init__.py |
Exposes get_or_create_album_id as part of the public google_photos API. |
tests/test_google_photos_api.py |
Updates album tests for create-only behavior and adds tests for the Firestore cache logic. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/dmaf/google_photos/api.py
Outdated
| from google.cloud import firestore as _fs | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
| db = _fs.Client(project=firestore_project) | ||
| ref = db.collection(firestore_collection).document("google_photos_album") | ||
|
|
||
| doc = ref.get() | ||
| if doc.exists: | ||
| data = doc.to_dict() or {} | ||
| cached_id = data.get("album_id") | ||
| cached_name = data.get("album_name") | ||
| if cached_id and cached_name == album_name: | ||
| logger.debug(f"Using cached album ID for '{album_name}': {cached_id[:12]}...") | ||
| return str(cached_id) | ||
| # Album name changed — fall through to create a new one | ||
|
|
||
| album_id = ensure_album(creds, album_name) | ||
| if album_id: | ||
| ref.set({ | ||
| "album_name": album_name, | ||
| "album_id": album_id, | ||
| "created_at": _fs.SERVER_TIMESTAMP, | ||
| }) | ||
| logger.info(f"Created and cached Google Photos album '{album_name}' -> {album_id[:12]}...") | ||
| return album_id | ||
|
|
||
|
|
There was a problem hiding this comment.
get_or_create_album_id imports google.cloud.firestore unconditionally. google-cloud-firestore is not part of the default or .[dev] dependencies, so this will raise ImportError at runtime (and in CI) even when users just want album uploads. Consider wrapping the Firestore import/client creation in a try/except ImportError and falling back to ensure_album (or raising a clear, actionable error telling users to install google-cloud-firestore).
| from google.cloud import firestore as _fs | |
| logger = logging.getLogger(__name__) | |
| db = _fs.Client(project=firestore_project) | |
| ref = db.collection(firestore_collection).document("google_photos_album") | |
| doc = ref.get() | |
| if doc.exists: | |
| data = doc.to_dict() or {} | |
| cached_id = data.get("album_id") | |
| cached_name = data.get("album_name") | |
| if cached_id and cached_name == album_name: | |
| logger.debug(f"Using cached album ID for '{album_name}': {cached_id[:12]}...") | |
| return str(cached_id) | |
| # Album name changed — fall through to create a new one | |
| album_id = ensure_album(creds, album_name) | |
| if album_id: | |
| ref.set({ | |
| "album_name": album_name, | |
| "album_id": album_id, | |
| "created_at": _fs.SERVER_TIMESTAMP, | |
| }) | |
| logger.info(f"Created and cached Google Photos album '{album_name}' -> {album_id[:12]}...") | |
| return album_id | |
| logger = logging.getLogger(__name__) | |
| try: | |
| from google.cloud import firestore as _fs | |
| db = _fs.Client(project=firestore_project) | |
| ref = db.collection(firestore_collection).document("google_photos_album") | |
| doc = ref.get() | |
| if doc.exists: | |
| data = doc.to_dict() or {} | |
| cached_id = data.get("album_id") | |
| cached_name = data.get("album_name") | |
| if cached_id and cached_name == album_name: | |
| logger.debug( | |
| f"Using cached album ID for '{album_name}': {cached_id[:12]}..." | |
| ) | |
| return str(cached_id) | |
| # Album name changed — fall through to create a new one | |
| album_id = ensure_album(creds, album_name) | |
| if album_id: | |
| ref.set( | |
| { | |
| "album_name": album_name, | |
| "album_id": album_id, | |
| "created_at": _fs.SERVER_TIMESTAMP, | |
| } | |
| ) | |
| logger.info( | |
| f"Created and cached Google Photos album '{album_name}' -> {album_id[:12]}..." | |
| ) | |
| return album_id | |
| except ImportError: | |
| # Firestore is optional; fall back to non-cached album creation and | |
| # emit a clear, actionable warning for users who want caching. | |
| logger.warning( | |
| "google-cloud-firestore is not installed; skipping Google Photos album " | |
| "ID caching. Install 'google-cloud-firestore' to enable caching." | |
| ) | |
| return ensure_album(creds, album_name) |
tests/test_google_photos_api.py
Outdated
| @patch("dmaf.google_photos.api.requests.post") | ||
| def test_api_error_no_albums_returned(self, mock_post, mock_get): | ||
| """Test handling API response without albums key.""" | ||
| @patch("google.cloud.firestore.Client") | ||
| def test_returns_cached_id(self, mock_fs_client, mock_post): |
There was a problem hiding this comment.
The tests patch google.cloud.firestore.Client, but CI installs dmaf.[dev] which does not include google-cloud-firestore. If that dependency isn’t installed, these tests will error before running. Consider using pytest.importorskip("google.cloud.firestore") / conditional skipping, or patching via sys.modules so the tests don’t require the external Firestore package.
src/dmaf/__main__.py
Outdated
| album_id = get_or_create_album_id( | ||
| creds, | ||
| settings.google_photos_album_name, | ||
| firestore_project=settings.dedup.firestore_project, |
There was a problem hiding this comment.
settings.dedup.firestore_project is optional when dedup.backend is sqlite, but it’s passed into get_or_create_album_id unconditionally. This can break local runs (album upload silently disabled via the broad exception handler) unless Firestore is configured. Consider using Firestore caching only when dedup.backend == "firestore" (and firestore_project is set), otherwise fall back to ensure_album or require an explicit Firestore project for album caching.
| album_id = get_or_create_album_id( | |
| creds, | |
| settings.google_photos_album_name, | |
| firestore_project=settings.dedup.firestore_project, | |
| album_kwargs: dict = {} | |
| if ( | |
| getattr(settings, "dedup", None) is not None | |
| and getattr(settings.dedup, "backend", None) == "firestore" | |
| and getattr(settings.dedup, "firestore_project", None) | |
| ): | |
| album_kwargs["firestore_project"] = settings.dedup.firestore_project | |
| album_id = get_or_create_album_id( | |
| creds, | |
| settings.google_photos_album_name, | |
| **album_kwargs, |
src/dmaf/__main__.py
Outdated
| from dmaf.database import get_database | ||
| from dmaf.face_recognition import best_match, load_known_faces | ||
| from dmaf.google_photos import create_media_item, ensure_album, get_creds, upload_bytes | ||
| from dmaf.google_photos import create_media_item, ensure_album, get_creds, get_or_create_album_id, upload_bytes |
There was a problem hiding this comment.
This import line likely exceeds the repo’s Ruff line-length limit (100) and can fail CI linting. Please wrap it using parentheses and one imported name per line (or otherwise format to <= 100 chars).
| from dmaf.google_photos import create_media_item, ensure_album, get_creds, get_or_create_album_id, upload_bytes | |
| from dmaf.google_photos import ( | |
| create_media_item, | |
| ensure_album, | |
| get_creds, | |
| get_or_create_album_id, | |
| upload_bytes, | |
| ) |
…d-firestore installed The old tests patched google.cloud.firestore.Client directly — this fails in CI's base test matrix where google-cloud-firestore is not installed. Extract _firestore_client(project) into a thin helper that owns the local import, then mock dmaf.google_photos.api._firestore_client in tests instead. No package install required in test environments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Problem
appendonlyscope (DMAF's Google Photos scope) cannot list albums —GET /v1/albumsreturns 403. The oldensure_albumsilently failed the list check, then fell through to create a new album on every Cloud Run invocation. Cloud Run runs hourly → one new "Family Faces" album per hour.Fix
ensure_album: removed the broken list GET. Now create-only, with a clear docstring warning callers to cache the result.get_or_create_album_id: new function. Checksdmaf_config/google_photos_albumin Firestore first. Returns cached ID if album name matches. Creates + caches only on first run (or if album name changes in config).__main__.py: usesget_or_create_album_idinstead ofensure_album.After this fix
Tests
TestEnsureAlbum: updated to test create-only behaviourTestGetOrCreateAlbumId: 3 new tests — cache hit, cache miss (first run), album name change