Add DB backup endpoint dumping CSVs to Google Drive #18

Draft
aboryslawski wants to merge 1 commit into LunarLogic:main from aboryslawski:feat/db-backup-endpoint

aboryslawski commented May 14, 2026

Description

Please include a short summary of the changes.

Please include relevant motivation and context.

Add a screenshot and/or video of interface changes if any.

Checklist BEFORE requesting a review

  • I have performed a self-review of my code
  • I have added tests (Rails: rspec, JS: Jest, Feature: E2E)
  • I have added API documentation to swagger (If there are changes to API)
  • I have updated docs/readme if necessary
  • I have checked compliance with standardrb for Ruby code (run standardrb --fix).
  • I have checked compliance with prettier for js code (run yarn prettier app/assets app/javascript --check)
  • I have localized translations (if there are changes to the UI)

Relevant task

Trello card (link)

rusilko commented May 15, 2026

Token-auth: production-safety checklist

The BackupTokenAuthenticatable concern is a solid baseline (timing-safe secure_compare, bytesize pre-check, fails closed when env is unset). Before relying on it in production, a few things to button up:

  • Enforce token strength. The code accepts any non-empty BACKUP_API_TOKEN. A short or guessable value is brute-forceable. Document a minimum (e.g. 32 random bytes, SecureRandom.hex(32)) in the deploy notes and/or add a length check on boot.
  • Rate limiting. No throttling on POST /api/v1/backup. Auth failures return fast so it's not a DB-dump DoS, but unauthenticated request floods still hit the app. Consider Rack::Attack or an ingress-level limit.
  • Audit log. Currently no record of who triggered a backup or when. Add a log line (timestamp, source IP, success/failure) so incidents are traceable.
  • Header logging. Rails filters params by default, not headers. Verify X-Backup-Token doesn't end up in Rails logs, ingress/proxy logs, or APM traces. If it can, add the header to the filter list.
  • HTTPS. The concern doesn't enforce TLS itself — it relies on force_ssl / ingress. Confirm production terminates only via HTTPS for this route.
  • Replay / rotation. A captured token works forever until rotated. Either commit to a rotation cadence in ops docs, or upgrade to HMAC + timestamp if higher assurance is needed later.
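If the HMAC + timestamp upgrade from the last bullet is ever picked up, it could look roughly like the sketch below. The module name BackupRequestSignature, the "timestamp.body" signing format, and the 300-second window are assumptions for illustration and are not part of this PR.

```ruby
require "openssl"

# Hypothetical helper (not in the PR): verifies a request signed with a shared secret.
module BackupRequestSignature
  MAX_SKEW_SECONDS = 300 # assumed replay window

  # Signs "<timestamp>.<body>" with HMAC-SHA256 and returns a hex string.
  def self.sign(secret, timestamp, body)
    OpenSSL::HMAC.hexdigest("SHA256", secret, "#{timestamp}.#{body}")
  end

  def self.valid?(secret:, timestamp:, body:, signature:, now: Time.now.to_i)
    return false if secret.to_s.empty?                            # fail closed when unset
    return false if (now - timestamp.to_i).abs > MAX_SKEW_SECONDS # stale or future-dated
    # Timing-safe comparison of the expected and the presented signature.
    OpenSSL.secure_compare(sign(secret, timestamp, body), signature.to_s)
  end
end
```

With this shape a captured request stops working once the window has passed, although the shared secret itself still needs a rotation plan.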

None of these are blockers for shipping behind a strong token + HTTPS, but worth tracking before this endpoint sees real traffic.
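A boot-time length check for the token could be as small as the sketch below. The module name BackupTokenPolicy and the choice to skip the check when the variable is unset (the concern already fails closed in that case) are assumptions; the 64-character minimum simply mirrors the length of SecureRandom.hex(32).

```ruby
require "securerandom"

# Hypothetical boot-time policy check (not in the PR).
module BackupTokenPolicy
  MIN_LENGTH = 64 # SecureRandom.hex(32) yields 64 hex characters

  def self.strong_enough?(token)
    token.is_a?(String) && token.bytesize >= MIN_LENGTH
  end

  # Raises when a token is configured but too short; an unset token is left
  # to the concern, which already rejects every request in that case.
  def self.validate!(token)
    return if token.nil? || token.empty?
    return if strong_enough?(token)
    raise ArgumentError, "BACKUP_API_TOKEN must be at least #{MIN_LENGTH} characters"
  end
end
```

Calling BackupTokenPolicy.validate!(ENV["BACKUP_API_TOKEN"]) from an initializer would make a weak token fail the deploy instead of surfacing later.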

rusilko commented May 15, 2026

Code & scope review

Follow-up to the security checklist. Grouped roughly by impact.

Scope vs original request

  • Coverage. This PR cleanly solves the Nov 22 Slack ask ("get a backup file onto our side"). It does not yet cover Justyna's Jan 8 follow-up, which is the operational one: a usable file for offline trip prep with locations (incl. coordinates) and assigned people, refreshed roughly monthly, that ops can open if the app is down before a trip. Right now we ship raw per-table CSVs for the whole DB as a "raw dump" ("surowy zrzut"), which is hard to actually work from. Suggest adding a second service like Backups::TripPrepExport that emits a single joined xlsx (location | coords | assigned people).

Operational

  • Sync work in the controller. BackupsController#create runs the Drive uploads sequentially, inline in the request. On a real DB (~30 tables × one upload each) this will likely take 30–90s and tie up a Puma worker, with risk of ALB / ingress timeouts as the DB grows. Suggest: keep the endpoint as a manual trigger (e.g. a button in the admin panel for ad-hoc backups) and add a rake backup task that wraps the same services, scheduled via the simplest AWS option (ECS Scheduled Task, EventBridge cron, or plain crontab on the host).

  • No retention / cleanup. Drive folder grows unbounded — every run creates a new backup_<timestamp>_UTC/ with no expiration. Decide a policy (keep last N, or delete older than X days/months) and either implement it in the service or document it as an ops responsibility.
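A "keep last N" policy could start from a selection helper like the sketch below. BackupRetention is a hypothetical name, and the helper assumes the timestamp inside backup_<timestamp>_UTC sorts lexicographically in chronological order (true for year-first formats, which this PR may or may not use). The actual Drive deletion call is deliberately left out.

```ruby
# Hypothetical helper (not in the PR): picks which backup folders to delete.
module BackupRetention
  FOLDER_PATTERN = /\Abackup_(.+)_UTC\z/

  # Returns the folder names to delete, keeping the newest `keep` backups.
  # Names that do not match the backup pattern are never returned.
  def self.folders_to_delete(names, keep:)
    backups = names.grep(FOLDER_PATTERN).sort # oldest first, given sortable timestamps
    backups.first([backups.size - keep, 0].max)
  end
end
```

Keeping the selection separate from the deletion makes the policy easy to unit test without touching Drive.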

Data fidelity

  • Raw connection.execute bypasses ActiveRecord. DumpAllTables reads tables via SELECT *, so values come out as raw DB representations: enums as their stored form, JSON columns as serialized text, encrypted attributes as ciphertext, NULL indistinguishable from empty string in CSV. Fine if the intent is a forensic snapshot, but it won't round-trip cleanly back into AR models on restore. If the disaster-recovery file from the first point (Coverage) needs to be human-readable or programmatically restorable, prefer Model.find_each with as_json / explicit serialization.

Memory (informational)

  • DumpAllTables loads every table fully into memory (a hash of CSV strings), then UploadToGoogleDrive holds the whole hash while uploading sequentially. Likely moot once the scope shifts to dumping only what Justyna actually needs (locations, people, animals) per the first point, but flagging in case the all-tables path stays. Cheap fix if it does: yield per-table → upload → release, instead of building the full hash first.
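The yield-per-table shape could look like the sketch below. The class name StreamingBackup, the each_table_csv method on the dumper, and the upload(name, content) method on the uploader are all assumptions; the real services in this PR have a different interface today.

```ruby
# Hypothetical orchestration (not in the PR): dump and upload one table at a time.
class StreamingBackup
  def initialize(dumper:, uploader:)
    @dumper = dumper     # assumed to respond to each_table_csv { |table_name, csv| ... }
    @uploader = uploader # assumed to respond to upload(file_name, content)
  end

  # Uploads each table's CSV as soon as it is produced, so this class never
  # builds a hash of all tables. Returns the number of uploaded files.
  def call
    count = 0
    @dumper.each_table_csv do |table_name, csv|
      @uploader.upload("#{table_name}.csv", csv)
      count += 1
    end
    count
  end
end
```

Whether memory is actually released between tables still depends on the dumper not retaining earlier CSV strings.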

Tests

  • Backups::UploadToGoogleDrive has no dedicated spec. It's stubbed at the request-spec level and DumpAllTables has its own unit test, but the upload service's logic — env-var guard, subfolder naming, content type, iteration — is untested. Add a spec with a fake session double covering at least the GOOGLE_DRIVE_BACKUP_FOLDER_ID-missing case and the happy path.

Minor / nits

  • Time.now.utc → Time.current. In UploadToGoogleDrive#call, Rails convention is Time.current (or Time.zone.now) so the app's configured zone is respected. Because the folder name is _UTC-suffixed, the output is identical only if the app's zone is UTC; otherwise use Time.current.utc so the suffix stays accurate. Either way it matches the rest of the codebase.

  • Route is singular. resource :backup, only: :create → POST /api/v1/backup. Conceptually each call creates one of many backups, so resources :backups, only: :create → POST /api/v1/backups reads more naturally and matches the plural convention used elsewhere in the API. Bikeshed-grade.

  • Google::DriveSession config path is relative to CWD. File.read("google_drive_client_config.json.erb") resolves against the working directory, not Rails.root. Works for bin/dev and most invocations, but fragile if the service is ever called from a rake task run in a different CWD, a console started in a subdirectory, or a background runner. One-line fix: File.read(Rails.root.join("google_drive_client_config.json.erb")).
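The CWD dependence in the last nit can be reproduced outside Rails with the sketch below. A Pathname stands in for Rails.root, and read_config and demo_cwd_independence are hypothetical helper names used only for the demonstration.

```ruby
require "pathname"
require "tmpdir"

CONFIG_NAME = "google_drive_client_config.json.erb"

# Reads the config relative to an explicit application root (stand-in for Rails.root).
def read_config(app_root)
  File.read(app_root.join(CONFIG_NAME))
end

# Returns [relative_path_found, content_read_via_root] when run from another directory.
def demo_cwd_independence
  Dir.mktmpdir do |root|
    app_root = Pathname.new(root)
    File.write(app_root.join(CONFIG_NAME), "{}")
    Dir.mktmpdir do |elsewhere|
      Dir.chdir(elsewhere) do
        # The bare relative name follows the CWD; the joined path does not.
        [File.exist?(CONFIG_NAME), read_config(app_root)]
      end
    end
  end
end
```

Run from a different directory, the bare file name is not found while the root-joined path still reads the file.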
