Archive your Google Drive documents and generate an interconnected Obsidian knowledge graph. The personal counterpart to opsidian_graph — together they map two sides of yourself: personal (Drive) and professional (Git/JIRA/Confluence).
drive_og connects to your Google Drive, extracts text from documents (Docs, Sheets, Slides, PDFs), categorizes them by topic, and generates an Obsidian vault where everything is linked by folder, topic, and time.
Google Drive                      Obsidian Vault
┌─────────────────┐               ┌──────────────────────────┐
│ University/     │   drive_og    │ GoogleDrive/University/  │
│   2023/         │  ──────────>  │ Folders/University--2023 │
│     ML-Notes.doc│     sync      │ Topics/Machine-Learning  │
│     Textbook.pdf│               │ Weekly/2024-W03          │
│ Finance/        │               │ Monthly/2024-01          │
│   Budget.xlsx   │               │ Dashboard.md             │
└─────────────────┘               │ Expertise.md             │
                                  └──────────────────────────┘
Open the vault in Obsidian and explore the graph view. Install the 3D Graph community plugin for a 3D visualization.
- Go to Google Cloud Console
- Create a project (or select existing), then enable Google Drive API under APIs & Services > Library
- Go to APIs & Services > OAuth consent screen:
  - Choose "External" user type
  - Fill in app name and your email
  - Add scope: https://www.googleapis.com/auth/drive.readonly
  - Important: Under "Test users", click + ADD USERS and add your own Gmail address (the app is in Testing mode, so only listed test users can log in)
- Go to APIs & Services > Credentials:
  - Click + CREATE CREDENTIALS > OAuth client ID
  - Application type: Desktop app
  - Copy the Client ID and Client Secret
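The Client ID and Client Secret from the last step are the inputs to a desktop OAuth flow. The sketch below shows one way to drive that flow with InstalledAppFlow from google-auth-oauthlib using an in-memory client config. It is an illustration, not necessarily how drive_og's auth.py works; build_client_config and authenticate are hypothetical helper names.

```python
SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]


def build_client_config(client_id: str, client_secret: str) -> dict:
    """Build the 'installed' (desktop app) client config used by google-auth-oauthlib."""
    return {
        "installed": {
            "client_id": client_id,
            "client_secret": client_secret,
            "auth_uri": "https://accounts.google.com/o/oauth2/auth",
            "token_uri": "https://oauth2.googleapis.com/token",
            "redirect_uris": ["http://localhost"],
        }
    }


def authenticate(client_id: str, client_secret: str):
    """Open a browser for consent and return Google credentials (hypothetical helper)."""
    # Imported lazily so build_client_config works without the dependency installed.
    from google_auth_oauthlib.flow import InstalledAppFlow

    flow = InstalledAppFlow.from_client_config(
        build_client_config(client_id, client_secret), scopes=SCOPES
    )
    # port=0 lets the OS pick a free port for the local redirect listener.
    return flow.run_local_server(port=0)
```

Because the consent screen is in Testing mode, the browser step only succeeds for accounts listed under "Test users".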
# Copy example configs
cp config/drives.yaml.example config/drives.yaml
cp config/topics.yaml.example config/topics.yaml
# Add your credentials
cat > .env << 'EOF'
GOOGLE_CLIENT_ID=your-client-id-here
GOOGLE_CLIENT_SECRET=your-client-secret-here
ANTHROPIC_API_KEY=sk-ant-... # Optional: enables LLM topic categorization
EOF
Edit config/drives.yaml to specify which folders to scan:
folders:
  - id: "root"                  # Scan entire My Drive
    name: "My Drive"
    color: "#50C878"
  - id: "1aBcDeFgHiJkLmNoPqR"   # Or specific folder IDs
    name: "University"
    color: "#4A90D9"
extraction:
  shallow_threshold_mb: 1       # Files above this get partial extraction
  shallow_max_pages: 5          # Pages to extract from large files (e.g., textbooks)
Define topic patterns in config/topics.yaml:
topics:
  Machine Learning:
    - "\\bml\\b"
    - "\\bdeep.learning"
  Finance:
    - "\\bbudget\\b"
    - "\\brevenue\\b"
pip install -e ../opsidian_core # Install shared library
pip install -e . # Install drive_og
drive-og init # Authenticate with Google (opens browser)
drive-og sync # Fetch docs, extract text, generate vault
Open the vault/ directory as an Obsidian vault.
| Command | Description |
|---|---|
| drive-og init [--reauth] | OAuth authentication + discover top-level folders |
| drive-og sync [--full] [--no-llm] | Fetch documents, cache, and generate vault |
| drive-og generate [--no-llm] | Rebuild vault from local cache (offline, no API calls) |
Flags:
- --full: re-fetch all documents (ignores incremental sync state)
- --no-llm: skip Claude-based topic categorization (keyword-only)
- --reauth: force re-authentication even if token exists
- -v/--verbose: debug logging
vault/
├── Dashboard.md                  # Stats, recent activity, quick links
├── Expertise.md                  # Auto-generated interest/knowledge profile
│
├── GoogleDrive/                  # One .md per document, mirroring folder structure
│   ├── University/
│   │   ├── 2023/
│   │   │   ├── ML-Lecture-Notes.md
│   │   │   └── Deep-Learning-Textbook.md (shallow extraction)
│   │   └── 2024/
│   │       └── Thesis-Draft.md
│   ├── Projects/
│   │   └── Side-Project-Spec.md
│   └── Finance/
│       └── Budget-2024.md
│
├── Folders/                      # Map of Content per folder
│   ├── University.md
│   ├── University--2023.md       # Nested folders use -- separator
│   ├── Projects.md
│   └── Finance.md
│
├── Topics/                       # Cross-cutting topic MOCs
│   ├── Machine-Learning.md       # Links ALL ML docs across folders
│   ├── Finance.md
│   └── Research.md
│
├── Weekly/                       # Activity by ISO week
│   └── 2024-W03.md
├── Monthly/                      # Monthly rollups
│   └── 2024-01.md
│
└── .obsidian/
    └── graph.json                # Color-coding for graph view
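The color values in config/drives.yaml presumably feed the graph view's color coding. The sketch below shows how a hex color could be turned into a color group for .obsidian/graph.json. The colorGroups layout used here (a search query plus a color object with an alpha value and an integer rgb) matches what Obsidian writes to that file, but the format is undocumented, so treat it as an assumption; color_group is a hypothetical helper, not drive_og's code.

```python
import json


def color_group(folder_path: str, hex_color: str) -> dict:
    """Return one graph.json color group for notes under folder_path (assumed layout)."""
    rgb = int(hex_color.lstrip("#"), 16)  # "#50C878" -> 0x50C878
    return {"query": f'path:"{folder_path}"', "color": {"a": 1, "rgb": rgb}}


groups = [
    color_group("GoogleDrive/University", "#4A90D9"),
    color_group("GoogleDrive", "#50C878"),
]
print(json.dumps({"colorGroups": groups}, indent=2))
```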
Three dimensions of wikilinks create a rich knowledge graph:
- Folder links: [[University--2023]] preserves your original Drive hierarchy
- Topic links: [[Machine Learning]] connects related docs across folders
- Temporal links: [[2024-W03]] shows when you worked on what
A document about ML in University/2023/ and another in Projects/ both link to [[Machine Learning]], connecting them even though they live in different folders.
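As a minimal sketch of how the three link types could be derived for a single document: nested folder paths are joined with the -- separator, and the temporal link uses the ISO week of the modification date. The function names are hypothetical and the real generator may build these links differently.

```python
from datetime import date


def folder_link(folder_path: str) -> str:
    """'University/2023' -> 'University--2023' (nested folders joined with '--')."""
    return "--".join(part for part in folder_path.split("/") if part)


def week_link(day: date) -> str:
    """ISO week label such as '2024-W03'."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def wikilinks(folder_path: str, topics: list, modified: date) -> list:
    """Collect the folder, topic, and temporal wikilinks for one document."""
    targets = [folder_link(folder_path), *topics, week_link(modified)]
    return [f"[[{target}]]" for target in targets]


print(wikilinks("University/2023", ["Machine Learning"], date(2024, 1, 17)))
# ['[[University--2023]]', '[[Machine Learning]]', '[[2024-W03]]']
```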
| File Type | Method | Notes |
|---|---|---|
| Google Docs | Export as plain text | Full text via Drive API |
| Google Sheets | Export as CSV | Tabular data preserved |
| Google Slides | Export as plain text | Slide content extracted |
| PDFs | pypdf text extraction | Handles multi-page documents |
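The table above maps onto two Drive API v3 calls: files.export for Google-native formats and files.get_media for binary files such as PDFs. The sketch below captures that dispatch as a lookup; plan_download and the "skip" result are hypothetical names for illustration.

```python
from typing import Optional, Tuple

# Google-native MIME types and the export format requested for each.
GOOGLE_NATIVE_EXPORTS = {
    "application/vnd.google-apps.document": "text/plain",      # Google Docs
    "application/vnd.google-apps.spreadsheet": "text/csv",     # Google Sheets
    "application/vnd.google-apps.presentation": "text/plain",  # Google Slides
}


def plan_download(mime_type: str) -> Tuple[str, Optional[str]]:
    """Return (api_method, export_mime_type) for a Drive file's MIME type."""
    if mime_type in GOOGLE_NATIVE_EXPORTS:
        return ("files.export", GOOGLE_NATIVE_EXPORTS[mime_type])
    if mime_type == "application/pdf":
        return ("files.get_media", None)  # raw bytes, parsed locally with pypdf
    return ("skip", None)  # unsupported type in this sketch
```

With google-api-python-client, the two methods correspond to service.files().export(fileId=..., mimeType=...) and service.files().get_media(fileId=...).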
Extraction depth is tiered by file size, so large files (e.g., university textbooks) are only partially extracted:
- Small files (<1MB default): full text extraction
- Large files (>1MB): first N pages only (shallow_max_pages in config)
- Near-empty extraction (<50 chars, e.g., scanned PDFs): metadata only
The extraction field in each note's frontmatter records which mode was used (full, shallow, or metadata_only).
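The tiering can be sketched as follows. This is an illustration under stated assumptions, not the actual content_extractor.py: the reader argument is anything shaped like a pypdf PdfReader (a pages sequence whose items have extract_text()), 1 MB is taken as 1024 * 1024 bytes, and the function names are hypothetical.

```python
from itertools import islice

MIN_CHARS = 50  # below this, the extraction is treated as near-empty


def extract_pages(reader, max_pages=None) -> str:
    """Join text from the first max_pages pages of a pypdf-style reader (all pages if None)."""
    pages = islice(reader.pages, max_pages)
    return "\n".join((page.extract_text() or "") for page in pages)


def extract(reader, size_bytes: int, threshold_mb: float = 1, max_pages: int = 5):
    """Return (mode, text) where mode is 'full', 'shallow', or 'metadata_only'."""
    is_large = size_bytes > threshold_mb * 1024 * 1024
    text = extract_pages(reader, max_pages if is_large else None)
    if len(text.strip()) < MIN_CHARS:
        # Typical for scanned PDFs: no usable text layer, keep metadata only.
        return "metadata_only", ""
    return ("shallow" if is_large else "full"), text
```

The defaults mirror shallow_threshold_mb and shallow_max_pages from config/drives.yaml.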
Priority stack (highest wins):
- Keyword match: regex patterns from config/topics.yaml
- LLM fallback: Claude Haiku classifies unmatched text (requires ANTHROPIC_API_KEY)
- Default: "Uncategorized"
Documents can have multiple topics. When keyword and LLM results disagree, the keyword match wins because the patterns express user-defined intent.
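One way to read the priority stack is sketched below: keyword patterns are tried first, the LLM is consulted only when nothing matches, and "Uncategorized" is the last resort. The categorizer itself lives in opsidian_core and may differ; case-insensitive matching and the llm_classify callable are assumptions made for this example.

```python
import re
from typing import Callable, Dict, List, Optional

TOPIC_PATTERNS: Dict[str, List[str]] = {
    "Machine Learning": [r"\bml\b", r"\bdeep.learning"],
    "Finance": [r"\bbudget\b", r"\brevenue\b"],
}


def categorize(
    text: str,
    patterns: Dict[str, List[str]] = TOPIC_PATTERNS,
    llm_classify: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Keyword matches win; the LLM is consulted only when no pattern matches."""
    matched = [
        topic
        for topic, regexes in patterns.items()
        if any(re.search(rx, text, re.IGNORECASE) for rx in regexes)
    ]
    if matched:
        return matched
    if llm_classify is not None:  # None models the --no-llm case
        topics = llm_classify(text)
        if topics:
            return topics
    return ["Uncategorized"]


print(categorize("Notes on deep learning and the Q3 budget"))
# ['Machine Learning', 'Finance']
```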
drive-og sync only fetches documents modified since the last sync. State is tracked in .drive_og_state.json.
- First run: fetches everything
- Subsequent runs: only changed documents
- --full: re-fetches everything regardless
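A rough sketch of how incremental sync could work: read the last sync timestamp from the state file and, unless --full is given, add a modifiedTime filter to the Drive API files.list query. The "last_sync" key and both function names are hypothetical; the actual layout of .drive_og_state.json is not documented here.

```python
import json
from pathlib import Path
from typing import Optional

STATE_FILE = Path(".drive_og_state.json")


def load_last_sync(state_file: Path = STATE_FILE) -> Optional[str]:
    """Return the stored RFC 3339 timestamp, or None on a first run."""
    if not state_file.exists():
        return None
    return json.loads(state_file.read_text()).get("last_sync")  # hypothetical key


def build_query(last_sync: Optional[str], full: bool = False) -> str:
    """Build a files.list query; restrict by modifiedTime when syncing incrementally."""
    query = "trashed = false"
    if last_sync and not full:
        query += f" and modifiedTime > '{last_sync}'"
    return query


print(build_query("2024-01-15T00:00:00Z"))
# trashed = false and modifiedTime > '2024-01-15T00:00:00Z'
```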
The local cache (cache/) is the source of truth for vault generation. You can edit cache JSON files manually, then run drive-og generate to rebuild.
drive_og/
├── src/drive_og/
│   ├── cli.py                    # CLI entry point (init, sync, generate)
│   ├── models.py                 # GoogleDriveDocument dataclass
│   ├── config.py                 # drives.yaml + topics.yaml loading
│   ├── auth.py                   # OAuth 2.0 desktop flow
│   ├── gdrive_client.py          # Drive API: list files, resolve paths
│   └── content_extractor.py      # Size-tiered text extraction
├── config/
│   ├── drives.yaml.example       # Template for user configuration
│   └── topics.yaml.example       # Template for topic patterns
├── templates/
│   ├── gdrive_note.md.j2         # Per-document note template
│   └── folder_moc.md.j2          # Folder Map of Content template
├── tests/                        # 17 tests (unit + integration)
├── docs/superpowers/
│   ├── specs/                    # Design specification
│   └── plans/                    # Implementation plan
└── pyproject.toml
- opsidian_core: shared library (cache, categorizer, vault writer, etc.)
- google-api-python-client: Google Drive API
- google-auth-oauthlib: OAuth 2.0 authentication
- pypdf: PDF text extraction
- anthropic: Claude API for LLM categorization (optional)
drive_og is part of the opsidian knowledge graph ecosystem:
               ┌──────────────────────┐
               │    opsidian_core     │
               │   (shared library)   │
               └──┬────────┬────────┬─┘
                  │        │        │
         ┌────────┘        │        └────────┐
         v                 v                 v
  opsidian_graph       drive_og        opsidian_meta
    (work self)     (personal self) (unified analysis)
Git/JIRA/Confluence  Google Drive    reads both caches
         │                 │                 │
         v                 v                 v
    work vault/     personal vault/     meta vault/
                                     (timeline, focus
                                      reports, gaps)
| Project | What it does |
|---|---|
| opsidian_core | Shared library for all graph generators |
| opsidian_graph | Work knowledge graph (GitHub PRs, JIRA, Confluence) |
| drive_og | Personal knowledge graph (Google Drive) |
| opsidian_meta | Unified productivity analysis (timeline, focus reports, gap detection) |
After syncing with drive_og, you can run opsidian_meta to generate cross-domain productivity reports that combine your work and personal activity.
pip install -e ../opsidian_core
pip install -e .
python -m pytest tests/ -v
17 tests covering models, auth, config, client, extractor, CLI, and end-to-end integration.
MIT