drive_og

Archive your Google Drive documents and generate an interconnected Obsidian knowledge graph. The personal counterpart to opsidian_graph — together they map two sides of yourself: personal (Drive) and professional (Git/JIRA/Confluence).

What It Does

drive_og connects to your Google Drive, extracts text from documents (Docs, Sheets, Slides, PDFs), categorizes them by topic, and generates an Obsidian vault where everything is linked by folder, topic, and time.

Google Drive                          Obsidian Vault
┌──────────────────┐                  ┌──────────────────────────┐
│ University/      │    drive_og      │ GoogleDrive/University/  │
│   2023/          │  ──────────>     │ Folders/University--2023 │
│     ML-Notes.doc │      sync        │ Topics/Machine-Learning  │
│     Textbook.pdf │                  │ Weekly/2024-W03          │
│ Finance/         │                  │ Monthly/2024-01          │
│   Budget.xlsx    │                  │ Dashboard.md             │
└──────────────────┘                  │ Expertise.md             │
                                      └──────────────────────────┘

Open the vault in Obsidian and explore the graph view. Install the 3D Graph community plugin for a 3D visualization.

Quick Start

1. Set Up Google API Credentials

Go to the Google Cloud Console
Create a project (or select an existing one), then enable the Google Drive API under APIs & Services > Library
Go to APIs & Services > OAuth consent screen:
- Choose "External" user type
- Fill in app name and your email
- Add scope: https://www.googleapis.com/auth/drive.readonly
- Important: Under "Test users", click + ADD USERS and add your own Gmail address (the app is in Testing mode, so only listed test users can log in)
Go to APIs & Services > Credentials:
- Click + CREATE CREDENTIALS > OAuth client ID
- Application type: Desktop app
- Copy the Client ID and Client Secret
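The Client ID and Secret feed a standard installed-app OAuth flow. Below is a minimal sketch of what `drive-og init` might do with them. It assumes the `google-auth-oauthlib` package listed under Dependencies and the `GOOGLE_CLIENT_ID` / `GOOGLE_CLIENT_SECRET` variables written to `.env` in step 2. The helper names are hypothetical and do not reproduce the contents of auth.py.

```python
import os

# Read-only Drive scope, matching the scope added on the consent screen.
SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]


def build_client_config(client_id: str, client_secret: str) -> dict:
    """Build an 'installed app' client config, the same shape as the
    client_secret.json Google offers for Desktop app credentials."""
    return {
        "installed": {
            "client_id": client_id,
            "client_secret": client_secret,
            "auth_uri": "https://accounts.google.com/o/oauth2/auth",
            "token_uri": "https://oauth2.googleapis.com/token",
            "redirect_uris": ["http://localhost"],
        }
    }


def run_desktop_flow():
    """Hypothetical helper: open a browser for consent and return credentials."""
    from google_auth_oauthlib.flow import InstalledAppFlow  # lazy import

    config = build_client_config(
        os.environ["GOOGLE_CLIENT_ID"], os.environ["GOOGLE_CLIENT_SECRET"]
    )
    flow = InstalledAppFlow.from_client_config(config, SCOPES)
    # port=0 lets the OS pick a free localhost port for the OAuth redirect.
    return flow.run_local_server(port=0)
```

The returned credentials object has a `to_json()` method, so a token can be cached between runs. Where drive_og actually stores its token is not documented here.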

2. Configure

# Copy example configs
cp config/drives.yaml.example config/drives.yaml
cp config/topics.yaml.example config/topics.yaml

# Add your credentials
cat > .env << 'EOF'
GOOGLE_CLIENT_ID=your-client-id-here
GOOGLE_CLIENT_SECRET=your-client-secret-here
ANTHROPIC_API_KEY=sk-ant-...   # Optional: enables LLM topic categorization
EOF

Edit config/drives.yaml to specify which folders to scan:

folders:
  - id: "root"                    # Scan entire My Drive
    name: "My Drive"
    color: "#50C878"
  - id: "1aBcDeFgHiJkLmNoPqR"    # Or specific folder IDs
    name: "University"
    color: "#4A90D9"

extraction:
  shallow_threshold_mb: 1         # Files above this get partial extraction
  shallow_max_pages: 5            # Pages to extract from large files (e.g., textbooks)
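Each folder entry carries a hex color, which presumably ends up in the vault's graph color-coding (`.obsidian/graph.json`). The sketch below turns the parsed `folders:` list into Obsidian color groups. It makes three assumptions:

- The YAML has already been loaded, for example with PyYAML's `yaml.safe_load`.
- graph.json uses a `colorGroups` list of `{"query": ..., "color": {"a": alpha, "rgb": packed_int}}` entries.
- Notes are matched by their `GoogleDrive/<name>` path.

`FolderConfig` and `color_groups` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FolderConfig:
    id: str     # "root" or a Drive folder ID
    name: str   # display name, also the vault subfolder under GoogleDrive/
    color: str  # "#RRGGBB"


def parse_folders(data: dict) -> list[FolderConfig]:
    """Convert the `folders:` list of an already-loaded drives.yaml."""
    return [FolderConfig(f["id"], f["name"], f["color"]) for f in data.get("folders", [])]


def hex_to_rgb_int(color: str) -> int:
    """'#4A90D9' -> 0x4A90D9, the packed integer form assumed for graph.json."""
    return int(color.lstrip("#"), 16)


def color_groups(folders: list[FolderConfig]) -> dict:
    """Build a graph.json fragment coloring notes by top-level Drive folder."""
    return {
        "colorGroups": [
            {
                "query": f'path:"GoogleDrive/{f.name}"',
                "color": {"a": 1, "rgb": hex_to_rgb_int(f.color)},
            }
            for f in folders
        ]
    }
```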

Define topic patterns in config/topics.yaml:

topics:
  Machine Learning:
    - "\\bml\\b"
    - "\\bdeep.learning"
  Finance:
    - "\\bbudget\\b"
    - "\\brevenue\\b"
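The backslashes in topics.yaml are doubled because the patterns sit inside double-quoted YAML strings. So `"\\bml\\b"` reaches the regex engine as `\bml\b`.

Below is a minimal keyword matcher over such patterns. It assumes case-insensitive matching: the lowercase patterns suggest it, but the source does not say so. `compile_topics` and `match_topics` are hypothetical names.

```python
import re

# Patterns as they look after YAML unescaping ("\\b" in YAML == r"\b" here).
TOPICS = {
    "Machine Learning": [r"\bml\b", r"\bdeep.learning"],
    "Finance": [r"\bbudget\b", r"\brevenue\b"],
}


def compile_topics(topics: dict[str, list[str]]) -> dict[str, list[re.Pattern]]:
    """Compile each topic's patterns once; IGNORECASE is an assumption."""
    return {
        topic: [re.compile(p, re.IGNORECASE) for p in patterns]
        for topic, patterns in topics.items()
    }


def match_topics(text: str, compiled: dict[str, list[re.Pattern]]) -> list[str]:
    """Return every topic with at least one pattern found in the text."""
    return [
        topic
        for topic, patterns in compiled.items()
        if any(p.search(text) for p in patterns)
    ]
```

The `\b` anchors keep `ml` from firing inside words like "HTML". The unescaped `.` in `deep.learning` lets it match "deep learning" and "Deep-Learning" alike.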

3. Install and Run

pip install -e ../opsidian_core   # Install shared library
pip install -e .                   # Install drive_og

drive-og init                      # Authenticate with Google (opens browser)
drive-og sync                      # Fetch docs, extract text, generate vault

4. Open in Obsidian

Open the vault/ directory as an Obsidian vault.

Commands

Command	Description
`drive-og init [--reauth]`	OAuth authentication + discover top-level folders
`drive-og sync [--full] [--no-llm]`	Fetch documents, cache, and generate vault
`drive-og generate [--no-llm]`	Rebuild vault from local cache (offline, no API calls)

Flags:

--full — re-fetch all documents (ignores incremental sync state)
--no-llm — skip Claude-based topic categorization (keyword-only)
--reauth — force re-authentication even if token exists
-v / --verbose — debug logging
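The commands and flags above map naturally onto subcommands. Here is a sketch using the standard library's argparse. Two things are assumptions: whether cli.py actually uses argparse (rather than, say, Click), and placing `-v/--verbose` on the top-level parser. Only the commands and flags listed above are modeled.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="drive-og")
    parser.add_argument("-v", "--verbose", action="store_true", help="debug logging")
    sub = parser.add_subparsers(dest="command", required=True)

    init = sub.add_parser("init", help="OAuth authentication + discover top-level folders")
    init.add_argument("--reauth", action="store_true",
                      help="force re-authentication even if a token exists")

    sync = sub.add_parser("sync", help="fetch documents, cache, and generate vault")
    sync.add_argument("--full", action="store_true",
                      help="re-fetch all documents, ignoring incremental state")
    sync.add_argument("--no-llm", action="store_true",
                      help="keyword-only topic categorization")

    gen = sub.add_parser("generate", help="rebuild vault from local cache (offline)")
    gen.add_argument("--no-llm", action="store_true",
                     help="keyword-only topic categorization")
    return parser
```

argparse turns `--no-llm` into the attribute `no_llm`, so `build_parser().parse_args(["sync", "--no-llm"]).no_llm` is `True`.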

Generated Vault Structure

vault/
├── Dashboard.md                 # Stats, recent activity, quick links
├── Expertise.md                 # Auto-generated interest/knowledge profile
│
├── GoogleDrive/                 # One .md per document, mirroring folder structure
│   ├── University/
│   │   ├── 2023/
│   │   │   ├── ML-Lecture-Notes.md
│   │   │   └── Deep-Learning-Textbook.md   (shallow extraction)
│   │   └── 2024/
│   │       └── Thesis-Draft.md
│   ├── Projects/
│   │   └── Side-Project-Spec.md
│   └── Finance/
│       └── Budget-2024.md
│
├── Folders/                     # Map of Content per folder
│   ├── University.md
│   ├── University--2023.md      # Nested folders use -- separator
│   ├── Projects.md
│   └── Finance.md
│
├── Topics/                      # Cross-cutting topic MOCs
│   ├── Machine-Learning.md      # Links ALL ML docs across folders
│   ├── Finance.md
│   └── Research.md
│
├── Weekly/                      # Activity by ISO week
│   └── 2024-W03.md
├── Monthly/                     # Monthly rollups
│   └── 2024-01.md
│
└── .obsidian/
    └── graph.json               # Color-coding for graph view
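Two naming rules are visible in this tree:

- Document notes mirror the Drive path under `GoogleDrive/`.
- Folder MOCs flatten that path with a `--` separator.

The sketch below implements both. It assumes titles are slugified by dropping a few filename-unsafe characters and replacing whitespace with hyphens, a rule inferred from names like `ML-Lecture-Notes.md` but not confirmed. All function names are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Characters awkward in file names or wikilinks; this exact set is an assumption.
_FORBIDDEN = re.compile(r'[\\/:*?"<>|#^\[\]]')


def slugify(title: str) -> str:
    """'ML Lecture Notes' -> 'ML-Lecture-Notes' (assumed convention)."""
    cleaned = _FORBIDDEN.sub("", title)
    return re.sub(r"\s+", "-", cleaned.strip())


def note_path(folder_parts: list[str], title: str) -> PurePosixPath:
    """GoogleDrive/<folder>/<subfolder>/<Title>.md, mirroring the Drive tree."""
    return PurePosixPath("GoogleDrive", *folder_parts, slugify(title) + ".md")


def folder_moc_name(folder_parts: list[str]) -> str:
    """['University', '2023'] -> 'University--2023'."""
    return "--".join(folder_parts)


def folder_moc_path(folder_parts: list[str]) -> PurePosixPath:
    return PurePosixPath("Folders", folder_moc_name(folder_parts) + ".md")
```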

Graph Connections

Three dimensions of wikilinks create a rich knowledge graph:

Folder links ([[University--2023]]) preserve your original Drive hierarchy
Topic links ([[Machine Learning]]) connect related docs across folders
Temporal links ([[2024-W03]]) show when you worked on what

A document about ML in University/2023/ and another in Projects/ both link to [[Machine Learning]], connecting them even though they live in different folders.
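All three link types can be derived from a document's folder path, its matched topics, and a date. Here is a sketch. It assumes the temporal links come from the document's modification date and follow Python's ISO calendar; the source shows the `2024-W03` and `2024-01` formats but not which timestamp drives them. `build_links` is a hypothetical name.

```python
from datetime import date


def iso_week_label(d: date) -> str:
    """ISO week label like '2024-W03'; uses the ISO year, not the calendar year."""
    iso_year, iso_week, _ = d.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def build_links(folder_parts: list[str], topics: list[str], modified: date) -> list[str]:
    """Folder, topic, and temporal wikilinks for one document note."""
    links = []
    if folder_parts:
        links.append(f"[[{'--'.join(folder_parts)}]]")  # folder MOC
    links.extend(f"[[{t}]]" for t in topics)             # topic MOCs
    links.append(f"[[{iso_week_label(modified)}]]")      # weekly note
    links.append(f"[[{modified:%Y-%m}]]")                # monthly note
    return links
```

The ISO-year detail matters around New Year. 30 December 2024 falls in ISO week `2025-W01`, while its monthly link is still `2024-12`.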

Content Extraction

File Type	Method	Notes
Google Docs	Export as plain text	Full text via Drive API
Google Sheets	Export as CSV	Tabular data preserved
Google Slides	Export as plain text	Slide content extracted
PDFs	pypdf text extraction	Handles multi-page documents
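The fetch path differs by file type:

- The three Google-native types are converted server-side by the Drive API's `files.export`.
- PDFs are binary files, downloaded with `files.get_media` and parsed locally.

Below is a sketch of that dispatch using the Drive API's Google Workspace MIME types. Note that a CSV export covers only the first sheet of a spreadsheet. `plan_fetch` and `fetch_bytes` are hypothetical helpers, not code from gdrive_client.py. `service` is assumed to be a googleapiclient Drive v3 object.

```python
from typing import Optional, Tuple

# Google-native MIME type -> export MIME type passed to files().export(...)
EXPORT_FORMATS = {
    "application/vnd.google-apps.document": "text/plain",      # Docs
    "application/vnd.google-apps.spreadsheet": "text/csv",     # Sheets (first sheet only)
    "application/vnd.google-apps.presentation": "text/plain",  # Slides
}
PDF_MIME = "application/pdf"


def plan_fetch(mime_type: str) -> Tuple[str, Optional[str]]:
    """Return ('export', target_mime), ('download', None), or ('skip', None)."""
    if mime_type in EXPORT_FORMATS:
        return "export", EXPORT_FORMATS[mime_type]
    if mime_type == PDF_MIME:
        return "download", None  # fetch raw bytes, then extract text with pypdf
    return "skip", None


def fetch_bytes(service, file_id: str, mime_type: str) -> Optional[bytes]:
    """Fetch raw content for one file; returns None for unsupported types."""
    action, target = plan_fetch(mime_type)
    if action == "export":
        return service.files().export(fileId=file_id, mimeType=target).execute()
    if action == "download":
        return service.files().get_media(fileId=file_id).execute()
    return None
```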

Size-Tiered Extraction

Extraction depth is tiered by file size, so large files (e.g., university textbooks) are only partly read:

Small files (below shallow_threshold_mb, 1 MB by default): full text extraction
Large files (above the threshold): first N pages only, where N is shallow_max_pages in config
Near-empty extraction (<50 chars, e.g., scanned PDFs): metadata only

The extraction field in each note's frontmatter records which mode was used (full, shallow, or metadata_only).
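The tiering rules reduce to two steps:

1. Before extracting, choose full or shallow by size.
2. After extracting, downgrade to metadata_only if the text came back nearly empty.

Here is a sketch. It assumes sizes are in bytes, "1 MB" means 1024 × 1024 bytes, and the 50-character cutoff is applied after stripping whitespace. `extract_pdf_text` uses pypdf's `PdfReader`; the function names themselves are hypothetical.

```python
import io
from typing import Optional, Tuple

MIN_TEXT_CHARS = 50  # below this, treat extraction as empty (e.g. scanned PDFs)


def choose_mode(size_bytes: int, threshold_mb: float = 1.0) -> str:
    """'full' for small files, 'shallow' above the threshold (MiB assumed)."""
    return "shallow" if size_bytes > threshold_mb * 1024 * 1024 else "full"


def extract_pdf_text(data: bytes, max_pages: Optional[int] = None) -> str:
    """Extract text from PDF bytes, optionally from the first max_pages pages only."""
    from pypdf import PdfReader  # lazy import; pypdf is a listed dependency

    reader = PdfReader(io.BytesIO(data))
    total = len(reader.pages)
    n = total if max_pages is None else min(max_pages, total)
    return "\n".join(reader.pages[i].extract_text() or "" for i in range(n))


def finalize(text: str, mode: str) -> Tuple[str, str]:
    """Downgrade to metadata_only when almost no text was recovered."""
    if len(text.strip()) < MIN_TEXT_CHARS:
        return "", "metadata_only"
    return text, mode
```

For a shallow file, the call would be `extract_pdf_text(data, max_pages=5)`, with 5 taken from shallow_max_pages.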

Topic Categorization

Priority stack (highest wins):

Keyword match — regex patterns from config/topics.yaml
LLM fallback — Claude Haiku classifies unmatched text (requires ANTHROPIC_API_KEY)
Default — "Uncategorized"

Documents can have multiple topics. When keyword and LLM disagree, keyword wins (user-defined intent).
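The priority stack can be sketched as a function that consults the LLM only when no keyword pattern matched. That is one reading of "LLM fallback". The `classify_with_llm` callable stands in for a Claude Haiku call and is hypothetical, as is `categorize` itself.

```python
from typing import Callable, List, Optional

UNCATEGORIZED = "Uncategorized"


def categorize(
    keyword_topics: List[str],
    text: str,
    classify_with_llm: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Keyword matches win; the LLM is a fallback; otherwise Uncategorized."""
    if keyword_topics:                                # 1. user-defined patterns
        return list(dict.fromkeys(keyword_topics))    # dedupe, keep order
    if classify_with_llm is not None:                 # 2. LLM fallback
        llm_topics = [t for t in classify_with_llm(text) if t]
        if llm_topics:
            return list(dict.fromkeys(llm_topics))
    return [UNCATEGORIZED]                            # 3. default
```

Passing `classify_with_llm=None` models `--no-llm` or a missing `ANTHROPIC_API_KEY`.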

Incremental Sync

drive-og sync only fetches documents modified since the last sync. State is tracked in .drive_og_state.json.

First run: fetches everything
Subsequent runs: only changed documents
--full: re-fetches everything regardless
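Incremental fetching fits the Drive API's search syntax, which can filter on `modifiedTime`. Below is a sketch of the state round trip and query construction. It makes three assumptions:

- The state file stores a single `last_sync` RFC 3339 timestamp. The real schema of `.drive_og_state.json` is not documented here.
- Trashed files are excluded.
- The query lists direct children only; walking subfolders is left out.

The helper names are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

STATE_FILE = Path(".drive_og_state.json")


def load_last_sync(path: Path = STATE_FILE) -> Optional[str]:
    """Return the stored timestamp, or None on the first run."""
    if not path.exists():
        return None
    return json.loads(path.read_text()).get("last_sync")


def save_last_sync(when: datetime, path: Path = STATE_FILE) -> None:
    """Persist an aware datetime as a UTC RFC 3339 string."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    path.write_text(json.dumps({"last_sync": stamp}))


def build_query(folder_id: str, last_sync: Optional[str], full: bool = False) -> str:
    """Drive v3 `q` string: children of folder_id, optionally modified since last_sync."""
    clauses = [f"'{folder_id}' in parents", "trashed = false"]
    if last_sync and not full:  # --full ignores the saved state
        clauses.append(f"modifiedTime > '{last_sync}'")
    return " and ".join(clauses)
```

Recording the time the sync started, rather than when it finished, avoids missing files that were edited mid-run.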

The local cache (cache/) is the source of truth for vault generation. You can edit cache JSON files manually, then run drive-og generate to rebuild.
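Because `drive-og generate` rebuilds the vault purely from `cache/`, each cached document only needs to round-trip through JSON. Below is a sketch of such a record. The field names are assumptions loosely based on what the notes display: folder path, topics, extraction mode, and modification time. It is not the real `GoogleDriveDocument` from models.py.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class CachedDocument:
    id: str
    title: str
    folder_path: list[str]
    mime_type: str
    modified_time: str   # RFC 3339 string, as returned by the Drive API
    extraction: str      # "full", "shallow", or "metadata_only"
    text: str = ""
    topics: list[str] = field(default_factory=list)


def write_cache(doc: CachedDocument, cache_dir: Path) -> Path:
    """Write one record as <id>.json (hypothetical file layout)."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{doc.id}.json"
    path.write_text(json.dumps(asdict(doc), indent=2, ensure_ascii=False), encoding="utf-8")
    return path


def read_cache(cache_dir: Path) -> list[CachedDocument]:
    """Load every cached record, including hand-edited ones."""
    return [
        CachedDocument(**json.loads(p.read_text(encoding="utf-8")))
        for p in sorted(cache_dir.glob("*.json"))
    ]
```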

Project Structure

drive_og/
├── src/drive_og/
│   ├── cli.py                # CLI entry point (init, sync, generate)
│   ├── models.py             # GoogleDriveDocument dataclass
│   ├── config.py             # drives.yaml + topics.yaml loading
│   ├── auth.py               # OAuth 2.0 desktop flow
│   ├── gdrive_client.py      # Drive API: list files, resolve paths
│   └── content_extractor.py  # Size-tiered text extraction
├── config/
│   ├── drives.yaml.example   # Template for user configuration
│   └── topics.yaml.example   # Template for topic patterns
├── templates/
│   ├── gdrive_note.md.j2     # Per-document note template
│   └── folder_moc.md.j2      # Folder Map of Content template
├── tests/                     # 17 tests (unit + integration)
├── docs/superpowers/
│   ├── specs/                 # Design specification
│   └── plans/                 # Implementation plan
└── pyproject.toml
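gdrive_client.py is described as listing files and resolving paths. The Drive API gives each file only its parent IDs, so a folder path has to be rebuilt by walking up the parents. Below is a sketch over an in-memory map of ID → metadata, as could be built from a `files.list` call requesting `id`, `name`, and `parents`. The function is hypothetical. It follows only the first parent, and it stops at any folder it has no metadata for, such as the My Drive root.

```python
def resolve_path(file_id: str, files: dict[str, dict]) -> list[str]:
    """Return folder names from the top down to (not including) the file itself.

    `files` maps Drive IDs to {"name": ..., "parents": [...]} entries.
    """
    parts: list[str] = []
    seen: set[str] = set()
    parents = files[file_id].get("parents", [])
    current = parents[0] if parents else None
    while current is not None and current in files and current not in seen:
        seen.add(current)  # guard against cycles in malformed data
        parts.append(files[current]["name"])
        up = files[current].get("parents", [])
        current = up[0] if up else None
    return list(reversed(parts))
```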

Dependencies

opsidian_core — shared library (cache, categorizer, vault writer, etc.)
google-api-python-client — Google Drive API
google-auth-oauthlib — OAuth 2.0 authentication
pypdf — PDF text extraction
anthropic — Claude API for LLM categorization (optional)

Ecosystem

drive_og is part of the opsidian knowledge graph ecosystem:

                    ┌──────────────────────┐
                    │   opsidian_core      │
                    │   (shared library)   │
                    └──┬────────┬────────┬─┘
                       │        │        │
              ┌────────┘        │        └────────┐
              v                 v                 v
        opsidian_graph       drive_og       opsidian_meta
         (work self)     (personal self)  (unified analysis)
     Git/JIRA/Confluence   Google Drive   reads both caches
              │                 │                 │
              v                 v                 v
         work vault/     personal vault/     meta vault/
                                          (timeline, focus
                                           reports, gaps)

Project	What it does
opsidian_core	Shared library for all graph generators
opsidian_graph	Work knowledge graph (GitHub PRs, JIRA, Confluence)
drive_og	Personal knowledge graph (Google Drive)
opsidian_meta	Unified productivity analysis (timeline, focus reports, gap detection)

After syncing with drive_og, you can run opsidian_meta to generate cross-domain productivity reports that combine your work and personal activity.

Tests

pip install -e ../opsidian_core
pip install -e .
python -m pytest tests/ -v

17 tests covering models, auth, config, client, extractor, CLI, and end-to-end integration.

License

MIT
