seungbinshin/opsidian-core

opsidian_core

Shared core library for Obsidian knowledge graph generators. Provides generic caching, topic categorization, template rendering, vault writing, and sync state management that works with any data source.

Overview

opsidian_core is the foundation layer shared by:

  • opsidian_graph — work knowledge graph from GitHub PRs, JIRA issues, and Confluence pages
  • drive_og — personal knowledge graph from Google Drive documents
  • opsidian_meta — unified productivity analysis vault that reads both caches and generates timeline views, focus reports, and knowledge-gap detection

All projects produce Obsidian vaults with interconnected markdown notes, wikilinks, and topic-based organization. This library provides the common machinery they share.

Architecture

                    opsidian_core (shared library)
                ┌──────────────────────────────────┐
                │  models.py      BaseDocument      │
                │  cache.py       JSON persistence   │
                │  categorizer.py keyword + LLM      │
                │  synthesizer.py weekly summaries   │
                │  note_generator.py Jinja2 engine   │
                │  vault_writer.py markdown output   │
                │  state_manager.py incremental sync │
                │  config.py      topic YAML loading │
                │  templates/     shared .j2 files   │
                └──────────┬───────────┬────────────┘
                           │           │
              ┌────────────┘           └────────────┐
              v                                     v
    opsidian_graph                             drive_og
    (GitHub/JIRA/Confluence)              (Google Drive)

Installation

pip install -e .

Key Concepts

BaseDocument Protocol

All data sources must produce objects compatible with BaseDocument. This uses duck-typing — no inheritance required. Any dataclass with these fields works:

from dataclasses import dataclass


@dataclass
class BaseDocument:
    id: str              # Unique identifier
    title: str           # Document title
    body_text: str       # Extracted text content
    source_type: str     # "github", "jira", "confluence", "gdrive", "desktop"
    source_group: str    # Grouping key (repo name, folder path, project key)
    created_at: str      # ISO 8601 timestamp
    updated_at: str      # ISO 8601 timestamp
    url: str             # Web URL to original document
    labels: list[str]    # Tags/labels from the source

    @property
    def date(self) -> str:
        # YYYY-MM-DD from updated_at (sketch: assumes an ISO 8601 timestamp)
        return self.updated_at[:10]

To check compatibility at runtime:

from opsidian_core import is_base_document

assert is_base_document(my_custom_doc)  # True if all fields present
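One way such a duck-typing check could work is sketched below. This is an illustration, not the library's actual implementation: `REQUIRED_FIELDS`, `has_base_fields`, and `Note` are hypothetical names, and the field list is copied from the dataclass above.

```python
from dataclasses import dataclass, field

# Field names copied from the BaseDocument listing; the real check may differ.
REQUIRED_FIELDS = (
    "id", "title", "body_text", "source_type", "source_group",
    "created_at", "updated_at", "url", "labels",
)


def has_base_fields(obj: object) -> bool:
    """Return True if obj exposes every BaseDocument field (hypothetical helper)."""
    return all(hasattr(obj, name) for name in REQUIRED_FIELDS)


@dataclass
class Note:
    """A source-specific document that does not inherit from BaseDocument."""
    id: str
    title: str
    body_text: str
    source_type: str
    source_group: str
    created_at: str
    updated_at: str
    url: str
    labels: list[str] = field(default_factory=list)


note = Note("1", "Design doc", "text", "gdrive", "Projects",
            "2024-01-10T09:00:00Z", "2024-01-15T12:30:00Z",
            "https://example.com/1")

print(has_base_fields(note))      # the dataclass has all nine attributes
print(has_base_fields(object()))  # a bare object has none of them
```

Because the check only looks at attribute names, plain classes and classes that expose the fields through properties would pass it as well.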

Topic Categorization

Documents are categorized into topics using a priority stack:

  1. Keyword matching — regex patterns from config/topics.yaml
  2. LLM fallback — Claude Haiku classifies unmatched documents (requires ANTHROPIC_API_KEY)
  3. Default — "Uncategorized" if both fail

Documents can have multiple topics (multi-label).
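The priority stack above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: `RULES`, `keyword_topics`, and `categorize` are hypothetical names, and the LLM step is represented by an optional callable rather than a real API call.

```python
import re
from typing import Callable, Optional

# Hypothetical rules; in the library these come from config/topics.yaml.
RULES = {
    "Machine Learning": [r"\bml\b", r"\bdeep.learning"],
    "CI/CD": [r"\bci\b", r"\bworkflow\b"],
}


def keyword_topics(text: str) -> list[str]:
    """Step 1: every topic with at least one matching pattern (multi-label)."""
    return [
        topic
        for topic, patterns in RULES.items()
        if any(re.search(p, text, re.IGNORECASE) for p in patterns)
    ]


def categorize(
    text: str,
    llm: Optional[Callable[[str], list[str]]] = None,
) -> list[str]:
    """Keywords first, then an optional LLM callable, then the default."""
    topics = keyword_topics(text)
    if topics:
        return topics
    if llm is not None:
        try:
            topics = llm(text)  # Step 2: stand-in for the LLM classifier
        except Exception:
            topics = []         # treat a failed call as "no answer"
        if topics:
            return topics
    return ["Uncategorized"]    # Step 3: default


print(categorize("Fix CI workflow for ML training"))
print(categorize("Quarterly budget"))
print(categorize("Quarterly budget", llm=lambda text: ["Finance"]))
```

The fallback is only consulted when no keyword rule matches, which keeps the common case free of network calls.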

Caching

Generic JSON cache that serializes any dataclass via dataclasses.asdict():

from opsidian_core import save_all_items, load_items

save_all_items(cache_dir, "_gdrive", docs)   # Write to cache/_gdrive/<group>/<id>.json
items = load_items(cache_dir, "_gdrive")      # Returns list[dict]
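To make the on-disk layout concrete, here is a rough sketch of a cache built on `dataclasses.asdict()`. The helpers `save_doc` and `load_docs` and the `Doc` class are hypothetical stand-ins, and the real functions may differ (for example in how they sanitize group names).

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Doc:
    id: str
    title: str
    source_group: str


def save_doc(cache_dir: Path, prefix: str, doc: Doc) -> Path:
    """Write one dataclass to <cache_dir>/<prefix>/<group>/<id>.json."""
    path = cache_dir / prefix / doc.source_group / f"{doc.id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(asdict(doc), indent=2), encoding="utf-8")
    return path


def load_docs(cache_dir: Path, prefix: str) -> list[dict]:
    """Read every cached JSON file under the prefix back as plain dicts."""
    root = cache_dir / prefix
    return [
        json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(root.rglob("*.json"))
    ]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    save_doc(cache, "_gdrive", Doc("a1", "Roadmap", "Planning"))
    save_doc(cache, "_gdrive", Doc("b2", "Notes", "Meetings"))
    loaded = load_docs(cache, "_gdrive")

print(sorted(d["id"] for d in loaded))
```

As with `load_items`, the loaded items are plain dicts rather than dataclass instances, so callers access values by key.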

API Reference

Models

  • BaseDocument — common document dataclass
  • is_base_document(obj) — duck-typing check

Cache

  • save_item(cache_dir, source_prefix, group, item) — save one item
  • save_all_items(cache_dir, source_prefix, items) — save list, grouped by source_group
  • load_items(cache_dir, source_prefix) — load all items as dicts
  • delete_item(cache_dir, source_prefix, group, item_id) — remove one item

Categorizer

  • TopicRule(name, patterns) — topic definition with compiled regex
  • extract_topics_from_text(text, rules) — keyword matching only
  • categorize_document(doc, rules, *, use_llm=True) — full pipeline (keywords + LLM + default)

Config

  • load_topics(config_dir) — parse config/topics.yaml into list[TopicRule]

Note Generator

  • sanitize_filename(name) — clean string for use as filename
  • week_id(date_str) — "2024-01-15" → "2024-W03"
  • month_id(date_str) — "2024-01-15" → "2024-01"
  • create_template_env(*template_dirs) — Jinja2 environment with ChoiceLoader
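The date helpers can be approximated with the standard library. The sketch below assumes ISO 8601 week numbering, which agrees with the example values in the list above; `iso_week_id` and `year_month_id` are hypothetical names, not the library's functions.

```python
from datetime import date


def iso_week_id(date_str: str) -> str:
    """Map a YYYY-MM-DD string to an ISO week label such as "2024-W03"."""
    iso = date.fromisoformat(date_str).isocalendar()
    return f"{iso[0]}-W{iso[1]:02d}"  # iso[0] = ISO year, iso[1] = ISO week


def year_month_id(date_str: str) -> str:
    """Map a YYYY-MM-DD string to a month label such as "2024-01"."""
    d = date.fromisoformat(date_str)
    return f"{d.year:04d}-{d.month:02d}"


print(iso_week_id("2024-01-15"))    # 2024-W03
print(year_month_id("2024-01-15"))  # 2024-01
print(iso_week_id("2024-12-30"))    # 2025-W01
```

Under ISO numbering a date near the year boundary can belong to a week of the neighboring year, as the last call shows, so weekly and monthly notes for the same date may carry different years.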

State Manager

  • SyncState(last_sync, sources) — incremental sync state
  • load_state(path) / save_state(path, state) — JSON persistence
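A possible shape for this state handling is sketched below. The field types are assumptions (the README only names `last_sync` and `sources`), and `SyncStateSketch`, `load_state_sketch`, and `save_state_sketch` are hypothetical names.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class SyncStateSketch:
    # ISO 8601 timestamp of the previous run ("" means never synced)
    last_sync: str = ""
    # Assumed shape: one cursor (here a timestamp) per source name
    sources: dict[str, str] = field(default_factory=dict)


def load_state_sketch(path: Path) -> SyncStateSketch:
    """Return the saved state, or an empty one on the first run."""
    if not path.exists():
        return SyncStateSketch()
    return SyncStateSketch(**json.loads(path.read_text(encoding="utf-8")))


def save_state_sketch(path: Path, state: SyncStateSketch) -> None:
    """Persist the state as JSON."""
    path.write_text(json.dumps(asdict(state), indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "state.json"
    first = load_state_sketch(state_file)   # nothing saved yet
    first.last_sync = "2024-01-15T12:00:00Z"
    first.sources["gdrive"] = "2024-01-15T12:00:00Z"
    save_state_sketch(state_file, first)
    second = load_state_sketch(state_file)  # a later run resumes from here

print(second)
```

A client could then fetch only documents updated after `last_sync`, which is the usual purpose of incremental sync state.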

Synthesizer

  • generate_weekly_summary(week_id, items) — LLM-powered summary (returns None if no API key)

Vault Writer

  • write_note(vault_path, rel_path, content) — write single markdown note
  • write_all_notes(vault_path, notes) — batch write, returns count
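For orientation, a vault writer of this kind might look like the sketch below. It assumes `notes` maps a vault-relative path to markdown content, which the README does not specify; the `_sketch` function names are hypothetical.

```python
import tempfile
from pathlib import Path


def write_note_sketch(vault_path: Path, rel_path: str, content: str) -> Path:
    """Write one markdown note, creating parent folders as needed."""
    target = vault_path / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target


def write_all_notes_sketch(vault_path: Path, notes: dict[str, str]) -> int:
    """Write every note and return how many were written."""
    for rel_path, content in notes.items():
        write_note_sketch(vault_path, rel_path, content)
    return len(notes)


with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    count = write_all_notes_sketch(vault, {
        "Topics/Machine Learning.md": "# Machine Learning\n\n[[2024-W03]]\n",
        "Weekly/2024-W03.md": "# 2024-W03\n\n[[Machine Learning]]\n",
    })
    weekly_exists = (vault / "Weekly" / "2024-W03.md").exists()

print(count, weekly_exists)
```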

Shared Templates

Located in templates/:

Template              Purpose
topic_moc.md.j2       Cross-source topic Map of Content
weekly_note.md.j2     Weekly activity summary
monthly_note.md.j2    Monthly rollup
expertise.md.j2       Auto-generated expertise profile
index.md.j2           Dashboard / index page

Configuration

config/topics.yaml

topics:
  Machine Learning:
    - "\\bml\\b"
    - "\\bdeep.learning"
    - "\\bneural.network"
  CI/CD:
    - "\\bci\\b"
    - "\\bworkflow\\b"

Patterns are case-insensitive regex. Each topic can have multiple patterns.
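The short check below illustrates how these patterns behave once loaded. In a double-quoted YAML string `"\\b"` is an escaped backslash followed by `b`, so the regex engine receives the word boundary `\b`. The sketch assumes the patterns are applied with a search (match anywhere in the text) and compiled with `re.IGNORECASE`; the variable names are illustrative.

```python
import re

# The patterns as they look after YAML parsing.
ml = re.compile(r"\bml\b", re.IGNORECASE)
deep = re.compile(r"\bdeep.learning", re.IGNORECASE)
ci = re.compile(r"\bci\b", re.IGNORECASE)

checks = {
    "ML as a word": bool(ml.search("Deploy ML model")),
    "ml inside HTML": bool(ml.search("Update HTML template")),
    "deep-learning": bool(deep.search("Deep-Learning notes")),
    "ci inside circuit": bool(ci.search("circuit breaker")),
}

for label, matched in checks.items():
    print(f"{label}: {matched}")
```

The word boundaries keep short tokens such as `ml` and `ci` from matching inside longer words, and the unescaped `.` in `deep.learning` accepts any single separator character (space, hyphen, underscore).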

Adding a New Data Source

To create a new project that uses opsidian_core:

  1. Define a dataclass with BaseDocument-compatible fields (use properties to map source-specific field names onto them)
  2. Write a client that fetches data from your source
  3. Use save_all_items / load_items for caching
  4. Use categorize_document for topic extraction
  5. Create source-specific Jinja2 templates
  6. Use create_template_env to merge core + source templates
  7. Use write_all_notes to output the vault

See drive_og for a complete example.
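Step 1 is the part most specific to each source, so here is a minimal sketch of it. `RssEntry` and its native field names are invented for illustration; only the BaseDocument attribute names come from this README.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class RssEntry:
    """Hypothetical source record whose native names differ from BaseDocument."""
    guid: str
    headline: str
    summary: str
    feed_name: str
    published: str  # ISO 8601 timestamp
    link: str
    tags: list[str] = field(default_factory=list)
    source_type: str = "rss"

    # Properties map the native names onto the BaseDocument names.
    @property
    def id(self) -> str:
        return self.guid

    @property
    def title(self) -> str:
        return self.headline

    @property
    def body_text(self) -> str:
        return self.summary

    @property
    def source_group(self) -> str:
        return self.feed_name

    @property
    def created_at(self) -> str:
        return self.published

    @property
    def updated_at(self) -> str:
        return self.published

    @property
    def url(self) -> str:
        return self.link

    @property
    def labels(self) -> list[str]:
        return self.tags

    @property
    def date(self) -> str:
        return self.updated_at[:10]  # YYYY-MM-DD prefix of the timestamp


entry = RssEntry("g1", "Release notes", "What changed", "Engineering Blog",
                 "2024-03-05T08:00:00Z", "https://example.com/g1", ["release"])

print(entry.id, entry.source_group, entry.date)
print(sorted(asdict(entry)))  # only the native fields are serialized
```

One thing to keep in mind with this approach: `dataclasses.asdict()` includes dataclass fields only, not properties, so a cache built on it would store the native names (`guid`, `headline`, and so on) rather than the mapped ones.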

Tests

python -m pytest tests/ -v

19 tests covering all modules.

License

MIT
