diff --git a/README.md b/README.md index 510128e..1c62f9d 100644 --- a/README.md +++ b/README.md @@ -1,27 +1,57 @@ -

-

ChatGPT Export Tool

-

- Stream, analyze, and export your ChatGPT history — without loading it all into memory. -

-

- Python 3.10+ - MIT License - Ruff - uv -

-

+

-A Python CLI that takes your `conversations.json` export from ChatGPT and turns it into clean, readable **Markdown**, plain text, or structured JSON — with full control over what gets included. +chatgpt-export -It uses **streaming JSON parsing** ([ijson](https://github.com/ICRAR/ijson)) so even multi-hundred-megabyte exports never need to be loaded into memory. Filtering, formatting, and output are all modular and independently configurable through CLI flags or a single TOML config file. +
---- +**Your ChatGPT history. Clean Markdown. Zero memory overhead.** + +
+ +Python  +License  +Ruff  +uv + +

+ +[**Install**](#installation)  ·  +[**Quick Start**](#quick-start)  ·  +[**Formats**](#output-formats)  ·  +[**Filtering**](#filtering)  ·  +[**Config**](#configuration)  ·  +[**Fields Reference**](Fields.md) + +
+ +
+ +Takes your `conversations.json` from ChatGPT and turns it into **clean, readable Markdown** — or plain text, or structured JSON. Uses streaming parsing ([ijson](https://github.com/ICRAR/ijson)) so even 100MB+ exports are never loaded into memory all at once. Every aspect of filtering, formatting, and output is configurable through CLI flags or a single TOML file. + +
+ +
+ +``` +$ chatgpt-export export conversations.json --split subject --output-dir vault/ + + Exported 824 files to vault/ +``` + +
+ +
+ +> [!TIP] +> Export straight into your **Obsidian vault** for a fully searchable, linked archive of every conversation you've ever had with ChatGPT. + +
## Installation -> **Requires Python `3.10+`**  ·  Managed with [uv](https://docs.astral.sh/uv/) +> **Requires Python `3.10+`**  ·  managed with [uv](https://docs.astral.sh/uv/) ```bash git clone https://github.com/voidfreud/chatgpt-export-tool.git @@ -29,92 +59,57 @@ cd chatgpt-export-tool uv sync ``` -For dev tooling (pytest, ruff, coverage): +
+ Dev tooling (pytest, ruff, coverage) + +
```bash uv sync --group dev ``` -Verify the install: +
+ +Verify: ```bash uv run chatgpt-export --help ``` ---- +
## Quick Start ```bash -# Analyze your export — stats without loading the whole file +# Analyze — stats without loading the whole file uv run chatgpt-export analyze conversations.json -# Export everything to markdown (default) on stdout +# Export to markdown on stdout uv run chatgpt-export export conversations.json -# Export to a single file -uv run chatgpt-export export conversations.json --output all_chats.md - # One markdown file per conversation uv run chatgpt-export export conversations.json --split subject --output-dir exports/ -# JSON export +# JSON dump uv run chatgpt-export export conversations.json --format json --output dump.json ``` ---- - -## Commands - -### `analyze` - -Reports structure and statistics for a `conversations.json` file without writing any output files. - -| What it shows | Flag | -|---|---| -| Conversation & message counts, file size, date range | *(default)* | -| Field coverage per structural level | `--fields` | - -```bash -uv run chatgpt-export analyze data.json -uv run chatgpt-export analyze data.json --fields -uv run chatgpt-export analyze data.json --verbose --output analysis.txt -uv run chatgpt-export analyze data.json --debug -``` - -### `export` - -Converts conversations into **Markdown** (default), **plain text**, or **JSON** with fine-grained control over structure, metadata, and output layout. - -```bash -# Minimal readable export -uv run chatgpt-export export data.json --fields "groups minimal" - -# Full JSON, one file per conversation by date -uv run chatgpt-export export data.json --format json --split date --output-dir by-date/ - -# Selective metadata -uv run chatgpt-export export data.json --include "model*" --exclude plugin_ids - -# Use a config file for persistent defaults -cp chatgpt_export.toml.example chatgpt_export.toml -uv run chatgpt-export export data.json --config chatgpt_export.toml -``` - ---- +
## Output Formats -| Format | Flag | Extension | Description | -|---|---|---|---| -| **Markdown** | `--format md` | `.md` | Transcript-oriented with `#` headings, `>` blockquoted context, `---` separators. **Default.** | -| **Plain text** | `--format txt` | `.txt` | Indented text with plain labels — good for terminals and grep. | -| **JSON** | `--format json` | `.json` | Filtered conversation objects written as valid JSON. | +| Format | Flag | Extension | Description | +|:--|:--|:--|:--| +| **Markdown** | `--format md` | `.md` | `#` headings, `>` blockquoted context, `---` turn separators. **Default.** | +| **Plain text** | `--format txt` | `.txt` | Indented, plain labels — good for terminals and grep. | +| **JSON** | `--format json` | `.json` | Filtered conversation objects, valid JSON. | -Markdown and text exports follow the **active conversation branch** using `current_node` and `parent` links, so you see the conversation as it actually played out — not the full tree with all edits and branches. +Markdown and text exports follow the **active conversation branch** using `current_node` and `parent` links, so you see the conversation as it played out, not the full tree with edits and branches.
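
Following the active branch can be sketched as a walk from `current_node` back up the `parent` links, then reversing the result. This is a minimal illustration, not the tool's actual code: the function name `active_branch` and the toy conversation are assumptions, and the `mapping` layout is assumed to match the usual ChatGPT export shape (node ID to a node with `message` and `parent`).

```python
from typing import Any


def active_branch(conversation: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the messages on the active branch, oldest first (hypothetical helper)."""
    mapping = conversation.get("mapping", {})
    node_id = conversation.get("current_node")
    branch: list[dict[str, Any]] = []
    seen: set[str] = set()  # guard against malformed exports with parent cycles

    # Walk upward from the leaf the user last saw to the root.
    while node_id is not None and node_id in mapping and node_id not in seen:
        seen.add(node_id)
        node = mapping[node_id]
        message = node.get("message")
        if message is not None:  # root/placeholder nodes carry no message
            branch.append(message)
        node_id = node.get("parent")

    branch.reverse()  # collected leaf-to-root, so flip to reading order
    return branch


# Toy conversation: the assistant reply was regenerated, so "a1" is an
# abandoned branch and "a2" is the one the export marks as current.
example = {
    "current_node": "a2",
    "mapping": {
        "root": {"message": None, "parent": None},
        "u1": {"message": {"text": "Hi"}, "parent": "root"},
        "a1": {"message": {"text": "First answer"}, "parent": "u1"},
        "a2": {"message": {"text": "Second answer"}, "parent": "u1"},
    },
}

print([m["text"] for m in active_branch(example)])
```

The abandoned reply `a1` never appears because nothing on the path from `current_node` to the root points at it.
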
-What the Markdown output looks like + Preview: Markdown output + +
```markdown # Opening bank account in Thailand @@ -129,6 +124,8 @@ Markdown and text exports follow the **active conversation branch** using `curre ## User [23:43 13-10-2025] Can a foreigner open a Thailand bank account? +--- + ## Assistant [23:44 13-10-2025] Yup — a foreigner *can* open a bank account in Thailand, but it's *much more difficult* now than it used to be... @@ -136,193 +133,205 @@ Yup — a foreigner *can* open a bank account in Thailand, but it's
-Default transcript behavior: +
-- **Shown:** user text, assistant text, assistant thoughts, user editable context (compact preview) -- **Hidden:** tool plumbing, assistant code, reasoning recap, blank nodes +**Default transcript policy:** -All of this is configurable via the `[transcript]` section in the TOML config. +| Shown | Hidden | +|:--|:--| +| User text, assistant text, assistant thoughts, user editable context (compact preview) | Tool plumbing, code execution, reasoning recap, blank nodes | ---- +Fully configurable via `[transcript]` in the TOML config. -## Filtering +
-### Structural Fields — `--fields` +## Commands + +### `analyze` -Controls which parts of each conversation object are retained before formatting. +Structure and statistics — without writing output files. ```bash ---fields all # everything (default) ---fields none # structure only ---fields "include title,create_time,mapping" # keep only these ---fields "exclude moderation_results" # drop these ---fields "groups minimal" # use a named group +uv run chatgpt-export analyze data.json +uv run chatgpt-export analyze data.json --fields # include field coverage +uv run chatgpt-export analyze data.json --verbose --output analysis.txt +uv run chatgpt-export analyze data.json --debug ``` -**Built-in groups:** +### `export` -| Group | Fields | -|---|---| -| `conversation` | `_id`, `conversation_id`, `create_time`, `update_time`, `title`, `type` | -| `message` | `author`, `content`, `status`, `end_turn` | -| `metadata` | `model_slug`, `message_type`, `is_archived` | -| `minimal` | `title`, `create_time`, `message` | +Convert conversations with full control over structure, metadata, and layout. -See [`Fields.md`](Fields.md) for the full field-selection reference. +```bash +uv run chatgpt-export export data.json --fields "groups minimal" +uv run chatgpt-export export data.json --format json --split date --output-dir by-date/ +uv run chatgpt-export export data.json --include "model*" --exclude plugin_ids +uv run chatgpt-export export data.json --config chatgpt_export.toml +``` -### Metadata — `--include` / `--exclude` +
+ +## Filtering -Runs *after* structural filtering. Applies only to keys inside nested `message.metadata` dictionaries. +### Structural fields  `--fields` ```bash ---include model_slug # keep only model_slug ---include "model*" --exclude plugin_ids # glob patterns supported +--fields all # everything (default) +--fields none # structure only +--fields "include title,create_time,mapping" # whitelist +--fields "exclude moderation_results" # blacklist +--fields "groups minimal" # named group ``` -Known metadata names: `model_slug`, `message_type`, `plugin_ids`, `is_archived`. +
+ Built-in groups ---- +
-## Split Modes +| Group | Fields | +|:--|:--| +| `conversation` | `_id` `conversation_id` `create_time` `update_time` `title` `type` | +| `message` | `author` `content` `status` `end_turn` | +| `metadata` | `model_slug` `message_type` `is_archived` | +| `minimal` | `title` `create_time` `message` | -Control how conversations are distributed across output files. +
-| Mode | Flag | Behavior | -|---|---|---| -| **Single** | `--split single` | One combined stream or file *(default)* | -| **Subject** | `--split subject` | One file per conversation, named `Title_ID.md` | -| **Date** | `--split date` | Daily folders → one file per conversation | -| **ID** | `--split id` | One file per conversation, named by conversation ID | +Full reference: [`Fields.md`](Fields.md) -```bash -# Stdout (single, no --output) -uv run chatgpt-export export data.json +### Metadata  `--include` / `--exclude` -# Single file -uv run chatgpt-export export data.json --output all.md +Runs after structural filtering. Applies to keys inside `message.metadata`. -# Split into a directory -uv run chatgpt-export export data.json --split subject --output-dir exports/ +```bash +--include model_slug +--include "model*" --exclude plugin_ids # glob patterns ``` -> **Note:** `--output` is for single mode only. Split modes use `--output-dir`. +
---- +## Split Modes + +| Mode | Flag | Output | +|:--|:--|:--| +| **single** | `--split single` | One combined stream or file *(default)* | +| **subject** | `--split subject` | One file per conversation, named `Title_ID.md` | +| **date** | `--split date` | Daily folders, one file per conversation | +| **id** | `--split id` | One file per conversation, named by conversation ID | + +> [!NOTE] +> `--output` is for single mode only. Split modes use `--output-dir`. + +
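
How a split mode might map a conversation to an output path can be sketched as follows. Everything here is an assumption for illustration: the names `safe_name` and `output_path`, the `YYYY-MM-DD` folder format, the use of UTC, and the title sanitization rule are not taken from the tool.

```python
import re
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Any


def safe_name(title: str) -> str:
    """Collapse anything that is not a word character or hyphen into '_'."""
    cleaned = re.sub(r"[^\w\-]+", "_", title).strip("_")
    return cleaned or "untitled"


def output_path(
    conversation: dict[str, Any], split: str, output_dir: str, ext: str = "md"
) -> PurePosixPath:
    """Resolve the output file for one conversation under a split mode."""
    conv_id = conversation["conversation_id"]
    title = safe_name(conversation.get("title") or "")
    root = PurePosixPath(output_dir)

    if split == "subject":
        return root / f"{title}_{conv_id}.{ext}"
    if split == "id":
        return root / f"{conv_id}.{ext}"
    if split == "date":
        # Assumption: create_time is a Unix timestamp; folder is the UTC day.
        day = datetime.fromtimestamp(conversation["create_time"], tz=timezone.utc)
        return root / day.strftime("%Y-%m-%d") / f"{title}_{conv_id}.{ext}"
    # Single mode writes one stream or file via --output, so it has no per-conversation path.
    raise ValueError(f"no per-conversation path for split mode {split!r}")


example = {
    "conversation_id": "abc123",
    "title": "Opening bank account in Thailand",
    "create_time": 1_700_000_000,
}

for mode in ("subject", "id", "date"):
    print(mode, output_path(example, mode, "exports/"))
```
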
## Configuration -All export behavior can be persisted in a single TOML file. The repo ships [`chatgpt_export.toml.example`](chatgpt_export.toml.example) as a starting point. +Persist defaults in a single TOML file. The repo ships [`chatgpt_export.toml.example`](chatgpt_export.toml.example). ```bash cp chatgpt_export.toml.example chatgpt_export.toml -# edit to taste, then: uv run chatgpt-export export data.json --config chatgpt_export.toml ``` CLI flags always override TOML values.
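
The override rule can be pictured as a simple merge in which a CLI value replaces the TOML value only when the flag was actually passed. This is a sketch under assumptions: `resolve_settings` is a hypothetical name, the keys `format` and `split` are illustrative, and unset flags are assumed to arrive as `None`.

```python
from typing import Any


def resolve_settings(
    toml_defaults: dict[str, Any], cli_flags: dict[str, Any]
) -> dict[str, Any]:
    """Merge settings so that explicitly passed CLI flags win over TOML values."""
    resolved = dict(toml_defaults)  # copy so the loaded config is not mutated
    for key, value in cli_flags.items():
        if value is not None:  # None means "flag not given on the command line"
            resolved[key] = value
    return resolved


toml_defaults = {"format": "txt", "split": "single"}
cli_flags = {"format": "json", "split": None}  # only --format was passed

print(resolve_settings(toml_defaults, cli_flags))
```

Treating `None` as "not given" keeps a TOML default from being wiped out by a flag the user never typed.
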
-TOML sections overview - -**`[defaults]`** — format, split mode, field selection, output directory, metadata filters - -**`[transcript]`** — branch following, visibility rules per content type - -| Key | Default | What it does | -|---|---|---| -| `show_system_messages` | `false` | Include system prompts | -| `show_tool_messages` | `false` | Include tool/function calls | -| `show_assistant_code` | `false` | Include code execution blocks | -| `show_reasoning_recap` | `false` | Include reasoning summaries | -| `user_editable_context_mode` | `"compact"` | `"compact"` or `"full"` for context rendering | -| `include_content_types` | `[]` | Whitelist specific content types | -| `exclude_content_types` | `[]` | Blacklist specific content types | - -**`[text_output]`** — header, layout, formatting for text/markdown output - -| Key | Default | What it does | -|---|---|---| -| `layout_mode` | `"reading"` | `"reading"` (spacious) or `"compact"` (dense) | -| `heading_style` | `"markdown"` | `"markdown"` (with `#`) or `"plain"` | -| `turn_separator` | `"---"` | Separator between turns | -| `strip_chatgpt_artifacts` | `true` | Remove ChatGPT citation/nav artifacts | -| `wrap_width` | `88` | Line wrap width (`0` to disable) | -| `include_turn_count_in_header` | `true` | Show turn count in header | -| `include_turn_numbers` | `false` | Number each turn | + TOML reference + +
+ +**`[defaults]`** — format, split, fields, output directory, metadata + +**`[transcript]`** — branch following and visibility + +| Key | Default | What it does | +|:--|:--|:--| +| `show_system_messages` | `false` | Include system prompts | +| `show_tool_messages` | `false` | Include tool/function calls | +| `show_assistant_code` | `false` | Include code execution blocks | +| `show_reasoning_recap` | `false` | Include reasoning summaries | +| `user_editable_context_mode` | `"compact"` | `"compact"` or `"full"` context rendering | + +**`[text_output]`** — layout and formatting + +| Key | Default | What it does | +|:--|:--|:--| +| `layout_mode` | `"reading"` | `"reading"` (spacious) or `"compact"` (dense) | +| `heading_style` | `"markdown"` | `"markdown"` (with `#`) or `"plain"` | +| `turn_separator` | `"---"` | Separator between turns | +| `strip_chatgpt_artifacts` | `true` | Remove ChatGPT citation/nav artifacts | +| `wrap_width` | `88` | Line wrap width (`0` to disable) |
-Config presets + Presets + +
**Reading-first** *(default)* ```toml [text_output] layout_mode = "reading" heading_style = "markdown" -turn_separator = "---" strip_chatgpt_artifacts = true wrap_width = 88 ``` -**Compact scanning** +**Compact** ```toml [text_output] layout_mode = "compact" -include_turn_count_in_header = false turn_separator = "" wrap_width = 0 ``` -**Plain text / terminal** +**Terminal** ```toml [defaults] format = "txt" [text_output] -layout_mode = "reading" heading_style = "plain" -turn_separator = "---" ```
---- +
## Architecture ``` chatgpt_export_tool/ -├── cli.py ← Entry point & argparse -├── commands/ ← analyze, export command wiring +├── cli.py ← entry point +├── commands/ ← analyze, export └── core/ - ├── parser.py ← Streaming JSON via ijson - ├── filter_pipeline.py ← Field + metadata filtering - ├── export_service.py ← Orchestration - ├── config/ ← TOML loading, models, validation - ├── transcript/ ← Branch reconstruction, text extraction - ├── validation/ ← Field & metadata validation - └── output/ ← Formatters, writer, path resolution + ├── parser.py ← streaming JSON (ijson) + ├── filter_pipeline.py ← field + metadata filtering + ├── export_service.py ← orchestration + ├── config/ ← TOML loading & validation + ├── transcript/ ← branch reconstruction + ├── validation/ ← field & metadata checks + └── output/ ← formatters, writer, paths ``` -The design is deliberately modular: filtering, formatting, splitting, and writing are separate concerns. Most changes touch one small file, not a central controller. +Modular by design — filtering, formatting, splitting, and writing are separate concerns. ---- +
## Development ```bash -# Tests uv run pytest uv run pytest --cov=chatgpt_export_tool --cov-report=term-missing - -# Lint & format uv run ruff check chatgpt_export_tool tests uv run ruff format --check chatgpt_export_tool tests ``` ---- +
+ +
-## License +[MIT](LICENSE)  ·  built by [@voidfreud](https://github.com/voidfreud) -[MIT](LICENSE) — Void Freud ([@voidfreud](https://github.com/voidfreud)) +