fix(normalize): fail closed on invalid ChatGPT mapping trees #329

Open
GaosCode wants to merge 1 commit into MemPalace:develop from GaosCode:fix/chatgpt-mapping-active-branch

@GaosCode GaosCode commented Apr 9, 2026

Closes #330

What does this PR do?

Fix ChatGPT mapping normalization so imports fail closed instead of silently ingesting the wrong transcript or raw invalid JSON.

This PR:

  • stops walking children[0] and resolves the transcript from the active node path instead
  • uses current_node when present, and rejects ambiguous multi-branch trees without it
  • rejects invalid ChatGPT exports when mapping is malformed, current_node is invalid, the resolved path does not connect back to the detected root, or the path is not reachable from the root via children
  • prevents mine_convos() from silently passing invalid ChatGPT exports downstream as plain text
  • updates ingest reporting so skipped invalid ChatGPT exports are counted and labeled correctly
  • adds regression tests for regenerated branches, edited branches, invalid current_node, orphaned subtrees, unreachable hidden nodes, and ingest skip reporting
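
The path-resolution rules in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `MappingError` and `resolve_active_path` are hypothetical names, and the sketch assumes each mapping node is a dict with `parent` and `children` keys and that the root is the single node whose `parent` is `None`.

```python
from typing import Optional


class MappingError(ValueError):
    """Hypothetical stand-in for the PR's normalization error."""


def resolve_active_path(mapping: dict, current_node: Optional[str] = None) -> list:
    """Return node ids from the root to the active leaf, or raise MappingError."""
    if not isinstance(mapping, dict) or not mapping:
        raise MappingError("mapping is missing or malformed")
    if not all(isinstance(node, dict) for node in mapping.values()):
        raise MappingError("mapping contains a non-dict node")

    # Assumption: the root is the only node without a parent.
    roots = [nid for nid, node in mapping.items() if node.get("parent") is None]
    if len(roots) != 1:
        raise MappingError("expected exactly one root node")

    if current_node is None:
        # Without current_node, only an unbranched chain is unambiguous.
        path, seen, node_id = [], set(), roots[0]
        while True:
            if node_id in seen or node_id not in mapping:
                raise MappingError("broken or cyclic children chain")
            seen.add(node_id)
            path.append(node_id)
            children = mapping[node_id].get("children") or []
            if len(children) > 1:
                raise MappingError("multiple branches and no current_node")
            if not children:
                return path
            node_id = children[0]

    if current_node not in mapping:
        raise MappingError("current_node is not in mapping")

    # Walk parent pointers from the active leaf back to the root.
    path, seen, node_id = [], set(), current_node
    while True:
        if node_id in seen:
            raise MappingError("cycle in parent chain")
        seen.add(node_id)
        path.append(node_id)
        parent_id = mapping[node_id].get("parent")
        if parent_id is None:
            break  # reached the single root
        if parent_id not in mapping:
            raise MappingError("parent chain does not connect to the root")
        # Every step must also exist as a children edge, so hidden nodes
        # that only carry a parent pointer are rejected.
        if node_id not in (mapping[parent_id].get("children") or []):
            raise MappingError("node is not reachable from the root via children")
        node_id = parent_id

    path.reverse()
    return path
```

With a regenerated answer (two children under the same question), passing `current_node` selects the active branch, while omitting it raises instead of guessing.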

How to test

uv run pytest tests/test_normalize.py -q
uv run pytest tests/test_convo_miner.py -q -k "ambiguous or invalid or summary"
uv run ruff check .

Checklist

Tests pass (python -m pytest tests/ -v)
No hardcoded paths
Linter passes (ruff check .)

@web3guru888 web3guru888 left a comment

Solid fix — the old children[0] walk was a silent data corruption vector and this is the right way to handle it.

What I like:

  1. Fail closed on invalid data is the correct design choice. The previous behavior (return None, let mine_convos() silently pass the raw JSON downstream as plain text) meant corrupted memories could enter the palace without any signal. Raising ChatGPTNormalizeError and counting skips in the summary output makes failures visible.

  2. Path resolution from current_node back to root is the correct algorithm — it follows the active branch exactly as ChatGPT intended, rather than guessing via children[0]. The reachability check (path must be reachable from root via children edges) catches both orphan subtrees and nodes that only have parent pointers without being listed as children.

  3. Exception hierarchy (ChatGPTNormalizeError → ChatGPTBranchAmbiguityError) is clean: callers can catch broadly or narrowly.

  4. Test coverage is thorough: regenerated branches, edited branches, invalid current_node, orphans, unreachable hidden nodes, too-few-messages, and ingest reporting. The multi-turn edit branch test (test_chatgpt_mapping_uses_active_edit_branch_path) is a particularly good scenario.
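
To illustrate points 1 and 3, here is a rough sketch of how a base error with a narrower subclass lets a caller fail closed and still report why each export was skipped. The names `NormalizeError`, `BranchAmbiguityError`, `normalize`, and `ingest`, as well as the summary keys, are hypothetical stand-ins and not the PR's actual classes or functions; `normalize` is a toy that only checks the shape of the export.

```python
class NormalizeError(Exception):
    """Base error: the export cannot be normalized (hypothetical name)."""


class BranchAmbiguityError(NormalizeError):
    """Narrower error: several branches and nothing marks the active one."""


def normalize(export: dict) -> str:
    """Toy normalizer standing in for the real one."""
    mapping = export.get("mapping")
    if not isinstance(mapping, dict) or not mapping:
        raise NormalizeError("mapping is missing or malformed")
    has_branch = any(
        len(node.get("children") or []) > 1 for node in mapping.values()
    )
    if export.get("current_node") is None and has_branch:
        raise BranchAmbiguityError("multiple branches without current_node")
    return f"{len(mapping)} nodes"


def ingest(exports: list) -> dict:
    """Count imports and skips instead of passing bad exports downstream."""
    summary = {"imported": 0, "skipped_ambiguous": 0, "skipped_invalid": 0}
    for export in exports:
        try:
            normalize(export)
        except BranchAmbiguityError:
            # Narrow catch: must come before the base class.
            summary["skipped_ambiguous"] += 1
        except NormalizeError:
            # Broad catch: every other normalization failure.
            summary["skipped_invalid"] += 1
        else:
            summary["imported"] += 1
    return summary
```

A caller that does not care about the distinction can catch only the base class; the subclass is still handled because it inherits from it.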

Minor observation (not blocking):

_collect_chatgpt_reachable_ids gets called twice for multi-branch trees without current_node — once inside _collect_chatgpt_leaf_ids and once in the main _try_chatgpt_json before _build_chatgpt_path. Could cache the result, but for typical conversation sizes this is negligible.
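
One way to avoid the repeated traversal, sketched under assumptions: compute the reachable set once and hand it to the helpers that need it. The function names and signatures below are hypothetical and do not mirror the PR's private helpers; nodes are assumed to be dicts with a `children` list.

```python
def collect_reachable_ids(mapping: dict, root_id: str) -> set:
    """Return every node id reachable from root_id via children edges."""
    seen, stack = set(), [root_id]
    while stack:
        node_id = stack.pop()
        if node_id in seen or node_id not in mapping:
            continue
        seen.add(node_id)
        stack.extend(mapping[node_id].get("children") or [])
    return seen


def collect_leaf_ids(mapping: dict, reachable: set) -> list:
    """Return reachable nodes without children, reusing a precomputed set."""
    return sorted(nid for nid in reachable if not mapping[nid].get("children"))


def analyze_tree(mapping: dict, root_id: str) -> tuple:
    """Traverse once, then share the result with later steps."""
    reachable = collect_reachable_ids(mapping, root_id)  # single traversal
    leaves = collect_leaf_ids(mapping, reachable)
    return reachable, leaves
```

Passing the set explicitly keeps the helpers pure; memoizing on the mapping would be harder because dicts are not hashable.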

@GaosCode
Author

Rebased/merged latest main and resolved conflicts. Targeted tests still pass.

@GaosCode GaosCode force-pushed the fix/chatgpt-mapping-active-branch branch from 83d209b to ad9e46f Compare April 11, 2026 06:56
@bensig bensig changed the base branch from main to develop April 11, 2026 22:22
@bensig bensig requested a review from igorls as a code owner April 11, 2026 22:22
@GaosCode GaosCode force-pushed the fix/chatgpt-mapping-active-branch branch from ad9e46f to acf0923 Compare April 13, 2026 02:58
@GaosCode
Author

Hi @bensig, quick follow-up on this PR.

I updated it on top of the latest develop, resolved the merge conflicts, and re-ran the targeted tests locally. It should be ready for review now when you have time.

Thanks!

@igorls igorls added the area/mining (File and conversation mining) and bug (Something isn't working) labels Apr 14, 2026
@GaosCode GaosCode force-pushed the fix/chatgpt-mapping-active-branch branch from acf0923 to 309b1ca Compare April 15, 2026 12:40
@GaosCode
Author

Hi @igorls, quick follow-up on this PR.

I’ve updated the branch, resolved the latest merge conflicts, and re-ran the targeted tests locally. Could you please take a look when you have time?

Thanks!

Development

Successfully merging this pull request may close these issues.

ChatGPT mapping imports can silently ingest the wrong branch

3 participants