feat: topic graph quality scoring and auto-prune pipeline#607

Open
scp7 wants to merge 4 commits into always-further:main from scp7:feature/topic-scoring

Conversation

@scp7
Contributor

@scp7 scp7 commented Feb 21, 2026

Summary

  • Add embedding-based topic graph quality scoring with three coherence metrics: global, parent, and sibling coherence
  • Implement 4-step cascading pruning pipeline that removes low-quality subtrees based on configurable thresholds
  • Wire scoring into the generate pipeline via a new topics.scoring YAML config section with prune: true/false control
  • Original graph is always preserved; pruned graph saved as a _scored derivative (Option B file strategy)
  • Add topic score and topic optimize-thresholds CLI commands for standalone scoring and threshold search
  • Add score report overlay support in topic inspect --score-report --show-pruned
  • Move sentence-transformers to [scoring] optional extra to avoid dependency conflicts
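
The new topics.scoring section might look like the following. Field names are inferred from the diff hunks in this PR; the embedding_key and embedding_model values are placeholders, and the configuration.md reference added by this PR is authoritative:

```yaml
topics:
  save_as: topic_graph.json
  scoring:
    prune: true                     # false = score-only mode, graph left untouched
    parent_coherence: 0.25          # step 2 threshold
    sibling_coherence_lower: 0.20   # step 3: sibling outliers
    sibling_coherence_upper: 0.68   # step 4: repetitive siblings
    embedding_key: embedding        # placeholder: metadata key holding node vectors
    embedding_model: all-MiniLM-L6-v2   # placeholder model name
    save_report: true
```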

Changes

| Area | Files | What |
| --- | --- | --- |
| Core | topic_quality.py (+702) | Scoring engine, 4-step cascade, grid/random threshold optimizer, prune_graph() |
| Config | config.py | ScoringConfig Pydantic model with threshold validation |
| CLI | cli.py | topic score, topic optimize-thresholds commands, _score_and_prune_topic_model() pipeline helper |
| Tests | test_topic_quality.py, test_config.py, test_topic_inspector.py | 36 new tests (614 total pass) |
| Docs | configuration.md, generate.md, topic-score.md, topic-optimize-thresholds.md | Full config reference, pipeline docs, CLI reference |
| Deps | pyproject.toml | [scoring] optional extra for sentence-transformers |

Test plan

  • All 614 tests pass (uv run pytest)
  • Lint clean (make lint)
  • Format clean (make format)
  • Verified scoring output matches reference implementation (1545 removed nodes identical, per-node metrics within 3e-7 float precision)
  • End-to-end: deepfabric generate config.yaml with scoring: section in YAML
  • End-to-end: deepfabric topic score graph.json standalone
  • End-to-end: deepfabric topic inspect graph.json --score-report report.json --show-pruned

Add GTD/LTD embedding-based quality metrics for topic graphs with
CLI commands for scoring, threshold optimization, and prune overlay
on inspect.
@scp7 scp7 added the experimental label (Experimental feature under active development) on Feb 21, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @scp7, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces experimental features for evaluating and optimizing topic graph quality. It provides new CLI commands to score topic graphs based on embedding-based drift metrics (Global Topic Drift and Local Topic Drift) and to optimize the thresholds for these metrics. Additionally, the existing topic inspect command has been enhanced to visually overlay the results of these quality assessments, allowing users to preview which nodes would be flagged or pruned. These changes aim to improve the analytical capabilities for managing and refining topic structures.

Highlights

  • New topic score command: Introduced a command to evaluate topic graph quality using embedding-based Global Topic Drift (GTD) and Local Topic Drift (LTD) metrics, generating a JSON report with per-node scores and estimated pruning impact.
  • New topic optimize-thresholds command: Added a command to search for optimal GTD/LTD cutoff values through random or grid search, offering configurable ranges and constraints for fine-tuning topic quality.
  • Enhanced topic inspect command: Extended the topic inspect command with --score-report and --show-pruned options, allowing users to visually overlay flagged (yellow) and pruned (red) nodes directly onto the tree output, including a flagged-only mode.
  • New deepfabric/topic_quality.py module: Created a dedicated module to encapsulate the core logic for topic quality assessment, including cosine similarity scoring, BFS depth computation, descendant propagation, and threshold candidate generation.
  • New sentence-transformers dependency: Integrated sentence-transformers to automatically generate embeddings for nodes when they are missing from metadata, which is crucial for the new quality scoring features.
Changelog
  • deepfabric/cli.py
    • Added Any to typing imports.
    • Introduced --score-report and --show-pruned options to topic inspect.
    • Implemented logic to load prune overlay data and display it in topic inspect output.
    • Added _load_prune_overlay helper function to parse score reports.
    • Created _display_graph_overlay_tree function for rendering colored topic trees based on prune data.
    • Registered new topic score command with options for GTD/LTD thresholds, embedding key, and model.
    • Registered new topic optimize-thresholds command with options for search strategy, trials, threshold ranges, and constraints.
  • deepfabric/topic_quality.py
    • Introduced functions for safe cosine similarity calculation and BFS-based depth computation.
    • Implemented embedding extraction, filling missing embeddings using sentence-transformers, and descendant collection with caching.
    • Added utility functions for summarizing numeric values and deriving report paths.
    • Developed _build_topic_quality_context to precompute graph metrics.
    • Created _evaluate_thresholds to apply thresholds and calculate pruning impact.
    • Provided score_topic_graph for evaluating graph quality with GTD/LTD metrics.
    • Implemented _generate_threshold_candidates for random or grid search of thresholds.
    • Developed optimize_topic_thresholds to find optimal GTD/LTD thresholds with configurable constraints.
    • Added write_topic_score_report to save reports to disk.
  • docs/cli/topic-inspect.md
    • Updated documentation for topic inspect to include new --score-report and --show-pruned options.
    • Added a new 'Prune Overlay' section with usage examples and a legend for colored nodes.
  • docs/cli/topic-optimize-thresholds.md
    • Created new documentation for the topic optimize-thresholds command.
    • Provided usage instructions, example commands, and details on setting constraints.
  • docs/cli/topic-score.md
    • Created new documentation for the topic score command.
    • Detailed the GTD and LTD metrics, command options, and the structure of the generated report.
  • docs/cli/topic.md
    • Updated the main topic CLI documentation to list the new topic score and topic optimize-thresholds commands.
    • Revised the example workflow to include scoring graph quality.
  • mkdocs.yml
    • Updated MkDocs navigation configuration to include new documentation pages for topic score and topic optimize-thresholds.
  • pyproject.toml
    • Added sentence-transformers as a new project dependency.
  • tests/unit/test_topic_inspector.py
    • Added score_report_file fixture to create a mock score report.
    • Added graph_overlay_json_file fixture for testing flagged-only overlays.
    • Implemented test_inspect_graph_with_prune_overlay to check full overlay rendering.
    • Implemented test_inspect_graph_with_prune_overlay_flagged_only to verify filtered overlay display.
  • tests/unit/test_topic_quality.py
    • Created a new test file for deepfabric.topic_quality.
    • Added graph_with_embeddings_file fixture for testing.
    • Implemented test_score_topic_graph_flags_and_removals to verify scoring logic.
    • Added tests for deriving report paths and CLI report writing for topic score.
    • Implemented test_optimize_topic_thresholds_returns_best for optimization logic.
    • Added tests for deriving optimization report paths and CLI report writing for topic optimize-thresholds.
  • uv.lock
    • Updated the uv.lock file to reflect new and updated dependencies, including sentence-transformers, joblib, scikit-learn, scipy, and threadpoolctl.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces significant new features for topic graph quality analysis, including topic score and topic optimize-thresholds commands. The changes are well-structured, with a new topic_quality.py module containing the core logic, updates to the CLI, and corresponding documentation and tests. The implementation is generally robust, with good error handling and optimizations like memoization. I have one suggestion to improve the robustness of the embedding generation logic.

Comment thread deepfabric/topic_quality.py
scp7 added 3 commits February 25, 2026 13:57
Replace the 3-step GTD/LTD pipeline with a simplified 4-step cascade
using renamed metrics: global_coherence, parent_coherence, sibling_coherence.

Pipeline steps (each operates on surviving nodes from prior steps):
1. global_coherence < 0 (hardcoded gate)
2. parent_coherence < threshold (default 0.25)
3. sibling_coherence < lower threshold (default 0.20, outliers)
4. sibling_coherence > upper threshold (default 0.68, repetitive)

Changes:
- Add _compute_sibling_coherence_by_id() to topic_quality.py
- Rewrite _evaluate_thresholds() with 4-step cascade logic
- Update CLI flags for topic score and optimize-thresholds commands
- Add per-step removal breakdown to CLI score output
- Move sentence-transformers to optional [scoring] extra
- Update tests with 6-node fixture and new metric assertions
- Rewrite topic-score.md and topic-optimize-thresholds.md docs

Verified: identical output to research reference script on
seo-graph-10tools-single-5depth.json (3906 nodes, 1545 removed,
per-node metric diffs < 3e-7).
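
The cascade semantics described in the steps above can be sketched roughly as follows. This is a hypothetical helper, not the module's API, and the real prune_graph also propagates removal to descendants of flagged nodes, which this sketch ignores:

```python
def cascade_prune(
    scores: dict[str, dict[str, float]],
    parent_threshold: float = 0.25,
    sibling_lower: float = 0.20,
    sibling_upper: float = 0.68,
) -> dict[str, set[str]]:
    """Apply the 4-step cascade; each step only sees survivors of the prior steps.

    `scores` maps node id -> {"global_coherence", "parent_coherence", "sibling_coherence"}.
    """
    surviving = set(scores)
    removed_by_step: dict[str, set[str]] = {}

    steps = [
        ("global_lt_zero", lambda s: s["global_coherence"] < 0),              # step 1: hardcoded gate
        ("low_parent", lambda s: s["parent_coherence"] < parent_threshold),   # step 2
        ("sibling_outlier", lambda s: s["sibling_coherence"] < sibling_lower),      # step 3: outliers
        ("sibling_repetitive", lambda s: s["sibling_coherence"] > sibling_upper),   # step 4: repetitive
    ]
    for name, predicate in steps:
        hit = {nid for nid in surviving if predicate(scores[nid])}
        surviving -= hit
        removed_by_step[name] = hit
    return removed_by_step
```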
… strategy

Wire topic scoring into the generate pipeline via ScoringConfig in YAML.
When scoring.prune is true, the pruned graph is saved as a _scored.json
derived artifact while the original graph is preserved untouched. Score-only
mode (prune: false) reports metrics without modifying the graph. Includes
sibling coherence threshold validation, root node guard, and in-memory
graph path for score-only mode.
Document the topics.scoring YAML section in the configuration reference,
add a Topic Scoring and Pruning section to the generate command docs,
and cross-link from topic-score.md back to the config reference.
@scp7 scp7 changed the title from "feat: topic scoring and threshold optimization (experimental)" to "feat: topic graph quality scoring and auto-prune pipeline" on Feb 26, 2026
@scp7
Contributor Author

scp7 commented Feb 26, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a significant new feature: topic graph quality scoring and an automated pruning pipeline. The changes are well-structured, with a new topic_quality.py module for the core logic, new CLI commands, and configuration options. The implementation preserves the original graph, which is a good design choice. I've identified a few areas for improvement, mainly around code duplication in the CLI and error handling in the new commands, which can enhance maintainability. Overall, this is a solid addition to the project.

Comment thread deepfabric/config.py
import warnings

from typing import Literal
from typing import Literal, Self
Contributor


high

The Self type was added in Python 3.11. To maintain compatibility with Python 3.10 as specified in pyproject.toml, you should import Self from typing_extensions for versions older than 3.11.

Suggested change
from typing import Literal, Self
from typing import Literal
from typing_extensions import Self

Comment thread deepfabric/cli.py
Comment on lines +426 to +478
if scoring.prune:
graph, report = prune_graph(
topic_model,
parent_coherence=scoring.parent_coherence,
sibling_coherence_lower=scoring.sibling_coherence_lower,
sibling_coherence_upper=scoring.sibling_coherence_upper,
embedding_key=scoring.embedding_key,
embedding_model=scoring.embedding_model,
)

summary = report["summary"]
tui.success("Topic graph scored and pruned")
tui.console.print(f" Nodes: {summary['original_node_count']}")
tui.console.print(f" Flagged: {summary['flagged_node_count']}")
tui.console.print(f" Pruned: {summary['removed_node_count']}")
tui.console.print(f" Remaining: {summary['remaining_node_count']}")

# Save pruned graph as derived artifact, original stays untouched
p = Path(graph_save_path)
scored_path = str(p.with_stem(f"{p.stem}_scored"))
graph.save(scored_path)
tui.info(f"Pruned graph saved to {scored_path}")
tui.info(f"Original graph preserved at {graph_save_path}")

if scoring.save_report:
report_path = derive_topic_score_report_path(graph_save_path)
write_topic_score_report(report, report_path)
tui.info(f"Score report saved to {report_path}")

return graph

# score-only mode: generate report without pruning
report = score_topic_graph(
topic_model,
parent_coherence=scoring.parent_coherence,
sibling_coherence_lower=scoring.sibling_coherence_lower,
sibling_coherence_upper=scoring.sibling_coherence_upper,
embedding_key=scoring.embedding_key,
embedding_model=scoring.embedding_model,
)

summary = report["summary"]
tui.success("Topic graph scored (prune disabled)")
tui.console.print(f" Nodes: {summary['original_node_count']}")
tui.console.print(f" Flagged: {summary['flagged_node_count']}")
tui.console.print(f" Would remove: {summary['removed_node_count']}")

if scoring.save_report:
report_path = derive_topic_score_report_path(graph_save_path)
write_topic_score_report(report, report_path)
tui.info(f"Score report saved to {report_path}")

return topic_model
Contributor


medium

This function has a significant amount of duplicated code between the if scoring.prune: block and the else block for score-only mode. The arguments passed to prune_graph and score_topic_graph are identical, and the logic for saving the report is also repeated. This can be refactored to improve maintainability and reduce redundancy.

You can extract the common logic, such as preparing arguments and saving the report, to be executed outside the conditional blocks. The summary printing can also be consolidated.

    scoring_args = {
        "parent_coherence": scoring.parent_coherence,
        "sibling_coherence_lower": scoring.sibling_coherence_lower,
        "sibling_coherence_upper": scoring.sibling_coherence_upper,
        "embedding_key": scoring.embedding_key,
        "embedding_model": scoring.embedding_model,
    }

    if scoring.prune:
        graph, report = prune_graph(topic_model, **scoring_args)
        summary = report["summary"]
        tui.success("Topic graph scored and pruned")
        tui.console.print(f"  Nodes:         {summary['original_node_count']}")
        tui.console.print(f"  Flagged:       {summary['flagged_node_count']}")
        tui.console.print(f"  Pruned:        {summary['removed_node_count']}")
        tui.console.print(f"  Remaining:     {summary['remaining_node_count']}")

        # Save pruned graph as derived artifact, original stays untouched
        p = Path(graph_save_path)
        scored_path = str(p.with_stem(f"{p.stem}_scored"))
        graph.save(scored_path)
        tui.info(f"Pruned graph saved to {scored_path}")
        tui.info(f"Original graph preserved at {graph_save_path}")

        result_model = graph
    else:
        # score-only mode: generate report without pruning
        report = score_topic_graph(topic_model, **scoring_args)
        summary = report["summary"]
        tui.success("Topic graph scored (prune disabled)")
        tui.console.print(f"  Nodes:         {summary['original_node_count']}")
        tui.console.print(f"  Flagged:       {summary['flagged_node_count']}")
        tui.console.print(f"  Would remove:  {summary['removed_node_count']}")

        result_model = topic_model

    if scoring.save_report:
        report_path = derive_topic_score_report_path(graph_save_path)
        write_topic_score_report(report, report_path)
        tui.info(f"Score report saved to {report_path}")

    return result_model

Comment thread deepfabric/cli.py
Comment on lines +2527 to +2543
def subtree_has_highlight(node_id: int, stack: set[int] | None = None) -> bool:
if node_id in memo_has_highlight:
return memo_has_highlight[node_id]

local_stack = stack or set()
if node_id in local_stack:
return str(node_id) in highlighted_ids

local_stack.add(node_id)
has_self = str(node_id) in highlighted_ids
has_descendant = any(
subtree_has_highlight(child.id, local_stack) for child in graph.nodes[node_id].children
)
local_stack.remove(node_id)
result = has_self or has_descendant
memo_has_highlight[node_id] = result
return result
Contributor


medium

This recursive function subtree_has_highlight for checking if a subtree contains a highlighted node can be simplified. The current implementation with local_stack to handle cycles is a bit complex for what it's doing. Since you're already using a memo_has_highlight cache, you can leverage it more effectively to prevent re-computation and simplify the cycle detection logic. A more direct check against the cache and the highlighted_ids set would be cleaner.

Suggested change
def subtree_has_highlight(node_id: int, stack: set[int] | None = None) -> bool:
if node_id in memo_has_highlight:
return memo_has_highlight[node_id]
local_stack = stack or set()
if node_id in local_stack:
return str(node_id) in highlighted_ids
local_stack.add(node_id)
has_self = str(node_id) in highlighted_ids
has_descendant = any(
subtree_has_highlight(child.id, local_stack) for child in graph.nodes[node_id].children
)
local_stack.remove(node_id)
result = has_self or has_descendant
memo_has_highlight[node_id] = result
return result
def subtree_has_highlight(node_id: int) -> bool:
if node_id in memo_has_highlight:
return memo_has_highlight[node_id]
# To prevent infinite recursion on cycles, assume false for now
memo_has_highlight[node_id] = False
has_self = str(node_id) in highlighted_ids
has_descendant = any(
subtree_has_highlight(child.id) for child in graph.nodes[node_id].children
)
result = has_self or has_descendant
memo_has_highlight[node_id] = result
return result

Comment thread deepfabric/cli.py
Comment on lines +2943 to +2954
except FileNotFoundError as e:
tui.error(str(e))
sys.exit(1)
except ValueError as e:
tui.error(str(e))
sys.exit(1)
except ConfigurationError as e:
tui.error(str(e))
sys.exit(1)
except Exception as e:
tui.error(f"Error scoring graph: {e}")
sys.exit(1)
Contributor


medium

The exception handling here is a bit broad. Catching a generic Exception can mask specific issues and make debugging harder. It's better to catch more specific exceptions that you expect, and let unexpected ones propagate or be handled by a more general error handler. In this case, you've already handled FileNotFoundError, ValueError, and ConfigurationError, so the generic Exception catch is likely for other unexpected issues. It would be better to log the full traceback in debug mode to aid in diagnosing these unexpected errors.

Comment thread deepfabric/cli.py
Comment on lines +3072 to +3077
except (FileNotFoundError, ValueError, ConfigurationError) as e:
tui.error(str(e))
sys.exit(1)
except Exception as e:
tui.error(f"Error optimizing thresholds: {e}")
sys.exit(1)
Contributor


medium

Similar to the topic_score command, the exception handling here is too broad. Catching a generic Exception can hide the root cause of problems. It's better to handle specific, expected exceptions and have a more general handler at a higher level if needed. For unexpected errors, logging the full traceback when in debug mode would be very helpful for developers.

texts = [graph.nodes[node_id].topic for node_id in missing_ids]
vectors = model.encode(texts, convert_to_numpy=True, normalize_embeddings=True)

for node_id, vector in zip(missing_ids, vectors, strict=False):
Contributor


medium

The zip function is used with strict=False. While this prevents a ValueError if the iterables have different lengths, it might hide potential bugs where missing_ids and vectors are expected to have the same length. If they are always expected to be the same length, using strict=True (available since Python 3.10) would be safer and make the code's intent clearer. If they can have different lengths, a comment explaining why would be helpful.

raise ValueError("trials must be greater than zero")

if search == "random":
rng = random.Random(seed) # noqa: S311
Contributor


medium

Using random.Random(seed) is a good practice for reproducibility. However, the comment noqa: S311 suggests awareness of a potential security issue (use of random instead of secrets for security-sensitive purposes). While this is not a security-critical context, it's worth noting that for cryptographic or security-related random number generation, the secrets module should be used. For this use case (reproducible random search), random is appropriate.

feasible = [e for e in evaluations if e["passes_constraints"]]
pool = feasible if feasible else evaluations
pool_sorted = sorted(pool, key=lambda e: e["objective"])
best = pool_sorted[0] if pool_sorted else None


  1. How are we sure pool_sorted[0] is the optimum? We could in fact tune one parameter up and the other down in an unbalanced way and still achieve a value closest to the desired constraint. Maybe we can enforce a similar strength of pruning across the three thresholds? In other words, the parent_coherence value needs to be similar to sibling_coherence_lower, and sibling_coherence_upper + sibling_coherence_lower should be close to 0.9.
    This makes sure the distribution of the three values is similar to the defaults:
    • parent_coherence: 0.25
    • sibling_coherence_lower: 0.2
    • sibling_coherence_upper: 0.68
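
The suggested constraint could be sketched like this (a hypothetical helper for the review discussion, not code from the PR):

```python
def thresholds_near_default(
    parent_coherence: float,
    sibling_lower: float,
    sibling_upper: float,
    tolerance: float = 0.1,
) -> bool:
    """Keep candidate thresholds distributed like the defaults (0.25 / 0.2 / 0.68).

    parent_coherence should stay close to sibling_coherence_lower, and the sum of
    the two sibling thresholds should stay close to 0.9 (0.68 + 0.2 = 0.88 by default).
    """
    balanced = abs(parent_coherence - sibling_lower) <= tolerance
    sum_ok = abs((sibling_lower + sibling_upper) - 0.9) <= tolerance
    return balanced and sum_ok
```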

)
removed_ratio = (removed_count / total_nodes) if total_nodes else 0.0
internal_removed_ratio = (internal_removed / total_nodes) if total_nodes else 0.0
objective = removed_ratio + (1.5 * internal_removed_ratio)


Do you want to penalise it more when we remove nodes at lower depth, with the penalty gradually decreasing as we approach nodes toward the leaves? To implement this, rather than one broad internal_removed_ratio category, we could compute a removed_ratio per depth and penalise removals at lower depths more heavily.
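
One possible shape for that depth-weighted objective (a hypothetical sketch for the discussion, with a geometric decay chosen arbitrarily; the PR's actual objective is removed_ratio + 1.5 * internal_removed_ratio):

```python
def depth_weighted_objective(
    removed_depths: list[int],
    total_nodes: int,
    base_weight: float = 1.5,
) -> float:
    """Penalise removals near the root more than removals near the leaves.

    Each removed node contributes base_weight / 2**depth, so the penalty
    halves with every level of depth; the total is normalised by graph size.
    """
    if total_nodes == 0:
        return 0.0
    penalty = sum(base_weight / (2 ** depth) for depth in removed_depths)
    return penalty / total_nodes
```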

Comment thread deepfabric/cli.py

if prune_overlay and prune_overlay.get("thresholds"):
thresholds = prune_overlay["thresholds"]
tui.console.print(


Add the coherence score to topic inspect --show-pruned output, to help the user determine a threshold by inspecting the data: basically, append the node's coherence score after the LOW_PARENT_COHERENCE tag.

Comment thread deepfabric/cli.py
graph_save_path = topics_save_as or config.topics.save_as or "topic_graph.json"

if scoring.prune:
graph, report = prune_graph(


Enable another argument where the user can input the IDs of nodes they wish to rescue from the filtering. For instance, after manual inspection the user might choose to keep some nodes while still discarding all other nodes that fall outside the threshold.

return combos[:trials]


def optimize_topic_thresholds(


We should also write the score_report from optimize-thresholds so the user can use it to inspect the graph. Currently it only writes _threshold_optimization.json, which cannot be used as input to graph inspection. So to generate score_report.json, the user has to copy the determined coherence thresholds and manually score the graph again, which adds an extra step.


@Kexin-xu-01 Kexin-xu-01 left a comment


Hi @scp7, I have tested the workflow on a few datasets and left some feedback. Overall it looks great! Thank you so much for integrating these functions into the workflow!


Labels

experimental Experimental feature under active development
