Add text, markdown, and chat output formats (#43) #44
base: main
Implement multiple text-based output formats for Claude Code transcripts, providing alternatives to HTML for documentation and terminal viewing.
New Features:
- Text format: Verbose output with timestamps, token usage, and full details
- Markdown format: Same as text with markdown heading hierarchy
- Chat format: Compact conversation flow mimicking Claude Code UI
  - Uses symbols: > for user, ⏺ for assistant/tools, ⎿ for results
  - Truncates long outputs at 10 lines with "… +N lines" indicator
Architecture:
- Created content_extractor.py for shared content parsing logic
- Eliminates duplication between HTML and text rendering pipelines
- Both renderer.py and text_renderer.py use common extraction layer
CLI:
- Added --format option: html (default), text, markdown, chat
- Examples:
  - claude-code-log dir/ --format text -o output.txt
  - claude-code-log dir/ --format markdown -o output.md
  - claude-code-log file.jsonl --format chat
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
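The chat format's truncation rule (10 lines, then a "… +N lines" indicator) can be illustrated with a minimal sketch. The function name `truncate_lines_sketch` is hypothetical and this is not the PR's actual helper; it only assumes the behavior described in the bullet above.

```python
def truncate_lines_sketch(text: str, max_lines: int = 10) -> str:
    """Keep the first max_lines lines and append a '… +N lines' marker.

    Hypothetical illustration of the rule described in the PR text,
    not the implementation shipped in text_renderer.py.
    """
    lines = text.split("\n")
    if len(lines) <= max_lines:
        # Short outputs are returned unchanged.
        return text
    hidden = len(lines) - max_lines
    return "\n".join(lines[:max_lines] + [f"… +{hidden} lines"])
```

With a 13-line input and the default limit, the result has 11 lines: the first 10 from the input followed by the marker line.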
Walkthrough
Adds multi-format output support (HTML, text, markdown, chat), new content extraction and text-rendering modules, CLI Changes
Sequence Diagram(s)
sequenceDiagram
participant CLI as CLI (main)
participant Converter as convert_jsonl_to_output
participant TextRend as text_renderer
participant HTMLConv as convert_jsonl_to_html
participant Extractor as content_extractor
participant Renderer as renderer.py
CLI->>+Converter: call convert_jsonl_to_output(input, format)
alt format == "html"
Converter->>+HTMLConv: convert_jsonl_to_html(...)
HTMLConv-->>-Converter: html_files/paths
else format in ["text","markdown","chat"]
Converter->>Converter: load transcripts, filter dates, init cache
alt format in ["text","markdown"]
Converter->>+TextRend: generate_text / generate_markdown(messages)
else format == "chat"
Converter->>+TextRend: generate_chat(messages)
end
TextRend->>+Extractor: extract_content_data(content) for items
Extractor-->>-TextRend: ExtractedContent
TextRend-->>-Converter: rendered_output
Converter->>Converter: write output file(s)
end
Converter-->>-CLI: return output_path
note over Renderer,Extractor: renderer.py now calls extract_content_data\nand delegates to existing formatters
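The format dispatch in the diagram above can be sketched roughly as follows. All names here (`render_sketch`, `TEXT_RENDERERS`, and the three stub renderers) are hypothetical stand-ins; the real `generate_text`, `generate_markdown`, and `generate_chat` take different arguments, and the HTML path is delegated to the existing converter rather than raising.

```python
from typing import Callable, Dict, List

Renderer = Callable[[List[str]], str]


def _text_stub(messages: List[str]) -> str:
    # Stand-in for a verbose plain-text renderer.
    return "\n".join(messages)


def _markdown_stub(messages: List[str]) -> str:
    # Stand-in for a renderer that adds a heading hierarchy.
    return "\n".join(f"## {m}" for m in messages)


def _chat_stub(messages: List[str]) -> str:
    # Stand-in for the compact chat renderer ('>' marks user input).
    return "\n".join(f"> {m}" for m in messages)


TEXT_RENDERERS: Dict[str, Renderer] = {
    "text": _text_stub,
    "markdown": _markdown_stub,
    "chat": _chat_stub,
}


def render_sketch(messages: List[str], output_format: str = "html") -> str:
    """Pick a renderer by format name, mirroring the alt/else branches."""
    fmt = output_format.lower()
    if fmt == "html":
        # In the diagram this branch hands off to the HTML converter.
        raise NotImplementedError("HTML is handled by the existing HTML path")
    try:
        renderer = TEXT_RENDERERS[fmt]
    except KeyError:
        raise ValueError(f"Unsupported format: {output_format}") from None
    return renderer(messages)
```

A lookup table keeps the non-HTML branches symmetric, so adding another text-based format would only need one more entry.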
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 2
🧹 Nitpick comments (10)
CLAUDE.md (1)
142-142: Mention chat support in text_renderer description
claude_code_log/text_renderer.py now also generates the compact chat format, not just plain text and markdown. Consider updating the bullet to reflect that broader responsibility.
-- `claude_code_log/text_renderer.py` - Plain text and markdown rendering
+- `claude_code_log/text_renderer.py` - Plain text, markdown, and chat rendering
README.md (1)
140-170: Align section heading and file-structure note with chat support
The new section already documents chat usage, but the heading and text_renderer description still read as text/markdown-only:
- Consider renaming the heading to include chat:
  -### Text and Markdown Output
  +### Text, Markdown, and Chat Output
- Likewise, update the file-structure bullet so it matches the module’s current role:
  -- `claude_code_log/text_renderer.py` - Plain text and markdown rendering
  +- `claude_code_log/text_renderer.py` - Plain text, markdown, and chat rendering
Also applies to: 175-175
claude_code_log/content_extractor.py (2)
73-132: Consider a safer fallback for unknown content types
extract_content_data returns None for any ContentItem whose type isn’t one of the known variants, which means render_message_content/chat rendering will silently drop those blocks. That’s fine for now, but if Anthropic adds new content types, they’ll disappear from non-HTML outputs.
If you prefer graceful degradation, you could treat unknown items as generic text instead of skipping them:
-    # Unknown content type
-    return None
+    # Unknown content type – fall back to string representation so it isn't silently dropped
+    return ExtractedText(text=str(content))
This keeps future content visible (even if not specially formatted) while still using structured handling for known types.
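To make the fallback idea concrete, here is a small self-contained sketch. `TextItem`, `ExtractedTextSketch`, and `extract_sketch` are hypothetical stand-ins for the project's models and extractor; only the "stringify unknown items instead of returning None" behavior comes from the suggestion above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TextItem:
    """Stand-in for a known content model (hypothetical)."""
    text: str


@dataclass
class ExtractedTextSketch:
    """Stand-in for the extractor's text result type (hypothetical)."""
    text: str


def extract_sketch(content: Any) -> ExtractedTextSketch:
    """Handle known types explicitly; degrade unknown ones to their str() form."""
    if isinstance(content, TextItem):
        return ExtractedTextSketch(text=content.text)
    # Unknown type: keep it visible instead of returning None and dropping it.
    return ExtractedTextSketch(text=str(content))
```

The trade-off is that an unknown block shows up as a raw representation, which is noisier than omitting it but makes missing support visible to the reader.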
134-145: Match HTML rendering’s Unicode handling in tool JSON formatting
format_tool_input_json currently uses the default json.dumps behavior, which escapes non‑ASCII characters. In HTML rendering, render_params_table uses ensure_ascii=False so Unicode is preserved.
For consistent, readable tool input across HTML and text outputs, consider:
-def format_tool_input_json(tool_input: Dict[str, Any], indent: int = 2) -> str:
+def format_tool_input_json(tool_input: Dict[str, Any], indent: int = 2) -> str:
@@
-    return json.dumps(tool_input, indent=indent)
+    return json.dumps(tool_input, indent=indent, ensure_ascii=False)
claude_code_log/converter.py (1)
35-36: Tighten convert_jsonl_to_output docstring and text/summary behavior
The orchestration looks solid (cache usage, directory vs. file behavior, and title/date handling all mirror the HTML path), but two small points:
- Docstring out of date for chat
  The output_format parameter doc omits "chat", even though the function accepts it:
  -        output_format: Output format - "html", "text", or "markdown"
  +        output_format: Output format - "html", "text", "markdown", or "chat"
- Text output doesn’t include summaries, unlike markdown
  generate_markdown wraps generate_text(..., format_type="markdown", include_summaries=True), but the text branch calls:
  content = generate_text(messages, title, format_type="text")
  leaving include_summaries=False. If you want text to match the README’s description (“Session headers with IDs and summaries (text/markdown only)”), consider:
  -    else:
  -        content = generate_text(messages, title, format_type="text")
  +    else:
  +        content = generate_text(
  +            messages,
  +            title,
  +            format_type="text",
  +            include_summaries=True,
  +        )
  Otherwise, it’d be good to clarify in docs that only markdown includes summaries by default.
Also applies to: 38-163
claude_code_log/cli.py (2)
345-354: Keep CLI help/docstring in sync with supported formats
The new --format option correctly advertises html, text, markdown, and chat, but the main docstring still omits chat:
-    """Convert Claude transcript JSONL files to HTML, text, or markdown.
+    """Convert Claude transcript JSONL files to HTML, text, markdown, or chat.
The --output and --format help texts otherwise look consistent with the converter behavior.
Also applies to: 414-414
599-607: Avoid duplicate success messages and gate open-browser to HTML
Two small UX nits around the final conversion/launch logic:
- Duplicate “success” messages for non-HTML formats
  convert_jsonl_to_output already prints a success message when silent=False, and main prints another one right after:
  output_path = convert_jsonl_to_output(...)
  if input_path.is_file():
      click.echo(f"Successfully converted {input_path} to {output_path}")
  For text/markdown/chat, this results in duplicate lines. One easy fix is to suppress converter logging when called from the CLI:
  -    output_path = convert_jsonl_to_output(
  +    output_path = convert_jsonl_to_output(
           input_path,
           output,
           from_date,
           to_date,
           output_format,
           not no_individual_sessions,
           not no_cache,
  -    )
  +        silent=True,
  +    )
  You then keep the existing click.echo(...) messages as the single source of user-facing output.
- Align open-browser behavior with its “HTML only” warning
  You already warn when --open-browser is used with non-HTML formats:
  if output_format.lower() != "html" and open_browser:
      click.echo("Warning: --open-browser only works with HTML format", err=True)
  But the final block still launches the file unconditionally:
  if open_browser:
      click.launch(str(output_path))
  To make the warning truthful and avoid opening .txt/.md files unexpectedly, gate this:
  -    if open_browser:
  -        click.launch(str(output_path))
  +    if open_browser and output_format.lower() == "html":
  +        click.launch(str(output_path))
Both changes are small but make the CLI behavior more predictable for users.
Also applies to: 612-624
test/test_text_rendering.py (1)
61-67: Consider extracting helpers for JSONL/temp-file setup in tests
There’s a fair bit of repeated boilerplate for creating temporary JSONL transcript files and cleaning them up across the tests. Pulling this into a small helper (e.g., write_messages_to_jsonl(messages) -> Path) would reduce duplication and make individual tests focus purely on assertions.
Also applies to: 133-138, 189-193, 307-313, 369-374, 434-438, 514-519
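One possible shape for the helper named in the comment above, as a sketch only. The name and return type follow the reviewer's suggestion; the body and the cleanup approach are assumptions, since the existing test code is not shown here.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable


def write_messages_to_jsonl(messages: Iterable[Dict[str, Any]]) -> Path:
    """Write one JSON object per line to a temporary .jsonl file.

    The caller is responsible for deleting the returned path.
    """
    with tempfile.NamedTemporaryFile(
        "w", suffix=".jsonl", delete=False, encoding="utf-8"
    ) as handle:
        for message in messages:
            handle.write(json.dumps(message) + "\n")
    # The file is closed here, so it can be reopened by the code under test.
    return Path(handle.name)
```

If the suite runs under pytest, writing into the built-in `tmp_path` fixture instead would remove the need for manual cleanup.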
claude_code_log/text_renderer.py (2)
249-305: Summary rendering can be duplicated vs session header
The intent in the comment on Line 302 is to “Skip summaries if not including them or if we already showed it in session header”, but with the current logic SummaryTranscriptEntry objects will still be rendered even when a matching summary has already been included in the session header:
- Session summaries for a leafUuid are added to session_summaries and displayed in the session header when the first message with that sessionId is seen.
- SummaryTranscriptEntry instances don’t have sessionId, so session_started is False when they’re processed.
- As a result, the condition if include_summaries and not session_started: is true, and the summary is rendered again via render_summary.
If you want to avoid duplicates and only render standalone summaries when they can’t be mapped to a session, you could gate on the leafUuid lookup instead of session_started, e.g.:
-        elif isinstance(message, SummaryTranscriptEntry):
-            # Skip summaries if not including them or if we already showed it in session header
-            if include_summaries and not session_started:
-                lines.append(render_summary(message, format_type))
+        elif isinstance(message, SummaryTranscriptEntry):
+            # Only render summaries that are not already shown in a session header
+            if include_summaries:
+                leaf_uuid = message.leafUuid
+                session_id = uuid_to_session.get(leaf_uuid)
+                # If we can't map this summary to a session (or it wasn't added to
+                # the session header), fall back to rendering it inline.
+                if not session_id or session_id not in session_summaries:
+                    lines.append(render_summary(message, format_type))
This keeps the session-header summaries as the primary representation when available, but still shows orphaned summaries.
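The gating condition proposed above can be isolated into a small predicate to check its behavior. `should_render_summary_inline` is a hypothetical name; `uuid_to_session` and `session_summaries` are the mappings referenced in the suggested diff, and their exact types in the PR are assumed here to be plain dicts.

```python
from typing import Dict


def should_render_summary_inline(
    leaf_uuid: str,
    uuid_to_session: Dict[str, str],
    session_summaries: Dict[str, str],
    include_summaries: bool = True,
) -> bool:
    """Return True only for summaries not already shown in a session header."""
    if not include_summaries:
        return False
    session_id = uuid_to_session.get(leaf_uuid)
    # Orphaned summary (no session), or a session whose header has no summary.
    return not session_id or session_id not in session_summaries
```

A summary mapped to a session that already carries a header summary yields False, so it is rendered once; an unmapped summary yields True and is still shown inline.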
350-373: Chat tool-result handling ignores structured (non-string) content
In generate_chat, the user-side tool-result path only renders results when extracted.content is a str. If the tool result content is a structured list (e.g., list of dicts as supported in render_text_content), has_tool_result becomes True but nothing is actually rendered aside from a blank line. That can make some tool results silently disappear in chat format.
If you want chat mode to remain concise but still show something for structured content, consider converting non-string tool results into a readable string before truncation, e.g.:
-    if isinstance(extracted, ExtractedToolResult):
-        has_tool_result = True
-        # Show tool result with truncated output
-        if isinstance(extracted.content, str):
-            truncated = _truncate_lines(extracted.content, 10)
+    if isinstance(extracted, ExtractedToolResult):
+        has_tool_result = True
+        # Show tool result with truncated output
+        if isinstance(extracted.content, str):
+            content_str = extracted.content
+        elif isinstance(extracted.content, list):
+            # Compact representation for structured content
+            parts = []
+            for item in extracted.content:
+                if isinstance(item, dict):
+                    parts.append(json.dumps(item, separators=(",", ":")))
+                else:
+                    parts.append(str(item))
+            content_str = "\n".join(parts)
+        else:
+            content_str = str(extracted.content)
+
+        if content_str:
+            truncated = _truncate_lines(content_str, 10)
             # Indent each line of the result
             indented_lines: List[str] = []
             for line in truncated.split("\n"):
                 indented_lines.append(f" {line}")
-            lines.append(f" ⎿ {indented_lines[0]}")
-            lines.extend(indented_lines[1:])
-            lines.append("")
+            lines.append(f" ⎿ {indented_lines[0]}")
+            lines.extend(indented_lines[1:])
+            lines.append("")
This preserves the existing behavior for plain-text tool outputs while giving a minimal but informative view for structured results.
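The normalization step from the suggestion above can be pulled out and exercised on its own. `tool_result_to_text` is a hypothetical helper name; the logic mirrors the proposed diff and makes no claim about how the PR finally implements it.

```python
import json
from typing import Any


def tool_result_to_text(content: Any) -> str:
    """Flatten a tool result (str, list, or anything else) into one string."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for item in content:
            if isinstance(item, dict):
                # Compact JSON keeps structured items on a single line each.
                parts.append(json.dumps(item, separators=(",", ":")))
            else:
                parts.append(str(item))
        return "\n".join(parts)
    return str(content)
```

Because each list item becomes exactly one line, a later line-based truncation step still counts structured results sensibly.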
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- CLAUDE.md (2 hunks)
- README.md (2 hunks)
- claude_code_log/cli.py (6 hunks)
- claude_code_log/content_extractor.py (1 hunks)
- claude_code_log/converter.py (1 hunks)
- claude_code_log/renderer.py (2 hunks)
- claude_code_log/text_renderer.py (1 hunks)
- test/test_text_rendering.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
claude_code_log/converter.py (5)
  claude_code_log/text_renderer.py (3): generate_text (211-315), generate_markdown (318-324), generate_chat (338-426)
  claude_code_log/cache.py (2): get_library_version (468-511), CacheManager (69-465)
  claude_code_log/parser.py (3): load_transcript (117-202), load_directory_transcripts (205-231), filter_messages_by_date (60-114)
  claude_code_log/utils.py (1): extract_working_directories (119-151)
  claude_code_log/renderer.py (1): get_project_display_name (78-103)
test/test_text_rendering.py (3)
  claude_code_log/parser.py (1): load_transcript (117-202)
  claude_code_log/text_renderer.py (5): generate_text (211-315), generate_markdown (318-324), render_text_content (46-108), format_usage_info (28-43), generate_chat (338-426)
  claude_code_log/models.py (3): TextContent (60-62), ToolUseContent (65-69), UsageInfo (22-57)
claude_code_log/renderer.py (2)
  claude_code_log/content_extractor.py (5): extract_content_data (73-131), ExtractedText (23-26), ExtractedThinking (30-34), ExtractedToolUse (38-43), ExtractedToolResult (47-52)
  claude_code_log/models.py (4): ToolUseContent (65-69), ToolResultContent (72-76), ThinkingContent (79-82), ImageContent (91-93)
claude_code_log/text_renderer.py (4)
  claude_code_log/models.py (5): AssistantTranscriptEntry (207-210), UserTranscriptEntry (201-204), SummaryTranscriptEntry (213-217), SystemTranscriptEntry (220-225), UsageInfo (22-57)
  claude_code_log/parser.py (1): extract_text_content (24-49)
  claude_code_log/renderer.py (1): format_timestamp (149-168)
  claude_code_log/content_extractor.py (6): extract_content_data (73-131), ExtractedText (23-26), ExtractedThinking (30-34), ExtractedToolUse (38-43), ExtractedToolResult (47-52), format_tool_input_json (134-144)
claude_code_log/content_extractor.py (1)
  claude_code_log/models.py (5): TextContent (60-62), ToolUseContent (65-69), ToolResultContent (72-76), ThinkingContent (79-82), ImageContent (91-93)
claude_code_log/cli.py (1)
  claude_code_log/converter.py (3): convert_jsonl_to_html (166-253), convert_jsonl_to_output (38-163), process_projects_hierarchy (661-855)
🪛 LanguageTool
README.md
[uncategorized] ~142-~142: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...t Convert transcripts to plain text or markdown format for documentation or terminal vi...
(MARKDOWN_NNP)
[uncategorized] ~161-~161: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...s - markdown: Same as text but with markdown heading hierarchy for better document i...
(MARKDOWN_NNP)
CLAUDE.md
[uncategorized] ~135-~135: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...ories - markdown: Same as text with markdown heading hierarchy - chat: Compact c...
(MARKDOWN_NNP)
🔇 Additional comments (5)
CLAUDE.md (2)
3-12: New multi-format overview reads well
The updated intro and feature bullets clearly describe HTML, text, markdown, and chat output and match the new CLI options; no changes needed here.
118-137: Nice dedicated section for non‑HTML formats
The examples and format comparison for text/markdown/chat are clear and aligned with CLI usage; this section looks good as-is.
claude_code_log/renderer.py (1)
32-38: Content-extractor integration looks correct and keeps HTML behavior
The new render_message_content implementation cleanly reuses extract_content_data and preserves the old distinctions:
- User text → escaped <pre> blocks.
- Assistant text → markdown via render_markdown.
- Tool use/result/thinking → reconstructed ToolUseContent/ToolResultContent/ThinkingContent and delegated to existing formatters.
- Images still rendered through the existing ImageContent branch.
Given that text_only_content is already stripped of tool/thinking/image items before this function is called, the extra branches for ExtractedToolUse/ExtractedToolResult/ExtractedThinking mainly future‑proof the code and don’t change current behavior. Overall, the refactor looks safe and improves reuse.
Also applies to: 1312-1377
test/test_text_rendering.py (1)
17-576: Comprehensive coverage of new text/markdown/chat behaviors
The test suite here does a good job exercising all the new text APIs: basic text/markdown output, tool use rendering, usage formatting, content rendering, session summaries, and the different chat-format behaviors (symbols, truncation, indentation). The tests are also wired through load_transcript, which is a nice integration-level check rather than only unit-level calls.
claude_code_log/text_renderer.py (1)
28-325: Solid, cohesive text/markdown rendering layer
The overall design here is clear and cohesive: content extraction is centralized in render_text_content, headers/body are well-separated for user/assistant/system/summary, and generate_text/generate_markdown provide a straightforward API with session-aware grouping and optional summaries/system messages. The usage formatting and tool-use/result rendering also align well with the tests.
    try:
        # Handle TUI mode
        # Validate incompatible options
        if output_format.lower() != "html" and tui:
            click.echo("Error: TUI mode only works with HTML format", err=True)
            sys.exit(1)

        if output_format.lower() != "html" and open_browser:
            click.echo("Warning: --open-browser only works with HTML format", err=True)

        if output_format.lower() != "html" and all_projects:
            click.echo("Error: --all-projects only works with HTML format", err=True)
            sys.exit(1)
Non-HTML default run still uses all-projects HTML path
Right now the all-projects guard for non-HTML formats only checks the explicit --all-projects flag:
if output_format.lower() != "html" and all_projects:
    ...
But later, when input_path is None, you implicitly set:
if input_path is None:
    input_path = Path.home() / ".claude" / "projects"
    all_projects = True
This means claude-code-log --format text (no input_path) will still go through process_projects_hierarchy(...) and generate HTML index files, ignoring the requested text format.
To align behavior with the intended restriction (“--all-projects only works with HTML format”), re-validate after you default all_projects:
@@
-        # Handle default case - process all projects hierarchy if no input path and --all-projects flag
-        if input_path is None:
-            input_path = Path.home() / ".claude" / "projects"
-            all_projects = True
+        # Handle default case - process all projects hierarchy if no input path
+        if input_path is None:
+            input_path = Path.home() / ".claude" / "projects"
+            all_projects = True
+
+        # After defaulting to all_projects, ensure non-HTML formats are rejected
+        if all_projects and output_format.lower() != "html":
+            click.echo(
+                "Error: --all-projects only works with HTML format", err=True
+            )
+            sys.exit(1)
This preserves the early guard for explicit --all-projects and also covers the implicit default case.
Also applies to: 529-575
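The ordering issue described above (validate only after the implicit default has been applied) can be shown in isolation. `resolve_input_sketch` is a hypothetical function that raises instead of calling `sys.exit`, so that it can be tested without the CLI; the default path and error message are taken from the snippets quoted in this comment.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_input_sketch(
    input_path: Optional[Path], all_projects: bool, output_format: str
) -> Tuple[Path, bool]:
    """Apply the implicit all-projects default, then validate the format."""
    if input_path is None:
        # No input given: fall back to the projects directory.
        input_path = Path.home() / ".claude" / "projects"
        all_projects = True
    # Validation runs after defaulting, so the implicit case is covered too.
    if all_projects and output_format.lower() != "html":
        raise ValueError("--all-projects only works with HTML format")
    return input_path, all_projects
```

Calling it with no input path and `"text"` raises, whereas a check placed before the defaulting step would have let that combination through.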
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 0
🧹 Nitpick comments (1)
README.md (1)
142-142: Capitalize "Markdown" consistently as a proper noun.
Per grammar conventions, "Markdown" should be capitalized when referring to the formatting language. Update lines 142 and 161:
-Convert transcripts to plain text or markdown format for documentation or terminal viewing:
+Convert transcripts to plain text or Markdown format for documentation or terminal viewing:
-- **markdown**: Same as text but with markdown heading hierarchy for better document integration
+- **markdown**: Same as text but with Markdown heading hierarchy for better document integration
Also applies to: 161-161
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- README.md (2 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md
[uncategorized] ~142-~142: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...t Convert transcripts to plain text or markdown format for documentation or terminal vi...
(MARKDOWN_NNP)
[uncategorized] ~161-~161: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...s - markdown: Same as text but with markdown heading hierarchy for better document i...
(MARKDOWN_NNP)
🔇 Additional comments (2)
README.md (2)
31-31: Past issue resolved: chat format now included in feature list.
The updated line correctly lists all four output formats, addressing the previous review feedback. The phrasing is clear and consistent with the rest of the documentation.
175-175: File structure update is accurate.
The addition of claude_code_log/text_renderer.py correctly reflects the new text-rendering module introduced in this PR and is appropriately positioned in the file listing.
I like the idea! I was growing tired of all the special low-level checks needed in renderer.py, to verify what we were really dealing with. I thought this needed refactoring, with an intermediate layer that would be closer to the "final" structure, which would indeed also allow for "simpler" output than the HTML one... Looks like what you did goes in this direction. I'll review further but wanted to share some positive feedback anyway!
Hey @golergka, really sorry about the slow response, this PR would be a great addition and the implementation looks good. That said, the project has had a fair bit of changes lately, adding support for more message types and some bits refactored, so when you get a bit of time please rebase (or redo) and make sure to cover these new output formats with integration tests (as there's a proper suite now with real test data).