
feat: Tika CLI improvements — metadata-only, quiet output, zio-logging#69

Merged
arcaputo3 merged 7 commits into main from feat/tika-cli-improvements
Mar 4, 2026


@arcaputo3

Summary

Production-shape the CLI for service hosting (Modal) with the following improvements:

  • Metadata-only Tika extraction: xlcr info now uses DocumentInfo.extractMetadataOnly() with a zero-limit SAX handler instead of parsing the full document body. A 208 MB XLSX file now completes in under 1 s.
  • Decouple Aspose license from capability queries: BackendWiring.asposeCanConvert/canSplit now use static format lookups instead of triggering AsposeLicenseV2.isProductLicensed(). Eliminates eager license init on info, /capabilities (442+ calls), and /convert pre-flight checks.
  • Replace logback with zio-logging + SLF4J2 bridge: Unified logging — ZIO.log* and third-party SLF4J calls (Tika, POI, Aspose) all route through ZIO's console-err logger to stderr. XLCR_LOG_LEVEL env var controls root level (default: WARN).
  • Default-quiet CLI output: Success messages gated behind --verbose. By default, stdout is clean for piping — only structured data (JSON, XML, bytes).
  • Remove redundant shell backend-info: Wrapper script had its own stale license detection that disagreed with the Java CLI's runtime status.

Test plan

  • ./mill xlcr.compile — compiles
  • ./mill __.test — all tests pass
  • xlcr info -i large.xlsx --json — fast metadata, clean JSON on stdout
  • xlcr convert -i doc.xlsx -o out.pdf — zero stdout noise
  • xlcr convert -i doc.xlsx -o out.pdf -v — progress + success messages
  • XLCR_LOG_LEVEL=INFO xlcr info -i doc.pdf — library logs to stderr only
  • xlcr --backend-info — single clean block, license detected correctly
  • Native image rebuild + test (stretch — verify SLF4J2 bridge ServiceLoader)

🤖 Generated with Claude Code

arcaputo3 and others added 6 commits March 4, 2026 10:04
Switch extractMetadata() from BodyContentHandler(-1) (full body parse)
to DocumentInfo.extractMetadataOnly() which uses WriteOutContentHandler(0)
to bail on the first body character. Same metadata, much faster on large
or scanned documents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
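
The "bail on the first body character" idea can be sketched with the JDK's own SAX parser. This is an analogy, not the Tika code path: `MetadataOnlySketch`, `MetaOnlyHandler`, `BodyReached`, and the `<meta>`/`<body>` document shape are all hypothetical names chosen for illustration. The handler records metadata as it streams past and throws a marker exception as soon as body text appears, so the rest of the input is never processed.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, SAXException}
import org.xml.sax.helpers.DefaultHandler
import scala.collection.mutable

object MetadataOnlySketch {
  // Hypothetical marker exception: signals "stop, we already have what we need".
  final class BodyReached extends SAXException("body reached")

  // Collects <meta name="..." content="..."/> pairs and aborts the parse
  // on the first non-whitespace character inside <body>.
  final class MetaOnlyHandler extends DefaultHandler {
    val meta = mutable.LinkedHashMap.empty[String, String]
    private var inBody = false

    override def startElement(uri: String, local: String, qName: String, attrs: Attributes): Unit =
      qName match {
        case "meta" => meta(attrs.getValue("name")) = attrs.getValue("content")
        case "body" => inBody = true
        case _      => ()
      }

    override def characters(ch: Array[Char], start: Int, length: Int): Unit =
      if (inBody && new String(ch, start, length).trim.nonEmpty) throw new BodyReached
  }

  def extract(xml: String): Map[String, String] = {
    val handler = new MetaOnlyHandler
    val parser  = SAXParserFactory.newInstance().newSAXParser()
    try parser.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")), handler)
    catch { case _: BodyReached => () } // expected: the early exit is deliberate
    handler.meta.toMap
  }
}
```

The cost of the parse is then bounded by how much input precedes the first body character rather than by the size of the body, which is the property the commit relies on for large or scanned documents.
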
BackendWiring.asposeCanConvert/asposeCanSplit now call static format
lookups (AsposeTransforms.canConvert/canSplit) instead of the licensed
variants that trigger AsposeLicenseV2.isProductLicensed() side effects.

This eliminates eager Aspose license initialization on every `xlcr info`,
server GET /capabilities (442+ canConvert calls), POST /info, and
POST /convert pre-flight checks. The actual license gate still fires at
conversion time — unlicensed products produce ResourceError caught by
the fallback chain in UnifiedTransforms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
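
A minimal sketch of the separation described above, assuming a simple in-memory format table. `CapabilitySketch`, its `supported` set, and the simulated `licensed` check are hypothetical stand-ins, not the project's `BackendWiring` or `AsposeTransforms` code. Capability queries are pure lookups, and the license check (a lazy value with a side effect) runs only when a conversion is actually attempted.

```scala
import java.util.concurrent.atomic.AtomicInteger

object CapabilitySketch {
  // Counts how often the simulated license initialisation runs.
  val licenseInits = new AtomicInteger(0)

  // Stand-in for a license check with side effects; evaluated at most once, on first use.
  lazy val licensed: Boolean = { licenseInits.incrementAndGet(); true }

  // Hypothetical static format table: (input, output) pairs the backend supports.
  private val supported: Set[(String, String)] =
    Set("xlsx" -> "pdf", "docx" -> "pdf", "pptx" -> "pdf")

  // Capability query: a pure lookup that never touches the license.
  def canConvert(from: String, to: String): Boolean = supported.contains(from -> to)

  // Conversion: the only place where the license gate fires.
  def convert(from: String, to: String): Either[String, String] =
    if (!canConvert(from, to)) Left(s"unsupported: $from -> $to")
    else if (!licensed) Left("unlicensed")
    else Right(s"converted $from to $to")
}
```

With this shape, any number of `canConvert` calls leaves `licenseInits` at zero, and the first `convert` call raises it to one.
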
- Console appender now targets System.err instead of System.out, keeping
  stdout clean for structured CLI output (JSON, XML, converted bytes)
- XLCR_LOG_LEVEL env var controls root log level (default: INFO)
- XLCR_LOG_FILE env var controls log file path (default: logs/application.log,
  set to /dev/null to disable for ephemeral containers like Modal)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move "Successfully converted/split" messages behind --verbose flag.
By default the CLI now produces no stdout noise — only the converted
file or structured info output. Standard CLI convention: quiet by
default, -v opts into progress messages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
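
The quiet-by-default convention can be illustrated with a small sketch. `QuietCliSketch` and `Reporter` are hypothetical names, and the message text and the choice to capture output in a buffer are assumptions made so the example is self-contained. Progress messages are written only when `-v` or `--verbose` is present.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

object QuietCliSketch {
  // Hypothetical reporter: human-oriented progress is emitted only when verbose is set.
  final class Reporter(verbose: Boolean, out: PrintStream) {
    def progress(msg: String): Unit = if (verbose) out.println(msg)
  }

  // Runs a pretend conversion and returns everything that was written to `out`.
  def run(args: List[String]): String = {
    val buf     = new ByteArrayOutputStream()
    val ps      = new PrintStream(buf, true, "UTF-8")
    val verbose = args.contains("-v") || args.contains("--verbose")
    new Reporter(verbose, ps).progress("Successfully converted doc.xlsx")
    ps.flush()
    buf.toString("UTF-8")
  }
}
```

Here `run(List("convert"))` yields an empty string, while adding `-v` yields the success line.
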
Replace logback-classic with zio-logging and zio-logging-slf4j2-bridge
for unified, ZIO-native logging:

- All ZIO.log* calls route through ZIO's console-err logger (stderr)
- Third-party SLF4J calls (Tika, POI, Aspose, JODConverter) are captured
  by the SLF4J2 bridge and routed through the same ZIO logger
- Log4j2 calls (Tika) still bridge via log4j-to-slf4j -> ZIO
- XLCR_LOG_LEVEL env var controls root level (default: WARN)
- stdout is now completely clean for structured output (JSON, XML, bytes)
- Remove logback.xml — no longer needed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
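
The level handling described above can be sketched without any logging library. This is not the zio-logging configuration itself: `LogLevelSketch` and its helpers are hypothetical, and falling back to WARN for unrecognised values is an assumption of the sketch. It resolves the root level from `XLCR_LOG_LEVEL` and writes enabled messages to stderr.

```scala
object LogLevelSketch {
  // Levels ordered by severity; a message is emitted when its level is at or above the root level.
  val levels: Vector[String] = Vector("TRACE", "DEBUG", "INFO", "WARN", "ERROR")

  // Resolve the root level from an environment map, defaulting to WARN
  // for a missing or unrecognised value (the latter is an assumption).
  def rootLevel(env: Map[String, String]): String =
    env.get("XLCR_LOG_LEVEL")
      .map(_.trim.toUpperCase)
      .filter(l => levels.contains(l))
      .getOrElse("WARN")

  def enabled(msgLevel: String, root: String): Boolean =
    levels.indexOf(msgLevel) >= levels.indexOf(root)

  // Write to stderr so stdout stays reserved for structured output.
  def log(msgLevel: String, msg: String, env: Map[String, String] = sys.env): Unit =
    if (enabled(msgLevel, rootLevel(env))) Console.err.println(s"[$msgLevel] $msg")
}
```

Passing the environment as a plain `Map` keeps the resolution logic testable without mutating the process environment.
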
The wrapper script had its own shell-based --backend-info handler that
duplicated (and disagreed with) the Java CLI's output. It couldn't
detect bundled Aspose licenses and reported "Evaluation mode" even when
the license was properly loaded at runtime.

Remove the shell check entirely — the Java CLI's --backend-info has
accurate runtime license detection and backend status.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@arcaputo3 arcaputo3 requested a review from cbelltjc March 4, 2026 18:42

@cbelltjc cbelltjc left a comment


LGTM :shipit:

Thread licenseAwareCapabilities through BackendWiring, UnifiedTransforms,
all server routes, and CLI info/server commands. Default remains fast
static lookups (no Aspose license init). Opt-in via:

- CLI: `xlcr info -i doc.pdf --license-aware-capabilities`
- Server: `xlcr server start --license-aware-capabilities`
- Env: `XLCR_LICENSE_AWARE_CAPABILITIES=1`

Server capabilities cache stores both modes lazily. Tests added for
CLI flag parsing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
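
One way the opt-in and the lazy two-mode cache could fit together is sketched below. `LicenseAwareSketch` and `CapabilitiesCache` are hypothetical names, and treating only the value `1` as enabling the env var is an assumption. The flag and env var names are the ones listed in the commit message.

```scala
object LicenseAwareSketch {
  // Opt in if the CLI flag is present or the env var is set to "1".
  // Accepting only "1" is an assumption of this sketch.
  def licenseAware(args: List[String], env: Map[String, String]): Boolean =
    args.contains("--license-aware-capabilities") ||
      env.get("XLCR_LICENSE_AWARE_CAPABILITIES").contains("1")

  // Hypothetical cache: each mode is computed on first request and then reused.
  final class CapabilitiesCache(compute: Boolean => Set[String]) {
    private lazy val fast: Set[String]  = compute(false) // static lookups only
    private lazy val aware: Set[String] = compute(true)  // consults the license
    def get(licenseAware: Boolean): Set[String] = if (licenseAware) aware else fast
  }
}
```

Because both entries are lazy, a server that only ever serves the default mode never pays for the license-aware computation.
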
@arcaputo3 arcaputo3 merged commit d943e24 into main Mar 4, 2026
17 checks passed
@arcaputo3 arcaputo3 deleted the feat/tika-cli-improvements branch March 4, 2026 20:02