feat: Tika CLI improvements — metadata-only, quiet output, zio-logging#69
Merged
Switch extractMetadata() from BodyContentHandler(-1) (full body parse) to DocumentInfo.extractMetadataOnly() which uses WriteOutContentHandler(0) to bail on the first body character. Same metadata, much faster on large or scanned documents. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
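The early-exit idea in this commit can be illustrated with plain JDK SAX. This is a minimal sketch, not the project's code: `MetadataOnlySketch`, `MetaHandler`, and `BodyReached` are hypothetical names, and it parses a small XHTML-like string rather than going through Tika. The handler collects `<meta>` entries and throws as soon as the first body character arrives, which is the same shape as a zero-limit write-out handler.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MetadataOnlySketch {

    /** Marker exception used to stop the parse (hypothetical, not a Tika class). */
    static final class BodyReached extends SAXException {
        BodyReached() { super("body reached, stopping parse"); }
    }

    /** Collects meta name/content pairs and aborts on the first body character. */
    static final class MetaHandler extends DefaultHandler {
        final Map<String, String> metadata = new LinkedHashMap<>();
        private boolean inBody = false;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (qName.equals("meta")) {
                metadata.put(atts.getValue("name"), atts.getValue("content"));
            } else if (qName.equals("body")) {
                inBody = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // Zero-character budget: any body text ends the parse immediately.
            if (inBody && length > 0) throw new BodyReached();
        }
    }

    static Map<String, String> extractMetadataOnly(String xhtml) {
        MetaHandler handler = new MetaHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (BodyReached expected) {
            // Expected early exit: metadata was already collected from <head>.
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.metadata;
    }
}
```

Calling `MetadataOnlySketch.extractMetadataOnly(...)` on a document with two `<meta>` tags returns both entries without the handler ever processing the rest of the body.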
BackendWiring.asposeCanConvert/asposeCanSplit now call static format lookups (AsposeTransforms.canConvert/canSplit) instead of the licensed variants that trigger AsposeLicenseV2.isProductLicensed() side effects. This eliminates eager Aspose license initialization on every `xlcr info`, server GET /capabilities (442+ canConvert calls), POST /info, and POST /convert pre-flight checks. The actual license gate still fires at conversion time — unlicensed products produce ResourceError caught by the fallback chain in UnifiedTransforms. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
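The split between capability lookup and license enforcement can be sketched as follows. All names here (`CapabilityLookupSketch`, the `CONVERSIONS` table, the string results) are hypothetical stand-ins, not the `AsposeTransforms` API; the counter only exists to make the absence of license side effects observable.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

class CapabilityLookupSketch {
    // Counts simulated license initializations so the difference is observable.
    static final AtomicInteger licenseInits = new AtomicInteger();

    // Hypothetical static table of supported "from->to" pairs.
    static final Set<String> CONVERSIONS = Set.of("xlsx->pdf", "docx->pdf", "pptx->pdf");

    /** Pure format lookup: no license side effects. */
    static boolean canConvertStatic(String from, String to) {
        return CONVERSIONS.contains(from + "->" + to);
    }

    /** Stand-in for an expensive license probe. */
    static boolean isProductLicensed(String from) {
        licenseInits.incrementAndGet();
        return !from.equals("pptx"); // pretend one product is unlicensed
    }

    /** The license gate applies only when a conversion actually runs. */
    static String convert(String from, String to) {
        if (!canConvertStatic(from, to)) return "unsupported";
        if (!isProductLicensed(from)) return "fallback"; // next backend in the chain
        return "converted";
    }
}
```

Under these assumptions, hundreds of `canConvertStatic` calls leave `licenseInits` at zero, and the first license probe happens inside `convert`.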
- Console appender now targets System.err instead of System.out, keeping stdout clean for structured CLI output (JSON, XML, converted bytes)
- XLCR_LOG_LEVEL env var controls root log level (default: INFO)
- XLCR_LOG_FILE env var controls log file path (default: logs/application.log; set to /dev/null to disable for ephemeral containers like Modal)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move "Successfully converted/split" messages behind --verbose flag. By default the CLI now produces no stdout noise — only the converted file or structured info output. Standard CLI convention: quiet by default, -v opts into progress messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
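A quiet-by-default gate might look like the sketch below. The class and method names are hypothetical, and the output stream is passed in because the commit does not say which stream the verbose messages use.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

class QuietByDefaultSketch {
    /** True when the caller opted into progress messages. */
    static boolean isVerbose(String[] args) {
        List<String> a = Arrays.asList(args);
        return a.contains("-v") || a.contains("--verbose");
    }

    /** Prints the success message only in verbose mode; otherwise stays silent. */
    static void reportSuccess(String[] args, String message, PrintStream out) {
        if (isVerbose(args)) out.println(message);
    }
}
```

With `convert -i doc.xlsx -o out.pdf` nothing is printed; adding `-v` prints the success message.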
Replace logback-classic with zio-logging and zio-logging-slf4j2-bridge for unified, ZIO-native logging:
- All ZIO.log* calls route through ZIO's console-err logger (stderr)
- Third-party SLF4J calls (Tika, POI, Aspose, JODConverter) are captured by the SLF4J2 bridge and routed through the same ZIO logger
- Log4j2 calls (Tika) still bridge via log4j-to-slf4j -> ZIO
- XLCR_LOG_LEVEL env var controls root level (default: WARN)
- stdout is now completely clean for structured output (JSON, XML, bytes)
- Remove logback.xml, which is no longer needed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
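The level-resolution and stderr-only behaviour described here can be sketched without any logging library. This is an illustration in plain Java, not the zio-logging configuration itself; the `LogLevelSketch` names are hypothetical, and falling back to WARN on an unrecognized value is an assumption rather than documented behaviour.

```java
import java.io.PrintStream;
import java.util.Locale;
import java.util.Map;

class LogLevelSketch {
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR }

    /** Reads XLCR_LOG_LEVEL from the given environment; defaults to WARN. */
    static Level resolveLevel(Map<String, String> env) {
        String raw = env.get("XLCR_LOG_LEVEL");
        if (raw == null || raw.isBlank()) return Level.WARN;
        try {
            return Level.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return Level.WARN; // assumption: unknown values fall back to the default
        }
    }

    /** Writes to the supplied error stream only when the level meets the threshold. */
    static boolean log(Level threshold, Level level, String msg, PrintStream err) {
        if (level.compareTo(threshold) < 0) return false;
        err.println(level + " " + msg);
        return true;
    }
}
```

In a real entry point the stream would be `System.err` and the map would be `System.getenv()`, so stdout carries only structured output.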
The wrapper script had its own shell-based --backend-info handler that duplicated (and disagreed with) the Java CLI's output. It couldn't detect bundled Aspose licenses and reported "Evaluation mode" even when the license was properly loaded at runtime. Remove the shell check entirely — the Java CLI's --backend-info has accurate runtime license detection and backend status. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thread licenseAwareCapabilities through BackendWiring, UnifiedTransforms, all server routes, and CLI info/server commands. Default remains fast static lookups (no Aspose license init). Opt-in via:
- CLI: `xlcr info -i doc.pdf --license-aware-capabilities`
- Server: `xlcr server start --license-aware-capabilities`
- Env: `XLCR_LICENSE_AWARE_CAPABILITIES=1`

Server capabilities cache stores both modes lazily. Tests added for CLI flag parsing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
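The opt-in resolution and the lazy two-mode cache could be sketched roughly as below. The class names are hypothetical, and treating only the value `1` as enabling the env var is an assumption based on the example above.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class LicenseAwareFlagSketch {
    /** Off by default; enabled by the CLI flag or by XLCR_LICENSE_AWARE_CAPABILITIES=1. */
    static boolean licenseAware(String[] args, Map<String, String> env) {
        return Arrays.asList(args).contains("--license-aware-capabilities")
            || "1".equals(env.get("XLCR_LICENSE_AWARE_CAPABILITIES"));
    }

    /** Caches one capabilities payload per mode, computing each on first request. */
    static final class CapabilitiesCache {
        private final Map<Boolean, String> cache = new ConcurrentHashMap<>();
        private final Function<Boolean, String> compute;

        CapabilitiesCache(Function<Boolean, String> compute) {
            this.compute = compute;
        }

        String get(boolean licenseAware) {
            return cache.computeIfAbsent(licenseAware, compute);
        }
    }
}
```

A server that is only ever asked for the static mode never computes the license-aware payload, so the default path stays free of license initialization.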
Summary
Production-shape the CLI for service hosting (Modal) with six improvements:
- `xlcr info` now uses `DocumentInfo.extractMetadataOnly()` with a zero-limit SAX handler instead of parsing the full document body. A 208MB XLSX completes in <1s.
- `BackendWiring.asposeCanConvert`/`canSplit` now use static format lookups instead of triggering `AsposeLicenseV2.isProductLicensed()`. This eliminates eager license init on `info`, `/capabilities` (442+ calls), and `/convert` pre-flight checks.
- `ZIO.log*` and third-party SLF4J calls (Tika, POI, Aspose) all route through ZIO's console-err logger to stderr. The `XLCR_LOG_LEVEL` env var controls the root level (default: WARN).
- Success messages moved behind `--verbose`. By default, stdout is clean for piping: only structured data (JSON, XML, bytes).

Test plan
- `./mill xlcr.compile`: compiles
- `./mill __.test`: all tests pass
- `xlcr info -i large.xlsx --json`: fast metadata, clean JSON on stdout
- `xlcr convert -i doc.xlsx -o out.pdf`: zero stdout noise
- `xlcr convert -i doc.xlsx -o out.pdf -v`: progress + success messages
- `XLCR_LOG_LEVEL=INFO xlcr info -i doc.pdf`: library logs to stderr only
- `xlcr --backend-info`: single clean block, license detected correctly

🤖 Generated with Claude Code