XLCR

eXtensible Language Computation Runtime - a document conversion and splitting toolkit for the JVM.

XLCR converts between document formats (PDF, DOCX, XLSX, PPTX, HTML, ODS, and more) and splits documents into fragments (pages, sheets, slides, attachments). It ships as both a CLI tool and a library publishable to Maven Central.

Modules

Module	Artifact	Description
core	`xlcr-core`	Tika text extraction, document splitters (PDF/Excel/PowerPoint/Word/Email/Archive), XLSX-to-ODS conversion
core-aspose	`xlcr-core-aspose`	Aspose-powered conversions: PDF/DOCX/XLSX/PPTX/HTML with HIGH priority (commercial license required)
core-libreoffice	`xlcr-core-libreoffice`	LibreOffice-powered conversions: DOC/XLS/PPT/ODS to PDF as open-source fallback
xlcr	`xlcr`	Unified CLI with compile-time transform discovery, HTTP server, and automatic backend fallback (Scala 3 only)

Backend selection is automatic: Aspose (HIGH priority) > LibreOffice (DEFAULT) > Core. You can also select a backend explicitly with --backend aspose or --backend libreoffice.

Prerequisites

Java 17+ (tested with Java 17, 21, and 25)
Mill build tool (included via ./mill wrapper script)
LibreOffice (optional, for core-libreoffice backend)
Aspose license (optional, for core-aspose backend without watermarks)

Quick Start

Install from Source

git clone https://github.com/TJC-LP/xlcr.git
cd xlcr

# Build and install to ~/bin (no sudo)
make install-user

# Or install to /usr/local/bin (requires sudo)
make install

Use as a Library

// build.mill (Mill)
def mvnDeps = Seq(
  mvn"com.tjclp::xlcr-core:0.2.2",
  mvn"com.tjclp::xlcr-core-aspose:0.2.2"  // optional
)

// build.sbt (sbt)
libraryDependencies ++= Seq(
  "com.tjclp" %% "xlcr-core" % "0.2.2",
  "com.tjclp" %% "xlcr-core-aspose" % "0.2.2"  // optional
)

Published for Scala 3.8.2.

CLI Usage

Convert Documents

# Convert Word to PDF
xlcr convert -i document.docx -o output.pdf

# Convert with a specific backend
xlcr convert -i document.docx -o output.pdf --backend libreoffice

# Convert HTML to PowerPoint
xlcr convert -i presentation.html -o output.pptx

# Convert PDF to HTML (recommended for best editability)
xlcr convert -i document.pdf -o output.html

Split Documents

# Split PDF into individual pages
xlcr split -i document.pdf -d pages/

# Split Excel workbook into sheets
xlcr split -i workbook.xlsx -d sheets/

# Split PowerPoint into slides
xlcr split -i presentation.pptx -d slides/

# Extract email attachments
xlcr split -i message.eml -d attachments/

Other Commands

# Show document metadata
xlcr info -i document.pdf

# Show metadata/capabilities with runtime Aspose license checks (opt-in, slower)
xlcr info -i document.pdf --license-aware-capabilities

# List all supported conversions
xlcr --backend-info

# Version
xlcr --version

PowerPoint Workflows

# Strip template/branding for clean output
xlcr convert -i branded.pptx -o clean.html --strip-masters

# Two-stage PDF to PowerPoint (best editability, smallest files)
xlcr convert -i document.pdf -o intermediate.html
xlcr convert -i intermediate.html -o presentation.pptx
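The two-stage workflow above can be wrapped in a small helper so the intermediate HTML file is cleaned up automatically. This is a minimal sketch, not part of XLCR: the function name `pdf_to_pptx` and the `XLCR_BIN` override variable are hypothetical, and it assumes `xlcr` is on the PATH with the `convert -i ... -o ...` syntax shown above.

```shell
#!/bin/sh
# Two-stage PDF -> HTML -> PPTX conversion with a temporary intermediate file.
# pdf_to_pptx and XLCR_BIN are hypothetical names used only for this sketch.
pdf_to_pptx() {
  input="$1"
  output="$2"

  # Keep the intermediate HTML in a throwaway directory.
  workdir="$(mktemp -d)" || return 1
  intermediate="${workdir}/intermediate.html"

  # Stage 1: PDF to HTML. Stage 2 runs only if stage 1 succeeds.
  "${XLCR_BIN:-xlcr}" convert -i "$input" -o "$intermediate" &&
    "${XLCR_BIN:-xlcr}" convert -i "$intermediate" -o "$output"
  status=$?

  # Clean up regardless of success, then report the conversion status.
  rm -rf "$workdir"
  return "$status"
}

# Usage (requires xlcr to be installed):
#   pdf_to_pptx document.pdf presentation.pptx
```

Setting `XLCR_BIN=echo` prints the two commands instead of running them, which is a quick way to check the wiring before converting real files.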

HTTP Server

XLCR provides a stateless REST API for document conversion, splitting, and metadata extraction.

Starting the Server

# Via installed CLI (after `make install` or `make install-user`)
xlcr server start --port 8080

# Via Mill (development)
./mill xlcr.run server start --port 8080

# Via Docker
docker compose up server

Server Options

Flag	Env Variable	Default	Description
`--host`	`XLCR_HOST`	`0.0.0.0`	Bind address
`--port`	`XLCR_PORT`	`8080`	Listen port
`--max-request-size`	`XLCR_MAX_REQUEST_SIZE`	`104857600`	Max body size (bytes)
`--lo-instances`	`XLCR_LO_INSTANCES`	`1`	Number of LibreOffice processes
`--lo-restart-after`	`XLCR_LO_RESTART_AFTER`	`200`	Restart LO after N conversions
`--lo-task-timeout`	`XLCR_LO_TASK_TIMEOUT`	`120000`	Conversion timeout (ms)
`--lo-queue-timeout`	`XLCR_LO_QUEUE_TIMEOUT`	`30000`	Queue wait timeout (ms)
`--license-aware-capabilities`	`XLCR_LICENSE_AWARE_CAPABILITIES`	`false`	Use runtime Aspose license checks for `/capabilities`, `/info`, and convert/split preflight checks
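
The byte and millisecond defaults in the table are easier to read in larger units. The sketch below resolves a few of the documented environment variables against their documented defaults and prints them in MiB and seconds. The function name `effective_server_config` is hypothetical, and the sketch only reads environment variables; it does not model how the server resolves a flag and an environment variable that are both set.

```shell
#!/bin/sh
# Resolve documented XLCR_* variables against their documented defaults
# and print them in human-friendly units.
# effective_server_config is a hypothetical helper, not part of XLCR.
effective_server_config() {
  host="${XLCR_HOST:-0.0.0.0}"
  port="${XLCR_PORT:-8080}"
  max_bytes="${XLCR_MAX_REQUEST_SIZE:-104857600}"
  task_ms="${XLCR_LO_TASK_TIMEOUT:-120000}"
  queue_ms="${XLCR_LO_QUEUE_TIMEOUT:-30000}"

  # 1 MiB = 1048576 bytes; timeouts are converted from ms to whole seconds.
  echo "bind=${host}:${port} max_body_mib=$((max_bytes / 1048576)) task_timeout_s=$((task_ms / 1000)) queue_timeout_s=$((queue_ms / 1000))"
}

effective_server_config
```

With none of the variables set, this prints `bind=0.0.0.0:8080 max_body_mib=100 task_timeout_s=120 queue_timeout_s=30`, so the default request limit is 100 MiB.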

LibreOffice Process Pooling

For production deployments with heavy LibreOffice usage, run multiple instances for parallel conversions:

# 4 LibreOffice instances, restart each after 100 conversions
xlcr server start --lo-instances 4 --lo-restart-after 100

# Or via environment variables
XLCR_LO_INSTANCES=4 XLCR_LO_RESTART_AFTER=100 xlcr server start

# Enable runtime license-aware capability checks (opt-in)
xlcr server start --license-aware-capabilities

Each instance runs as a separate LibreOffice process on a dedicated port (starting from 2002). JODConverter handles round-robin task distribution and automatic process restarts. Budget roughly 200-300 MB of RAM per instance.
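
For capacity planning, the figures above translate into a simple calculation. The sketch below is a hypothetical helper (`lo_pool_plan` is not part of XLCR) and assumes that ports are assigned consecutively from 2002 and that each instance needs 200-300 MB; the consecutive numbering is an assumption, since the text only states the starting port.

```shell
#!/bin/sh
# Print the assumed port range and a rough RAM budget for N LibreOffice instances.
# Assumptions: ports are consecutive starting at 2002; 200-300 MB per instance.
# lo_pool_plan is a hypothetical helper used only for this sketch.
lo_pool_plan() {
  instances="$1"
  first_port=2002
  last_port=$((first_port + instances - 1))
  min_mb=$((instances * 200))
  max_mb=$((instances * 300))
  echo "ports ${first_port}-${last_port}, RAM ${min_mb}-${max_mb} MB"
}

lo_pool_plan 4
```

For four instances this prints `ports 2002-2005, RAM 800-1200 MB`, which is the headroom to reserve on top of the JVM's own memory.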

Endpoints

Method	Path	Description
`POST`	`/convert?to=<mime>`	Convert document to target format
`POST`	`/split`	Split document into fragments (ZIP output)
`POST`	`/info`	Get document metadata
`GET`	`/capabilities`	List all supported conversions
`GET`	`/health`	Health check (includes LibreOffice pool status)

Query Parameters

Parameter	Endpoints	Values	Description
`to`	`/convert`	MIME type or extension	Target format (required)
`detect`	`/convert`, `/split`, `/info`	`tika`	Force Tika content detection, ignore Content-Type header
`backend`	`/convert`, `/split`	`aspose`, `libreoffice`, `xlcr`	Use specific backend instead of auto-fallback
`check`	`/health`	`libreoffice`	Run on-demand LibreOffice runtime probe (starts LO lazily only for this probe)

Content-Type is optional on all endpoints. When missing, Tika automatically detects the format from content bytes. Use ?detect=tika to force Tika even when Content-Type is present.
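
When scripting against the API, it can help to assemble the `/convert` URL from these parameters in one place. The following is a minimal sketch: the function name `convert_url` and the `XLCR_BASE_URL` variable are hypothetical, and the values are not URL-encoded, so it is best suited to extensions such as `pdf` rather than full MIME types.

```shell
#!/bin/sh
# Build a /convert URL from the target format plus optional detect/backend values.
# convert_url and XLCR_BASE_URL are hypothetical names used only for this sketch.
# Values are inserted as-is (no URL-encoding).
convert_url() {
  to="$1"        # required: target format, e.g. pdf
  detect="$2"    # optional: tika
  backend="$3"   # optional: aspose, libreoffice, or xlcr

  url="${XLCR_BASE_URL:-http://localhost:8080}/convert?to=${to}"
  if [ -n "$detect" ]; then
    url="${url}&detect=${detect}"
  fi
  if [ -n "$backend" ]; then
    url="${url}&backend=${backend}"
  fi
  echo "$url"
}

# Usage (requires a running server):
#   curl -X POST "$(convert_url pdf tika)" --data-binary @document.docx -o output.pdf
```

Pass an empty string for `detect` to set only the backend, for example `convert_url pdf "" libreoffice`.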

Examples

# Convert DOCX to PDF
curl -X POST "http://localhost:8080/convert?to=pdf" \
  -H "Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document" \
  --data-binary @document.docx -o output.pdf

# Convert without Content-Type (Tika auto-detects)
curl -X POST "http://localhost:8080/convert?to=pdf" \
  --data-binary @document.docx -o output.pdf

# Force Tika detection (overrides Content-Type header)
curl -X POST "http://localhost:8080/convert?to=pdf&detect=tika" \
  --data-binary @document.docx -o output.pdf

# Use a specific backend
curl -X POST "http://localhost:8080/convert?to=pdf&backend=libreoffice" \
  --data-binary @document.docx -o output.pdf

# Split XLSX into sheets
curl -X POST "http://localhost:8080/split" \
  -H "Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" \
  --data-binary @workbook.xlsx -o sheets.zip

# Check server health and LibreOffice pool status
curl http://localhost:8080/health

# Probe LibreOffice runtime readiness explicitly
curl "http://localhost:8080/health?check=libreoffice"

# List capabilities
curl http://localhost:8080/capabilities

Development

Build Commands

./mill __.compile                    # Compile all modules
./mill __.test                       # Run all tests
./mill core.test                     # Run tests for a specific module
./mill __.checkFormat                # Check code formatting
./mill __.reformat                   # Fix formatting
./mill __.assembly                   # Build fat JARs

LibreOffice Setup

For the core-libreoffice module:

# macOS
brew install --cask libreoffice

# Ubuntu/Debian
sudo apt-get install libreoffice

# Custom path
export LIBREOFFICE_HOME=/path/to/libreoffice

Aspose License

For core-aspose tests without watermarks:

# Option 1: Copy license to resources
cp Aspose.Total.Java.lic core-aspose/resources/

# Option 2: Environment variable
export ASPOSE_TOTAL_LICENSE_B64="$(base64 < Aspose.Total.Java.lic | tr -d '\n')"

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Note: The Aspose module requires valid Aspose licenses for production use. Evaluation/trial licenses can be obtained from Aspose directly.
