A GStreamer-like Real-Time Workflow Framework, supporting NVIDIA Omniverse, Python, and Web UI, powered by K8S & Rust.

It focuses on real-time data streaming, NOT batch streaming as in a Data Lake. However, you can still utilize batch streaming as a format & store component in your real-time pipeline.

It is under heavy construction: unfinished features may change significantly in composition and usage. Please read the feature support tables below carefully.
| Type | How to read |
|---|---|
| Feature Kind | e.g. `model` |
| Feature Group | e.g. `builtin/` |
| Feature Name | e.g. `doc` |
| Feature's Usage | e.g. `model/builtin/doc` -> `docmodel` (ignore the group name and swap the remaining terms, like castling ♜🏰) |
| Model Function Name | e.g. `:split` |
| Model Function's Usage | e.g. `model/builtin/doc:split` -> `doc:split` |
| Status | ✅ Yes 🚧 WIP 🔎 TBA 🔲 TBD |
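The naming rule above can be sketched as a tiny helper. This is purely illustrative (`to_usage` is not part of XLake): drop the group segment, then append the kind to the name.

```bash
# Illustrative helper (not part of XLake): derive an element's usage name
# from its feature path by dropping the group and appending the kind.
to_usage() {
  local path="$1"             # e.g. model/builtin/doc
  local kind="${path%%/*}"    # everything before the first '/' -> model
  local name="${path##*/}"    # everything after the last '/'   -> doc
  echo "${name}${kind}"       # -> docmodel
}

to_usage model/builtin/doc    # prints: docmodel
to_usage src/local/file       # prints: filesrc
to_usage store/local          # prints: localstore
```

Note that the rule also covers two-segment paths with no group, such as `store/local` -> `localstore`.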
- 🚧 batch (Data File Format)
  - 🚧 datafusion (Apache DataFusion; by default)
- 🔎 cluster (Parallel Computing on HPC)
  - 🔎 local (Current Process; by default)
  - 🔲 ray (Ray Cluster; Python-only)
- 🔎 engine (Scalable Cluster Management & Job Scheduling System)
  - 🔲 k8s (Kubernetes for Containerized Applications, HPC-Ready with OpenARK)
  - 🔎 local (Host Machine; by default)
  - 🔲 slurm (Slurm Workload Manager for HPC)
  - 🔲 terraform (Terraform by HashiCorp for Cloud Providers)
- 🚧 model (Data Schema & Metadata)
  - 🚧 builtins/ (Primitives)
    - 🔎 batch (Auto-derived by the batch)
      - 🔲 :group
      - 🔲 :filter
      - 🔲 :kmeans
      - 🔎 :python
    - ✅ binary
    - 🔎 content
      - 🔎 :prompt (LLM Prompt)
    - 🚧 doc
      - 🔲 :split
    - 🔲 embed
      - 🔲 :vector_search
    - ✅ file
    - ✅ hash (Hashable -> Storable)
    - 🔲 metadata (Nested, Unsafe, for additional description)
    - 🔎 stream (Auto-derived by the stream)
      - 🔎 :python
  - 🔲 document/ (LibreOffice, etc.)
    - 🔲 markdown
    - 🔲 tex
  - 🔲 media/ (GStreamer)
    - 🔲 audio
    - 🔲 image
    - 🔲 video
  - 🔲 ml/ (Machine Learning, not Artificial Intelligence)
    - 🔲 torch (PyTorch)
      - 🔲 eval
      - 🔲 train
  - 🔲 twin/ (Digital Twin)
    - 🔲 loc (Location)
    - 🔲 rot (Rotation)
    - 🔲 usd (OpenUSD)
- 🚧 sink (Data Visualization & Workload Automation)
  - 🚧 local/
    - 🔲 file
    - 🔲 media (GStreamer)
    - ✅ stdout
  - 🔲 twin/ (Digital Twin & Robotics)
    - 🔲 omni (NVIDIA Omniverse)
- 🚧 src (Data Source)
  - 🔲 cloud/
    - 🔲 gmail (Google Gmail)
  - 🔲 desktop/
    - 🔲 screen (Screen Capture & Recording)
  - 🚧 local/
    - 🚧 file
      - ✅ Content-based Hash
      - ✅ Lazy Evaluation
      - 🔲 Metadata-based Hash
    - ✅ stdin
  - 🔲 ml/ (Machine Learning Models & Datasets)
    - 🔲 huggingface (Hugging Face Models & Datasets)
    - 🔲 kaggle (Kaggle Datasets)
  - 🔲 monitoring/ (Time series database, etc.)
  - 🔲 rtls/ (Real-Time Location System)
    - 🔲 sewio (Sewio UWB)
  - 🔲 twin/ (Digital Twin)
    - 🔲 omni (NVIDIA Omniverse)
- 🚧 store (Object Store, Cacheable)
  - 🔲 batch/
    - 🔲 delta (Delta Lake)
    - 🔲 lance (100x faster random access than Parquet)
  - 🔲 cdl (Connected Data Lake)
  - 🔲 cloud/
    - 🔲 gdrive (Google Drive)
    - 🔲 s3 (Amazon S3)
  - ✅ local (FileSystem)
- 🚧 stream (Data Streaming & Messaging System)
  - 🔲 kafka (Apache Kafka)
  - ✅ memory (In-Memory; by default)
    - ✅ Dynamic type casting
    - ✅ Lazy Evaluation
  - 🔲 nats (An Edge & Cloud Native Messaging System)
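The `file` source above advertises a content-based hash (✅): the cache key is derived from the file's bytes rather than its path or timestamps. A minimal sketch of the idea using standard tools (this is NOT XLake's implementation, just the underlying principle):

```bash
# Sketch of content-based hashing: the key depends only on the file's bytes,
# so a renamed copy of the same content maps to the same cache entry.
tmp="$(mktemp -d)"
printf 'same bytes' > "$tmp/a.pdf"
cp "$tmp/a.pdf" "$tmp/copy.pdf"                  # different path, same content
key_a="$(sha256sum "$tmp/a.pdf" | cut -d' ' -f1)"
key_b="$(sha256sum "$tmp/copy.pdf" | cut -d' ' -f1)"
[ "$key_a" = "$key_b" ] && echo 'cache hit'      # prints: cache hit
```

A metadata-based hash (🔲 above) would instead key on attributes like path and mtime, which is cheaper but misses renamed duplicates.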
| Type | How to read |
|---|---|
| Status | ✅ Yes 🚧 WIP 🔎 TBA 🔲 TBD |
- 🔎 API
  - 🔲 Python
  - 🔎 Rust
- 🚧 CLI
  - ✅ Command-line arguments (GStreamer-like Inline Pipeline)
  - 🔎 Container images
  - 🔲 YAML templates
- 🔎 Web UI
  - 🔎 Backend
  - 🔲 Frontend
    - 🔲 Cluster Management
    - 🔲 Dashboard
    - 🔲 Graph-based Pipeline Visualization
    - 🔲 Interactive Pipeline Composition
      - 🔲 Run & Stop
      - 🔲 Save as YAML templates
    - 🔲 Interactive Office
      - 🔲 Graphviz to Pipeline
      - 🔲 Sketch to Graphviz
      - 🔲 Voice to Graphviz
    - 🔲 Job Scheduling
    - 🔲 Storage Management
- 🔲 Helm Chart
```bash
# Install essential packages
sudo apt-get update && sudo apt-get install \
    default-jre \
    libreoffice-java-common \
    rustup

# Install the latest stable rustc
rustup default stable
rustup update
```
Change the file path and the store type to your preferred ones.

```bash
cargo run --release -- xlake "filesrc path='my_file.pdf'
    ! localstore path='my_cache_dir'
    ! stdoutsink"
```
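Pipelines are written GStreamer-style: elements chained with `!`. As a rough illustration of how such a string decomposes into ordered stages (this naive split is NOT XLake's actual parser; unlike a real parser, it would also split on a `!` inside a quoted property value):

```bash
# Illustrative only: split an inline pipeline on '!' to list its stages in order.
pipeline="filesrc path='my_file.pdf' ! localstore path='my_cache_dir' ! stdoutsink"
echo "$pipeline" | tr '!' '\n' | sed 's/^ *//; s/ *$//'
# prints:
#   filesrc path='my_file.pdf'
#   localstore path='my_cache_dir'
#   stdoutsink
```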
```bash
cargo run --release -- xlake "gmailsrc k=10
    ! localstore
    ! doc:split to=paragraph
    ! doc:embed embeddings=openai
    ! localstore
    ! embed:vector_search query='my query' k=5
    ! content:prompt prompt='Summarize the email contents in bullets'
    ! stdoutsink"
```
```bash
cargo run --release -- xlake "emptysrc
    ! content:prompt prompt='Which is better: coke zero vs normal coke'
    ! stdoutsink"
```
```bash
docker run --rm quay.io/ulagbulag/xlake:latest "emptysrc
    ! content:prompt prompt='Which is better: coke zero vs normal coke'
    ! stdoutsink"
```
Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in XLake by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.