A GStreamer-like Real-Time Workflow Framework, supporting NVIDIA Omniverse, Python, and Web UI, powered by K8S & Rust.

It focuses on real-time data streaming, NOT batch streaming as in a Data Lake. However, you can still utilize batch streaming as a format & store component in your real-time pipeline.

It is under heavy construction: unfinished features may change significantly in composition and usage. Please read the feature support tables below carefully.
| Type | How to read |
|---|---|
| Feature Kind | e.g. `model` |
| Feature Group | e.g. `builtin/` |
| Feature Name | e.g. `doc` |
| Feature's Usage | e.g. `model/builtin/doc` -> `docmodel` (ignore the group name and swap the remaining terms, like castling ♜🏰) |
| Model Function Name | e.g. `:split` |
| Model Function's Usage | e.g. `model/builtin/doc:split` -> `doc:split` |
| Status | ✅ Yes 🚧 WIP 🔎 TBA 🔲 TBD |
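The naming rule above can be sketched as a tiny helper. This is purely illustrative (`to_usage` is not part of XLake): drop the group segment, then append the kind to the name.

```bash
# Illustrative helper (not part of XLake): derive an element's usage name
# from its feature path by dropping the group and appending the kind.
to_usage() {
  local path="$1"             # e.g. model/builtin/doc
  local kind="${path%%/*}"    # everything before the first '/' -> model
  local name="${path##*/}"    # everything after the last '/'   -> doc
  echo "${name}${kind}"       # -> docmodel
}

to_usage model/builtin/doc    # prints: docmodel
to_usage src/local/file       # prints: filesrc
to_usage store/local          # prints: localstore
```

Note that the rule also covers two-segment paths with no group, such as `store/local` -> `localstore`.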
- 🚧 batch (Data File Format)
  - 🚧 datafusion (Apache DataFusion; by default)
- 🔎 cluster (Parallel Computing on HPC)
  - 🔎 local (Current Process; by default)
  - 🔲 ray (Ray Cluster; Python-only)
- 🔎 engine (Scalable Cluster Management & Job Scheduling System)
  - 🔲 k8s (Kubernetes for Containerized Applications, HPC-Ready with OpenARK)
  - 🔎 local (Host Machine; by default)
  - 🔲 slurm (Slurm Workload Manager for HPC)
  - 🔲 terraform (Terraform by HashiCorp for Cloud Providers)
- 🚧 model (Data Schema & Metadata)
  - 🚧 builtins/ (Primitives)
    - 🔎 batch (Auto-derived by the batch)
      - 🔲 :group
      - 🔲 :filter
      - 🔲 :kmeans
      - 🔎 :python
    - ✅ binary
    - 🔎 content
      - 🔎 :prompt (LLM Prompt)
    - 🚧 doc
      - 🔲 :split
    - 🔲 embed
      - 🔲 :vector_search
    - ✅ file
    - ✅ hash (Hashable -> Storable)
    - 🔲 metadata (Nested, Unsafe, for additional description)
    - 🔎 stream (Auto-derived by the stream)
      - 🔎 :python
  - 🔲 document/ (LibreOffice, etc.)
    - 🔲 markdown
    - 🔲 tex
  - 🔲 media/ (GStreamer)
    - 🔲 audio
    - 🔲 image
    - 🔲 video
  - 🔲 ml/ (Machine Learning, not Artificial Intelligence)
    - 🔲 torch (PyTorch)
      - 🔲 eval
      - 🔲 train
  - 🔲 twin/ (Digital Twin)
    - 🔲 loc (Location)
    - 🔲 rot (Rotation)
    - 🔲 usd (OpenUSD)
- 🚧 sink (Data Visualization & Workload Automation)
  - 🚧 local/
    - 🔲 file
    - 🔲 media (GStreamer)
    - ✅ stdout
  - 🔲 twin/ (Digital Twin & Robotics)
    - 🔲 omni (NVIDIA Omniverse)
- 🚧 src (Data Source)
  - 🔲 cloud/
    - 🔲 gmail (Google Gmail)
  - 🔲 desktop/
    - 🔲 screen (Screen Capture & Recording)
  - 🚧 local/
    - 🚧 file
      - ✅ Content-based Hash
      - ✅ Lazy Evaluation
      - 🔲 Metadata-based Hash
    - ✅ stdin
  - 🔲 ml/ (Machine Learning Models & Datasets)
    - 🔲 huggingface (Hugging Face Models & Datasets)
    - 🔲 kaggle (Kaggle Datasets)
  - 🔲 monitoring/ (Time series database, etc.)
  - 🔲 rtls/ (Real-Time Location System)
    - 🔲 sewio (Sewio UWB)
  - 🔲 twin/ (Digital Twin)
    - 🔲 omni (NVIDIA Omniverse)
- 🚧 store (Object Store, Cacheable)
  - 🔲 batch/
    - 🔲 delta (Delta Lake)
    - 🔲 lance (100x faster random access than Parquet)
  - 🔲 cdl (Connected Data Lake)
  - 🔲 cloud/
    - 🔲 gdrive (Google Drive)
    - 🔲 s3 (Amazon S3)
  - ✅ local (FileSystem)
- 🚧 stream (Data Streaming & Messaging System)
  - 🔲 kafka (Apache Kafka)
  - ✅ memory (In-Memory; by default)
    - ✅ Dynamic type casting
    - ✅ Lazy Evaluation
  - 🔲 nats (An Edge & Cloud Native Messaging System)
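The `file` source above advertises a content-based hash (✅): the cache key is derived from the file's bytes rather than its path or timestamps. A minimal sketch of the idea using standard tools (this is NOT XLake's implementation, just the underlying principle):

```bash
# Sketch of content-based hashing: the key depends only on the file's bytes,
# so a renamed copy of the same content maps to the same cache entry.
tmp="$(mktemp -d)"
printf 'same bytes' > "$tmp/a.pdf"
cp "$tmp/a.pdf" "$tmp/copy.pdf"                  # different path, same content
key_a="$(sha256sum "$tmp/a.pdf" | cut -d' ' -f1)"
key_b="$(sha256sum "$tmp/copy.pdf" | cut -d' ' -f1)"
[ "$key_a" = "$key_b" ] && echo 'cache hit'      # prints: cache hit
```

A metadata-based hash (🔲 above) would instead key on attributes like path and mtime, which is cheaper but misses renamed duplicates.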
| Type | How to read |
|---|---|
| Status | ✅ Yes 🚧 WIP 🔎 TBA 🔲 TBD |
- 🔎 API
  - 🔲 Python
  - 🔎 Rust
- 🚧 CLI
  - ✅ Command-line arguments (GStreamer-like Inline Pipeline)
  - 🔎 Container images
  - 🔲 YAML templates
- 🔎 Web UI
  - 🔎 Backend
  - 🔲 Frontend
    - 🔲 Cluster Management
    - 🔲 Dashboard
    - 🔲 Graph-based Pipeline Visualization
    - 🔲 Interactive Pipeline Composition
      - 🔲 Run & Stop
      - 🔲 Save as YAML templates
    - 🔲 Interactive Office
      - 🔲 Graphviz to Pipeline
      - 🔲 Sketch to Graphviz
      - 🔲 Voice to Graphviz
    - 🔲 Job Scheduling
    - 🔲 Storage Management
- 🔲 Helm Chart
```bash
# Install essential packages
sudo apt-get update && sudo apt-get install \
    default-jre \
    libreoffice-java-common \
    rustup

# Install the latest stable rustc
rustup default stable
rustup update
```
Change the file path and the store type to your preferred ones.

```bash
cargo run --release -- xlake "filesrc path='my_file.pdf'
    ! localstore path='my_cache_dir'
    ! stdoutsink"
```
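Pipelines are written GStreamer-style: elements chained with `!`. As a rough illustration of how such a string decomposes into ordered stages (this naive split is NOT XLake's actual parser; unlike a real parser, it would also split on a `!` inside a quoted property value):

```bash
# Illustrative only: split an inline pipeline on '!' to list its stages in order.
pipeline="filesrc path='my_file.pdf' ! localstore path='my_cache_dir' ! stdoutsink"
echo "$pipeline" | tr '!' '\n' | sed 's/^ *//; s/ *$//'
# prints:
#   filesrc path='my_file.pdf'
#   localstore path='my_cache_dir'
#   stdoutsink
```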
```bash
cargo run --release -- xlake "gmailsrc k=10
    ! localstore
    ! doc:split to=paragraph
    ! doc:embed embeddings=openai
    ! localstore
    ! embed:vector_search query='my query' k=5
    ! content:prompt prompt='Summarize the email contents in bullets'
    ! stdoutsink"
```
```bash
cargo run --release -- xlake "emptysrc
    ! content:prompt prompt='Which is better: coke zero vs normal coke'
    ! stdoutsink"
```
```bash
docker run --rm quay.io/ulagbulag/xlake:latest "emptysrc
    ! content:prompt prompt='Which is better: coke zero vs normal coke'
    ! stdoutsink"
```
Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in XLake by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.