Roadmap

Features that are not yet implemented.

CLI

Most of the CLI has yet to be implemented.

Packaging
1. Downloadable/installable self-contained binary
  1. https://github.com/ClusterlessHQ/clusterless/releases
Improved S3CopyArc
1. Object renaming
2. Basic static partitioning
3. Include/exclude predicates
Custom Lambda-based arc workloads
High-frequency S3 listener boundary
1. For aggregating objects that arrive within a lot interval
2. https://docs.clusterless.io/reference/1.0-wip/components/aws-core-s3-put-listener-boundary.html
Native resources and workloads
1. AWS Glue database and catalog updates
2. AWS Athena CTAS/INSERT INTO queries (for chaining SQL)
3. AWS SageMaker training/validation
Common data processing workloads
1. Data reformatting (from text/json to binary/parquet)
  1. https://github.com/ClusterlessHQ/tessellate
2. Dynamic data repartitioning (partitions based on data like timestamps)
  1. https://github.com/ClusterlessHQ/tessellate
3. Predicate/duplicate index creation and data filtering
Join Barrier implementations
Scheduled arc executions
1. Some arcs may need to run periodically
Parallelized workloads
1. Workloads can be parallelized on source partitions
Pluggable modules for providing third-party services
LocalStack support for faster testing of AWS scenarios
Alternate substrates/providers
1. Azure
2. GCP
3. Digital Ocean
4. OCI