Releases: dstackai/dstack
0.18.16
New versioning policy
Starting with this release, `dstack` adopts a new versioning policy to provide better server and client backward compatibility and improve the upgrading experience. `dstack` continues to follow the semver versioning scheme (`{major}.{minor}.{patch}`) with the following principles:
- Server backward compatibility is maintained across all minor and patch releases. Specific features can be removed, but each removal is preceded by deprecation warnings for several minor releases. This means you can use older client versions with newer server versions.
- Client backward compatibility is maintained across patch releases. A new minor release indicates that the release breaks client backward compatibility. This means you don't need to update the server when you update the client to a new patch release. Still, upgrading a client to a new minor version requires upgrading the server too.
Previously, `dstack` never guaranteed client backward compatibility, so you always had to update the server when updating the client. The new versioning policy makes upgrading the client and the server more flexible.
Note: The new policy only takes effect after both the clients and the server are upgraded to 0.18.16. The 0.18.15 server still won't work with newer clients.
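The compatibility rules above can be condensed into a small check. The following Python function is a hypothetical sketch of those rules, not the check `dstack` itself performs (added in #1730). It assumes plain `{major}.{minor}.{patch}` version strings without pre-release suffixes such as `rc1`, and the names `parse_version` and `is_compatible_under_policy` are made up for the example:

```python
from typing import Tuple

Version = Tuple[int, int, int]

# The policy only applies once both the client and the server run 0.18.16 or newer.
POLICY_START: Version = (0, 18, 16)


def parse_version(text: str) -> Version:
    """Parse a plain "{major}.{minor}.{patch}" string (pre-release tags are not supported)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible_under_policy(client: str, server: str) -> bool:
    """Return True if the versioning policy guarantees that `client` works with `server`."""
    c, s = parse_version(client), parse_version(server)
    # Before 0.18.16 (on either side) the policy gives no guarantee.
    if c < POLICY_START or s < POLICY_START:
        return False
    # Compatibility is only promised within the same major version.
    if c[0] != s[0]:
        return False
    # Servers stay backward compatible across minor and patch releases, and clients
    # across patch releases, so a client must not be on a newer minor than the server.
    return c[1] <= s[1]
```

With this sketch, a 0.18.16 client against a 0.18.20 server and a 0.18.20 client against a 0.18.16 server both return `True`, while a 0.19.0 client against a 0.18.20 server returns `False`, matching the rule that a new client minor release requires upgrading the server.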
dstack attach
The CLI gets a new `dstack attach` command that allows attaching to a run. It establishes the SSH tunnel, forwards ports, and streams run logs in real time:
```shell
✗ dstack attach silent-panther-1
Attached to run silent-panther-1 (replica=0 job=0)
Forwarded ports (local -> remote):
  - localhost:7860 -> 7860
To connect to the run via SSH, use `ssh silent-panther-1`.
Press Ctrl+C to detach...
```
This command replaces `dstack logs --attach` and brings major improvements and bug fixes.
CloudWatch-related bugfixes
This release includes several important bug fixes for `CloudWatchLogStorage`. We strongly recommend upgrading the `dstack` server if it's configured to store logs in CloudWatch.
Deprecations
`dstack logs --attach` is deprecated in favor of `dstack attach` and may be removed in one of the following minor releases.
What's Changed
- Check client-server compatibility according to new versioning policy by @r4victor in #1730
- [runner] fix MonotonicTimestamp by @un-def in #1728
- Gateway-in-server early prototype by @jvstme in #1718
- Implement dstack attach command by @r4victor in #1733
- Respect CloudWatch timestamp constraints by @un-def in #1732
- Add AMD examples with vLLM, Axolotl and Trl by @Bihan in #1693
- dstack-proxy naming tweaks by @jvstme in #1734
- Fix Failed to attach via Python API by @r4victor in #1739
- Support calling RunCollection.get_plan() without repo by @r4victor in #1741
Full Changelog: 0.18.15...0.18.16
0.18.15
Cluster placement groups
Instances of AWS cluster fleets are now provisioned into cluster placement groups for better connectivity. For example, when you create this fleet:
```yaml
type: fleet
name: my-cluster-fleet
nodes: 4
placement: cluster
backends: [aws]
```
`dstack` will automatically create a cluster placement group and use it to provision the instances.
On-prem and VM-based fleets improvements
- All available Nvidia driver capabilities are now requested by default, which makes it possible to run GPU workloads requiring OpenGL/Vulkan/RT/Video Codec SDK libraries. (#1714)
- Automatic container cleanup. Previously, when a run completed, either successfully or due to an error, its container was not deleted, which led to ever-increasing storage consumption. Now, only the last stopped container is preserved, and it remains available until the next run completes. (#1706)
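The retention rule above (keep only the last stopped container) can be expressed as a small selection function. This is a hypothetical Python illustration, not the routine added to `dstack-shim` in #1706; the `Container` type and the `containers_to_remove` helper are made up for the example, and the sketch assumes running containers are never candidates for cleanup:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Container:
    name: str
    stopped_at: Optional[datetime]  # None while the container is still running


def containers_to_remove(containers: List[Container]) -> List[str]:
    """Return names of stopped containers to delete, keeping only the most recently stopped one."""
    stopped = [c for c in containers if c.stopped_at is not None]
    # Newest first; everything after the first entry is eligible for cleanup.
    stopped.sort(key=lambda c: c.stopped_at, reverse=True)
    return [c.name for c in stopped[1:]]
```

Keeping exactly one stopped container bounds storage usage while still leaving the latest failed or finished run available for inspection.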
Major bug fixes
- Fixed a bug where under some conditions logs wouldn't be uploaded to CloudWatch Logs due to size limits. (#1712)
- Fixed a bug that prevented running services on on-prem instances. (#1716)
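The CloudWatch fix above (#1712) is about staying within CloudWatch Logs request limits by writing events in batches. The following Python sketch shows one way to split log events into batches. It is not `dstack`'s `CloudWatchLogStorage` code; `batch_events` is a hypothetical helper, and the limits in it are taken from the AWS `PutLogEvents` documentation (26 bytes of overhead per event, at most 1,048,576 bytes and 10,000 events per batch, and a batch may not span more than 24 hours), so they should be re-checked against the current AWS docs:

```python
from typing import Iterable, Iterator, List, Tuple

# CloudWatch Logs PutLogEvents limits (per the AWS docs at the time of writing):
# each event counts as its UTF-8 message size plus 26 bytes, a batch holds at most
# 1,048,576 bytes and 10,000 events, and may not span more than 24 hours.
EVENT_OVERHEAD_BYTES = 26
MAX_BATCH_BYTES = 1_048_576
MAX_BATCH_EVENTS = 10_000
MAX_BATCH_SPAN_MS = 24 * 60 * 60 * 1000

Event = Tuple[int, str]  # (timestamp in ms since epoch, message)


def batch_events(events: Iterable[Event]) -> Iterator[List[Event]]:
    """Yield chronologically sorted batches that stay within the limits above."""
    batch: List[Event] = []
    batch_bytes = 0
    for timestamp, message in sorted(events):
        size = len(message.encode("utf-8")) + EVENT_OVERHEAD_BYTES
        too_big = batch_bytes + size > MAX_BATCH_BYTES
        too_many = len(batch) >= MAX_BATCH_EVENTS
        too_long = bool(batch) and timestamp - batch[0][0] > MAX_BATCH_SPAN_MS
        if batch and (too_big or too_many or too_long):
            # Flush the current batch and start a new one with this event.
            yield batch
            batch, batch_bytes = [], 0
        batch.append((timestamp, message))
        batch_bytes += size
    if batch:
        yield batch
```

A single message larger than the batch limit still ends up alone in its own batch, which CloudWatch would reject; handling that case, for example by truncating or splitting the message, is left out of the sketch.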
Changelog
- Fix cli connection issue with TPU by @Bihan in #1705
- Rename `--default` to `--yes` and `--no-default` to `--no` in `dstack config` and `dstack server` by @peterschmidt85 in #1709
- [CI] Fix shim/runner release versions by @un-def in #1704
- Document run diagnostic logs by @r4victor in #1710
- [shim] Add old container cleanup routine by @un-def in #1706
- Write events to CloudWatch in batches by @un-def in #1712
- [shim] Request all Nvidia driver capabilities by @un-def in #1714
- Added showing dstack version on the UI by @olgenn in #1717
- Add missing project SSH key to on-prem instances by @un-def in #1716
- Simplify handling missing `GatewayConfiguration` by @jvstme in #1724
- [shim] Fix container logs processing by @un-def in #1721
- Support AWS placement groups for cluster fleets by @r4victor in #1725
Full Changelog: 0.18.14...0.18.15
0.18.15rc1
On-prem and VM-based fleets improvements
- All available Nvidia driver capabilities are now requested by default, which makes it possible to run GPU workloads requiring OpenGL/Vulkan/RT/Video Codec SDK libraries.
- Automatic container cleanup. Previously, when a run completed, either successfully or due to an error, its container was not deleted, which led to ever-increasing storage consumption. Now, only the last stopped container is preserved, and it remains available until the next run completes.
Major bug fixes
- Fixed a bug where under some conditions logs wouldn't be uploaded to CloudWatch Logs due to size limits.
Changelog
- [UX] Rename `--default` to `--yes` and `--no-default` to `--no` in `dstack config` and `dstack server` by @peterschmidt85 in #1709
- Fix cli connection issue with TPU by @Bihan in #1705
- Fix `dstack-shim` and `dstack-runner` release versions by @un-def in #1704
- Request all Nvidia driver capabilities by @un-def in #1714
- Add old container cleanup routine by @un-def in #1706
- Write events to CloudWatch in batches by @un-def in #1712
- [Docs] Document run diagnostic logs by @r4victor in #1710
- [Docs] Added the server deployment guide, updated the `README.md` for the Docker Hub, fixed the scrolling issue by @peterschmidt85
Full changelog: 0.18.14...0.18.15rc1
0.18.14
Multi-replica server deployment
Previously, the `dstack` server only supported deploying a single instance (replica). With 0.18.14, you can now deploy multiple replicas, enabling high availability and zero-downtime updates.
Note
Multi-replica server deployment requires using Postgres instead of the default SQLite. To configure Postgres, set the `DSTACK_DATABASE_URL` environment variable.
Make sure to update to version 0.18.14 before configuring multiple replicas.
Major bug fixes
- [Bugfix] `dstack init --git-identity` doesn't accept backslashes in path on Windows by @un-def in #1686
- [Bugfix] Use `--tmpfs /dev/shm:rw,nosuid,nodev,exec,size=X` instead of `--shm-size=X` by @un-def in #1690
- [Bugfix] `dstack-shim` is not updated when fleet is recreated by @un-def in #1698
Other
- [Bugfix] Fix `SSHAttach.reuse_ports_lock()` when no grep matches by @un-def in #1700
- [Bugfix] Fix logger exception on instance provisioning timeout by @un-def in #1697
- [Internal] Add `JobProvisioningData.base_backend` by @r4victor in #1682
- [Internal] Add `Run.error` by @r4victor in #1684
- [Internal] Return `server_version` in `/api/server/get_info` by @r4victor in #1685
- [Internal] Allow gateway to connect to replicated server by @jvstme in #1688
- [Internal] Adjust gateway management for multiple server replicas by @r4victor in #1691
- [Internal] Skip gateway update if gateway was updated recently by @r4victor in #1695
- [Internal] Remove redundant `logger.error` by @r4victor in #1702
Full changelog: 0.18.13...0.18.14
0.18.13
Windows
You can now use the CLI on Windows (WSL 2 is not required).
Ensure that Git and OpenSSH are installed via Git for Windows.
During installation, select the "Git from the command line and also from 3rd-party software" (or "Use Git and optional Unix tools from the Command Prompt") and "Use bundled OpenSSH" options.
Spot policy
Previously, dev environments used the `on-demand` spot policy, while tasks and services used `auto`. With this update, we've changed the default spot policy to always be `on-demand` for all configurations. Users will now need to explicitly specify the spot policy if they want to use spot instances.
Troubleshooting
The documentation now includes a Troubleshooting guide with instructions on how to report issues.
Changelog
- [UX] Add Windows support by @un-def in #1675
- [UX] Changed the default `spot_policy` to `on-demand` by @r4victor in #1657 and #1660
- [UI] Minor UI improvements by @olgenn in #1658
- [UX] Check SSH keys on SSH fleet creation before submission by @r4victor in #1661
- [Docs] Add TPU examples with Optimum TPU and vLLM by @Bihan in #1663
- [Troubleshooting] Do not auto-delete failed instances by @r4victor in #1665
- [Docs] Document SQLite to Postgres migration by @r4victor in #1678
- [Internal] Implement Postgres locking by @r4victor in #1651
- [Internal] Refactor `SSHTunnel` by @jvstme in #1669
- [Internal] Replace `String` with `Text` for long database columns by @r4victor in #1677
- [Internal] Take advisory lock on server init by @r4victor in #1674
All commits: 0.18.12...0.18.13
0.18.12
Features
Major bugfixes
- Fixed the order of CloudWatch log events in the web interface by @un-def in #1613
- Fixed a bug where CloudWatch log events might not be displayed in the web interface for old runs by @un-def in #1652
- Prevent possible server freeze on SSH connections by @jvstme in #1627
Other changes
- [CLI] Show run name before detaching by @jvstme in #1607
- Increase time waiting for OCI Bare Metal instances by @jvstme in #1630
- Update lambda regions by @r4victor in #1634
- Change CloudWatch group check method by @un-def in #1615
- Add Postgres tests by @r4victor in #1628
- Fix lambda tests by @r4victor in #1635
- [Docs] Fixed a bug where search included non-existent pages that led to a 404 by @peterschmidt85 in #1646
- [Docs] Introduce the Providers page by @peterschmidt85 in #1653
- [Docs] Update RunPod & DataCrunch setup guides by @jvstme in #1608
- [Docs] Add information about run log storage by @un-def in #1621
- [Internal] Update packer templates docs by @jvstme in #1619
Full changelog: 0.18.11...0.18.12
0.18.12rc1
Features
Major bugfixes
- Fixed the order of CloudWatch log events in the web interface by @un-def in #1613
- Fixed a bug where CloudWatch log events might not be displayed in the web interface for old runs by @un-def in #1652
- Prevent possible server freeze on SSH connections by @jvstme in #1627
Other changes
- [CLI] Show run name before detaching by @jvstme in #1607
- Increase time waiting for OCI Bare Metal instances by @jvstme in #1630
- Update lambda regions by @r4victor in #1634
- Change CloudWatch group check method by @un-def in #1615
- Add Postgres tests by @r4victor in #1628
- Fix lambda tests by @r4victor in #1635
- [Docs] Fixed a bug where search included non-existent pages that led to a 404 by @peterschmidt85 in #1646
- [Docs] Introduce the Providers page by @peterschmidt85 in #1653
- [Docs] Update RunPod & DataCrunch setup guides by @jvstme in #1608
- [Docs] Add information about run log storage by @un-def in #1621
- [Internal] Update packer templates docs by @jvstme in #1619
Full changelog: 0.18.11...0.18.12rc1
0.18.11
AMD
With the latest update, you can now specify an AMD GPU under `resources`. Below is an example.
```yaml
type: service
name: amd-service-tgi
image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
env:
  - HUGGING_FACE_HUB_TOKEN
  - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
  - TRUST_REMOTE_CODE=true
  - ROCM_USE_FLASH_ATTN_V2_TRITON=true
commands:
  - text-generation-launcher --port 8000
port: 8000
resources:
  gpu: MI300X
  disk: 150GB
spot_policy: auto
model:
  type: chat
  name: meta-llama/Meta-Llama-3.1-70B-Instruct
  format: openai
```
Note
AMD accelerators are currently supported only with the `runpod` backend. Support for on-prem fleets and more backends is coming soon.
GPU vendors
The `gpu` property now accepts the `vendor` attribute, with supported values: `nvidia`, `tpu`, and `amd`.
Alternatively, you can prefix the GPU specification with the vendor name followed by a colon, for example `tpu:v2-8` or `amd:192GB`. This change ensures consistent GPU requirement configuration across vendors.
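To illustrate the prefix syntax, the following Python sketch splits an optional vendor prefix off a GPU spec string. It is a hypothetical helper, not `dstack`'s actual parser: it only handles the `<vendor>:` step for the three vendors listed above and leaves the rest of the spec (GPU name, memory, count) untouched for further parsing:

```python
from typing import Optional, Tuple

# Vendor names listed in the release notes; the helper itself is hypothetical.
KNOWN_VENDORS = {"nvidia", "tpu", "amd"}


def split_vendor_prefix(spec: str) -> Tuple[Optional[str], str]:
    """Split an optional "<vendor>:" prefix off a GPU spec string.

    Returns (vendor, rest); vendor is None when the spec has no known vendor prefix.
    """
    head, sep, rest = spec.partition(":")
    if sep and head.lower() in KNOWN_VENDORS:
        return head.lower(), rest
    return None, spec
```

For example, `tpu:v2-8` splits into `("tpu", "v2-8")` and `amd:192GB` into `("amd", "192GB")`, while a spec without a vendor prefix, such as `MI300X`, comes back unchanged with `None` as the vendor.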
Encryption
`dstack` now supports encryption of sensitive data, such as backend credentials and user tokens. Learn more on the reference page.
Storing logs in AWS CloudWatch
By default, the `dstack` server stores run logs in `~/.dstack/server/projects/<project name>/logs`. To store logs in AWS CloudWatch, set the `DSTACK_SERVER_CLOUDWATCH_LOG_GROUP` environment variable.
Project manager role
With this update, it's now possible to assign any user as a project manager. This role grants permission to manage project users but does not allow management of backends or resources.
Default permissions
By default, all users can create and manage their own projects. If you want only global admins to create projects, add the following to `~/.dstack/server/config.yml`:
```yaml
default_permissions:
  allow_non_admins_create_projects: false
```
Other
- [Feature] Allow to store logs in AWS CloudWatch by @un-def in #1597
- [Feature] Introduce default permissions #1559 by @olgenn in #1567
- [Feature] Support the `vendor` property under `resources.gpu` by @un-def in #1558
- [Feature] Implement configurable default permissions by @r4victor in #1591
- [Bugfix] Provision AWS instances in all eligible availability zones by @r4victor in #1585
- [Bugfix] Support users without projects by @olgenn in #1578
- [UI] Support `manager` project role by @olgenn in #1566
- [Docs] Mention AMD GPUs, describe `gpu.vendor` property by @un-def in #1570
- [Bugfix] Fix global admin restricted by manager role by @r4victor in #1592
- [Bugfix] Fixed defect with incorrect setting project role in the UI by @olgenn in #1593
- [Bugfix] Abort provisioning fleet when parsing ssh key fails (#1442) by @swsvc in #1589
- [UI] Ensure users can create projects #191 by @olgenn in #1554
- [UI] Use a toggle button switching themes #190 by @olgenn in #1556
- [UI] Fix the Logs component appearance for the dark theme by @olgenn in #1579
- [UI] Minor restyle of the side navigation by @olgenn in #1580
- [Bugfix] Avoid TGI error `logit_bias: invalid type` by @jvstme in #1557
- [Docs] Document projects #1547 by @peterschmidt85 in #1548
- [Docs] Document AMD support on RunPod by @peterschmidt85 in #1598
- [Internal] Approximate on-prem GPU memory size by @jvstme in #1588
- [Docs] Fix some of the broken links by @jvstme in #1602
- [Docs] Fix broken links in README.md by @jvstme in #1604
- [Docs] Document configuring logs storage in AWS CloudWatch by @un-def in #1606
- [Docs] Publish the blog post and examples about AMD on RunPod by @peterschmidt85 in #1598
- [Internal] Force `root` in Kubernetes runs by @jvstme in #1555
- [Internal] Improve gateway auth issues troubleshooting by @jvstme in #1569
- [Feature] Implement "encryption at rest" by @r4victor in #1561
- [Feature] Implement project `manager` role by @r4victor in #1572
- [Feature] Implement user activation/deactivation by @r4victor in #1575
- [Internal] Reintroduce `tpu-` prefix; add `tpu` vendor alias by @un-def in #1587
New contributors
Full changelog: 0.18.10...0.18.11
0.18.11rc1
AMD
With the latest update, you can now specify an AMD GPU under `resources`. Below is an example.
```yaml
type: service
name: amd-service-tgi
image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
env:
  - HUGGING_FACE_HUB_TOKEN
  - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
  - TRUST_REMOTE_CODE=true
  - ROCM_USE_FLASH_ATTN_V2_TRITON=true
commands:
  - text-generation-launcher --port 8000
port: 8000
resources:
  gpu: MI300X
  disk: 150GB
spot_policy: auto
model:
  type: chat
  name: meta-llama/Meta-Llama-3.1-70B-Instruct
  format: openai
```
Note
AMD accelerators are currently supported only with the `runpod` backend. Support for on-prem fleets and more backends is coming soon.
Other
- [Docs] Document projects #1547 by @peterschmidt85 in #1548
- [UI] Ensure users can create projects #191 by @olgenn in #1554
- [UI] Use a toggle button switching themes #190 by @olgenn in #1556
- [Bugfix] Force `root` in Kubernetes runs by @jvstme in #1555
- [Bugfix] Avoid TGI error `logit_bias: invalid type` by @jvstme in #1557
- Support the `vendor` property under `gpu` by @un-def in #1558
- [Internal] Improve gateway auth issues troubleshooting by @jvstme in #1569
- [Feature] Implement "encryption at rest" by @r4victor in #1561
- [Feature] Implement project `manager` role by @r4victor in #1572
- [Feature] Implement user activation/deactivation by @r4victor in #1575
- [Bugfix] Support users without projects by @olgenn in #1578
- [UI] Fix the Logs component appearance for the dark theme by @olgenn in #1579
- [UI] Minor restyle of the side navigation by @olgenn in #1580
- [Internal] Replace `pkg_resources` with `importlib.resources` by @r4victor in #1582
- [UI] Support `manager` project role by @olgenn in #1566
- [Bugfix] Provision AWS instances in all eligible availability zones by @r4victor in #1585
- [Feature] Implement configurable default permissions by @r4victor in #1591
- [Internal] Reintroduce `tpu-` prefix; add `tpu` vendor alias by @un-def in #1587
- [Docs] Mention AMD GPUs, describe `gpu.vendor` property by @un-def in #1570
- [Bugfix] Fix global admin restricted by manager role by @r4victor in #1592
- [Bugfix] Fixed defect with incorrect setting project role in the UI by @olgenn in #1593
- [Internal] Order project members by @r4victor in #1594
- [Feature] Introduce default permissions #1559 by @olgenn in #1567
- [Bugfix] Abort provisioning fleet when parsing ssh key fails (#1442) by @swsvc in #1589
- [Feature] Add LogStorage interface, CloudWatch Logs impl by @un-def in #1597
- [Docs] Document AMD support on RunPod by @peterschmidt85 in #1598
New contributors
Full changelog: 0.18.10...0.18.11rc1
0.18.10
Control plane UI
As a user, you most likely access `dstack` using its CLI. At the same time, the `dstack` server hosts a control plane that offers a wide range of functionality. It orchestrates cloud infrastructure, manages the state of resources, checks access, and much more.
Previously, managing projects and users was only possible via the API. The latest `dstack` update introduces a full-fledged web-based user interface, which you can now access on the same port where the server is hosted.

The user interface allows you to configure projects, users, their permissions, manage resources and workloads, and much more.
To learn more about how to manage projects, users, and their permissions, check out the Projects page.
Environment variables interpolation
Previously, it wasn't possible to use environment variables to configure credentials for a private Docker registry. With this update, you can now use the following interpolation syntax to avoid hardcoding credentials in the configuration.
```yaml
type: dev-environment
name: train
env:
  - DOCKER_USER
  - DOCKER_USERPASSWORD
image: dstackai/base:py3.10-0.4-cuda-12.1
registry_auth:
  username: ${{ env.DOCKER_USER }}
  password: ${{ env.DOCKER_USERPASSWORD }}
```
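The `${{ env.NAME }}` syntax is a simple template substitution. The following Python sketch shows how such references could be resolved; it is a hypothetical illustration, not `dstack`'s implementation (#1540), and it assumes the values of the variables listed under `env` have already been collected into a mapping. The `interpolate_env` helper is made up for the example:

```python
import re
from typing import Mapping

# Matches ${{ env.NAME }} with optional whitespace inside the braces.
_ENV_REF = re.compile(r"\$\{\{\s*env\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def interpolate_env(value: str, env: Mapping[str, str]) -> str:
    """Replace every ${{ env.NAME }} reference in `value` with env[NAME].

    Raises KeyError if a referenced variable is not defined.
    """

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _ENV_REF.sub(substitute, value)
```

Failing loudly on an undefined variable, rather than substituting an empty string, avoids silently sending empty registry credentials.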
Network interfaces for port forwarding
When you run a dev environment or a task with `dstack apply`, it automatically forwards the remote ports to `localhost`. By default, however, these ports are bound to `127.0.0.1`. If you'd like to make a port available on a different network interface, you can now specify the host address using the `--host` option.
For example, this command will make the port available on all network interfaces:
```shell
dstack apply --host 0.0.0.0 -f my-task.dstack.yml
```
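The `--host` value determines which local address the forwarded port listens on. The following generic Python sketch (not `dstack` code; `open_listener` is a hypothetical helper) shows the difference by binding one listening socket to the loopback address and another to all IPv4 interfaces:

```python
import socket


def open_listener(host: str, port: int = 0) -> socket.socket:
    """Open a TCP listening socket bound to `host` (port 0 lets the OS pick a free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


if __name__ == "__main__":
    # Loopback only: reachable from this machine alone (the default behavior described above).
    local_only = open_listener("127.0.0.1")
    # All IPv4 interfaces: also reachable from other hosts (like --host 0.0.0.0).
    everywhere = open_listener("0.0.0.0")
    print(local_only.getsockname(), everywhere.getsockname())
    local_only.close()
    everywhere.close()
```

A port bound to `0.0.0.0` is reachable by anyone who can reach the machine over the network, which is why binding to `127.0.0.1` is the safer default.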
Full changelog
- [Feature] Add `--host HOST` arg to `dstack apply` command by @un-def in #1531
- [Feature] Interpolate env in `registry_auth` by @r4victor in #1540
- [Bugfix] Ensure `dstack` CLI exits with non-zero exit code on errors by @r4victor in #1529
- [Bugfix] Fix `http` services running on 443 in the logs by @r4victor in #1522
- [Bugfix] Force the use of the `root` user in custom Docker images by @jvstme in #1538
- [Bugfix] Update Docker to 27.1.1 in `dstack` VM images by @jvstme in #1536
- [Feature] Add control plane UI by @olgenn in #1524
- [Docs] Document the `nvcc` property by @peterschmidt85 in #1526
- [Docs] Document `env` for on-prem fleets #1527 by @peterschmidt85 in #1530
- [Internal] Fix unlocking on transaction rollback by @r4victor in #1537
- [Internal] Bump base `dstack` image version to `0.5` by @jvstme in #1541
All changes: 0.18.9...0.18.10