Releases: lablup/backend.ai
19.09.0
Key highlights
- Custom image import API which can automatically convert existing Python-based Docker images into runnable Backend.AI kernel images.
- Batch jobs which execute the given startup command immediately after session creation and terminate once done, with an explicit record of success or failure based on the command's exit code.
- High availability support by running multiple manager instances.
- Job queueing which allows submission of session creation requests even when the resources in the cluster are fully utilized, and automatically starts the oldest pending requests whenever the required amount of resources becomes available.
- Event monitoring API using the HTML5 Server-Sent Events (SSE) protocol to let clients receive kernel lifecycle notifications without excessive polling.
- 3-level user privileges: super-admin, domain-admin, and user
- Customizable new user signup process
- Authentication support for etcd
- SSH keypairs bound to user keypairs, which are auto-installed into their sessions
- Support for integration with Harbor docker registries
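The event monitoring API above streams kernel lifecycle events over Server-Sent Events. As a minimal sketch, the SSE wire format (`event:`/`data:` fields terminated by a blank line) can be parsed as below; the event name and JSON payload are hypothetical illustrations, not the actual Backend.AI event schema.

```python
# Minimal sketch of parsing Server-Sent Events (SSE) lines, the wire format
# used by the event monitoring API. The event name and payload below are
# hypothetical, not the actual Backend.AI event schema.
def parse_sse(stream: str):
    """Split an SSE text stream into (event, data) pairs."""
    events = []
    event, data_lines = "message", []
    for line in stream.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "":  # a blank line terminates one event
            if data_lines:
                events.append((event, "\n".join(data_lines)))
            event, data_lines = "message", []
    return events

raw = (
    "event: kernel_terminated\n"
    "data: {\"sessionId\": \"abc\"}\n"
    "\n"
)
print(parse_sse(raw))  # [('kernel_terminated', '{"sessionId": "abc"}')]
```

A real client would read these lines incrementally from a long-lived HTTP response instead of a string.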
There are many more changes and fixes.
Please refer to the per-component changelogs in their repositories.
19.03.0
Key highlights
- This is the first version to support a usable web GUI via the console project.
- Integration with NGC (NVIDIA GPU Cloud) images
- Per-keypair resource policies
- Support for authentication with Redis
- Resource presets
- Multiple vfolder hosts to utilize multiple volume mounts
- Various cleanups related to resource slot definitions and their operation semantics, including the renaming of "gpu" slots into "cuda.shares" and "cuda.device"
There are many more changes and fixes.
Please refer to the per-component changelogs in their repositories.
18.12.0
Key highlights
- Service ports
- CORS support in the gateway API
- TPU plugin support
There are many more changes and fixes.
Please refer to the per-component changelogs in their repositories.
v1.4.0
Key highlights: Shared virtual folders and multi-GPU scheduling
Manager
- Add a new set of virtual folder APIs to invite other users to one's own vfolder and to list/accept invitations from other users. (lablup/backend.ai-manager#80)
- Improve existing APIs to stream downloads/uploads of virtual folder files, and add an explicit option to recursively delete a directory (lablup/backend.ai-manager#89, lablup/backend.ai-manager#70)
- Add a new kernel API to list files in the session container (lablup/backend.ai-manager#63)
- All API endpoints are now available without version prefixes (e.g., `/v2/`) and in the future only this will be supported. (lablup/backend.ai-manager#78)
- The `user_id` field of the keypairs database table is now a string instead of an integer. You need to provide a manual `user_id_map.txt` mapping file to run the database schema upgrade using alembic.
- Upgrade to aiohttp v3.4 series.
Agent
- Add support for multi-GPU scheduling, where you can allocate multiples of GPU shares to compute sessions so that they can access multiple GPUs. The agent's decimal-based "share" model supports fractional allocations as well, but currently fractional CUDA GPU sharing is highly experimental and only provided to private testers. (lablup/backend.ai-agent#66)
- Introduce an initial version of accelerator plugins. Currently there is only one plugin: the CUDA accelerator. Now you can easily turn CUDA GPU support on/off by installing/uninstalling this plugin. (lablup/backend.ai-agent#66)
- Add support for nvidia-docker v2. (lablup/backend.ai-agent#64)
- Agent restarts now completely preserve the kernel session states. (lablup/backend.ai-agent#35, lablup/backend.ai-agent#73)
- You may limit an agent's view of available system resources such as CPU cores and GPU devices using a hexadecimal mask, for benchmarks and multi-GPU debugging. (lablup/backend.ai-agent#65)
- Stability improvements: the agent no longer retries to kill already-terminated kernel containers but reports them as "terminated", preventing an infinite loop of kernel creation failures in certain usage scenarios.
- Improve inner beauty for future support of non-dockerized environments.
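The decimal-based "share" model above, where a session may take whole or fractional pieces of multiple GPUs, can be sketched roughly as follows. The class and field names are hypothetical illustrations, not the agent's actual internals.

```python
from decimal import Decimal

# Hypothetical sketch of decimal-based GPU share accounting: each device
# exposes a total share, and a session may take whole or fractional pieces
# spanning several devices. Not the actual backend.ai-agent implementation.
class GPUPool:
    def __init__(self, shares_per_device):
        self.free = dict(shares_per_device)

    def allocate(self, amount: Decimal):
        """Greedily take shares from devices until `amount` is satisfied."""
        taken = {}
        remaining = amount
        for dev, free in self.free.items():
            if remaining <= 0:
                break
            piece = min(free, remaining)
            if piece > 0:
                taken[dev] = piece
                self.free[dev] = free - piece
                remaining -= piece
        if remaining > 0:  # roll back on insufficient capacity
            for dev, piece in taken.items():
                self.free[dev] += piece
            raise RuntimeError("not enough GPU shares")
        return taken

pool = GPUPool({"cuda:0": Decimal("1.0"), "cuda:1": Decimal("1.0")})
alloc = pool.allocate(Decimal("1.5"))  # spans two devices: 1.0 + 0.5
```

Using `Decimal` instead of floats keeps fractional share arithmetic exact, which matters when repeatedly adding and subtracting small allocations.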
Client for Python (v1.4)
- Add support for new `vfolder` subcommands to invite and accept invitations to shared virtual folders.
- Add support for listing and downloading vfolder files.
- Client library users should now wrap API function invocations in an explicit session, like aiohttp's client APIs. (example)
- Upgrade to aiohttp v3.4 series.
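The explicit-session pattern described above resembles aiohttp's client sessions. A minimal sketch of the usage shape, with illustrative class and method names (not the actual backend.ai-client API):

```python
import asyncio

# Hypothetical sketch of the "explicit session" pattern, mirroring
# aiohttp-style client sessions; names are illustrative, not the
# actual backend.ai-client API.
class Session:
    def __init__(self):
        self.closed = True

    async def __aenter__(self):
        self.closed = False  # e.g. open the underlying HTTP connector
        return self

    async def __aexit__(self, *exc):
        self.closed = True   # always release connections on exit
        return False

    async def list_kernels(self):
        assert not self.closed, "use the session inside `async with`"
        return []  # placeholder for a real API call

async def main():
    async with Session() as sess:
        return await sess.list_kernels()

print(asyncio.run(main()))  # []
```

The benefit of the explicit session is deterministic connection cleanup: leaving the `async with` block always closes the underlying transport.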
v1.3.0
Key highlight: Improve dockerization support and add a plugin architecture for future extension
Manager
- Now the Backend.AI gateway uses a modular architecture where you can add 3rd-party extensions as `aiohttp.web.Application` and middlewares via the `BACKEND_EXTENSIONS` environment variable. (lablup/backend.ai-manager#65)
- Adopt aiojobs as the main coroutine task scheduler, which also improves handler/task cancellation. (lablup/backend.ai-manager#65)
- Public non-authorized APIs become accessible without the "Date" HTTP header set. (lablup/backend.ai-manager#65)
- Upgrade aiohttp to v3.0 release. (lablup/backend.ai-manager#64)
- Improve dockerization support. (lablup/backend.ai-manager#62, #15)
- Fix "X-Method-Override" support that was interfering with RFC-7807-style error reporting.
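Extension loading via an environment variable, as the `BACKEND_EXTENSIONS` mechanism suggests, can be sketched with stdlib imports. The colon-separated format and plain module paths here are assumptions for illustration; the manager's actual parsing may differ.

```python
import importlib
import os

# Sketch of loading extension modules named in an environment variable.
# The colon-separated format is an assumption for illustration only.
def load_extensions(env: str = "BACKEND_EXTENSIONS"):
    modules = []
    for path in filter(None, os.environ.get(env, "").split(":")):
        modules.append(importlib.import_module(path))
    return modules

# Demo with stdlib modules standing in for real extension packages:
os.environ["BACKEND_EXTENSIONS"] = "json:http.server"
exts = load_extensions()
print([m.__name__ for m in exts])  # ['json', 'http.server']
```

In the real gateway, each loaded module would contribute an `aiohttp.web.Application` or middleware rather than being used directly.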
Agent
- Fix repeating docker event polling even when there is connection/client-side aiohttp errors.
- Upgrade aiohttp to v3.0 release.
- Improve dockerization. (lablup/backend.ai-agent#55)
- Improve inner beauty.
Client for Python (v1.2.1)
- Improve exception handling (use `Exception` instead of `BaseException` as the base class for `BackendError`)
- Upgrade aiohttp to v3.0 release.
- Fix silent swallowing of `asyncio.CancelledError` and `asyncio.TimeoutError`
- Allow uploading multiple files to a virtual folder in a single command (`backend.ai vfolder upload`)
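The base-class change above matters because generic `except Exception:` handlers only catch `Exception` subclasses; a `BaseException`-derived error would slip past them. A minimal illustration, with a stand-in error class:

```python
# Stand-in for the client's BackendError; deriving from Exception (rather
# than BaseException) lets ordinary `except Exception` handlers catch it.
class BackendError(Exception):
    pass

def call_api():
    raise BackendError("API returned an error")

caught = None
try:
    call_api()
except Exception as e:  # would miss a BaseException-derived error
    caught = e
print(type(caught).__name__)  # BackendError
```

This is also why `BaseException` is reserved for interpreter-level signals such as `KeyboardInterrupt` and `SystemExit`.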
v1.2.0
Key highlight: Improved logging and batch-mode interactions
NOTICE
- Now we have official documentation with an installation guide! Check it out here!
- From this release, the manager and agent versions go together to indicate their compatibility, even when either one has relatively few improvements.
Manager and Agent
- The gateway server now considers per-agent image availability when scheduling a new kernel. (lablup/backend.ai-manager#29)
- The execute API now returns the exit code of the underlying in-kernel subprocesses in batch mode. (lablup/backend.ai-manager#60)
- The API gateway server is now fully horizontally scalable across multiple cores and multiple servers.
- Improve logging: it now provides multiprocess-safe file-based rotating logs. (lablup/backend.ai-manager#10)
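File-based rotating logs like those described above can be sketched with the stdlib `RotatingFileHandler`. For true multiprocess safety the manager would additionally funnel records from worker processes to a single writer (e.g. via `QueueHandler`/`QueueListener`); that wiring is omitted here.

```python
import logging
import logging.handlers
import os
import tempfile

# Sketch of file-based rotating logs with the stdlib RotatingFileHandler.
# (Multiprocess safety additionally needs a QueueHandler/QueueListener pair
# feeding one writer process; omitted for brevity.)
log_path = os.path.join(tempfile.mkdtemp(), "manager.log")
handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=1024, backupCount=3,  # keep manager.log.1 .. .3
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("backend.ai.demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

for i in range(100):
    logger.info("scheduling kernel %d", i)
handler.close()

# The directory now holds the current log plus rotated backups.
print(sorted(os.listdir(os.path.dirname(log_path))))
```

`backupCount=3` bounds disk usage: once `manager.log.3` would be created again, the oldest backup is discarded.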
Manager
- Fix the Admin API error when filtering agents by their status, caused by a missing method parameter in `Agent.batch_load()`.
Agent
- Remove the image name prefix when reporting available images. (lablup/backend.ai-agent#51)
- Improve debug-kernel mode to mount the host-side kernel runner source into the kernel containers so that they use the latest, editable source clone of the kernel runner.
Client (v1.1.5 to v1.1.7)
- Apply authentication to websocket-based API requests.
- Fix a bug in client-side validation of user-provided session ID tokens.
- Add the missing `ai.backend.client.cli.admin` module to the distributed package.