
Releases: lablup/backend.ai

19.09.0

22 Sep 07:08

Key highlights

  • Custom image import API that automatically converts existing Python-based Docker images into runnable Backend.AI kernel images.
  • Batch jobs that execute the given startup command immediately after session creation and are terminated immediately once done, with an explicit record of success or failure depending on the command's exit code.
  • High availability support by running multiple manager instances.
  • Job queueing, which allows submission of session creation requests even when the cluster's resources are fully utilized, and automatically starts the oldest pending requests whenever the required amount of resources becomes available.
  • Event monitoring API using the HTML5 Server-Sent Events protocol, which lets clients receive kernel lifecycle notifications without excessive polling.
  • 3-level user privileges: super-admin, domain-admin, and user
  • Customizable new-user signup process
  • Authentication support for etcd
  • Dedicated SSH keypairs bound to user keypairs, which are auto-installed into their sessions
  • Support for integration with Harbor Docker registries
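The event monitoring API above relies on the standard Server-Sent Events wire format, which clients parse line by line. As a minimal sketch, here is a parser for one SSE event block; the field syntax follows the SSE specification, but the event name in the sample payload is hypothetical, not taken from the actual Backend.AI API:

```python
def parse_sse_event(raw: str) -> dict:
    """Parse a single Server-Sent Events block ("event:" / "data:" lines)
    into a dict. Field handling follows the SSE specification: comments
    start with ":", multiple data lines are joined with newlines."""
    event = {"event": "message", "data": []}
    for line in raw.splitlines():
        if not line or line.startswith(":"):  # skip blanks and comments
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):  # the spec strips one leading space
            value = value[1:]
        if field == "event":
            event["event"] = value
        elif field == "data":
            event["data"].append(value)
    event["data"] = "\n".join(event["data"])
    return event

# Sample payload; the event name "kernel_terminated" is illustrative only.
sample = 'event: kernel_terminated\ndata: {"reason": "task-done"}'
parsed = parse_sse_event(sample)
```

A real client would read such blocks from a long-lived HTTP response stream instead of a string.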

There are many more changes and fixes.
Please refer to the per-component changelogs in their repositories.

19.03.0

22 Sep 07:15

Key highlights

  • This is the first version to support a usable web GUI via the console project.
  • Integration with NGC (NVIDIA GPU Cloud) images
  • Per-keypair resource policies
  • Support for authentication with Redis
  • Resource presets
  • Multiple vfolder hosts to utilize multiple volume mounts
  • Various clean-ups related to resource slot definitions and their operational semantics, including renaming the "gpu" slot to "cuda.shares" and "cuda.device"
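To illustrate the slot renaming mentioned above, a legacy resource map using a bare "gpu" slot maps onto the new device-type-prefixed naming. The helper below is a hypothetical sketch of that translation, not code from the manager:

```python
def migrate_resource_slots(slots: dict) -> dict:
    """Rename the legacy "gpu" slot to the new "cuda.shares" name,
    leaving all other slots untouched. Purely illustrative of the
    renamed slot scheme; not the actual migration code."""
    migrated = {}
    for name, value in slots.items():
        if name == "gpu":
            migrated["cuda.shares"] = value
        else:
            migrated[name] = value
    return migrated

legacy = {"cpu": 4, "mem": "8g", "gpu": 1.5}
new_slots = migrate_resource_slots(legacy)
```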

There are many more changes and fixes.
Please refer to the per-component changelogs in their repositories.

18.12.0

22 Sep 07:11

Key highlights

  • Service ports
  • CORS support in the gateway API
  • TPU plugin support
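CORS support in the gateway API means browser-based clients on other origins can call the API after the standard preflight exchange. A minimal sketch of what a preflight response involves, with header names taken from the CORS specification; the allowed-origin list and method/header choices below are hypothetical, not the gateway's actual configuration:

```python
# Hypothetical allow-list; the real gateway's CORS policy is configurable.
ALLOWED_ORIGINS = {"https://console.example.com"}

def preflight_headers(origin: str) -> dict:
    """Build CORS preflight response headers for an allowed origin,
    or return an empty dict to signal rejection. Header names follow
    the CORS specification."""
    if origin not in ALLOWED_ORIGINS:
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": "GET, POST, DELETE, OPTIONS",
        "Access-Control-Allow-Headers": "Authorization, Content-Type",
    }

headers = preflight_headers("https://console.example.com")
```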

There are many more changes and fixes.
Please refer to the per-component changelogs in their repositories.

v1.4.0

02 Oct 14:17

Key highlights: Shared virtual folders and multi-GPU scheduling

Manager

  • Add a new set of virtual folder APIs to invite other users to one's own vfolder and to list/accept invitations from other users. (lablup/backend.ai-manager#80)
  • Improve existing APIs to stream downloads/uploads of virtual folder files and add an explicit option to recursively delete a directory (lablup/backend.ai-manager#89, lablup/backend.ai-manager#70)
  • Add a new kernel API to list files in the session container (lablup/backend.ai-manager#63)
  • All API endpoints are now available without version prefixes (e.g., /v2/) and in the future only this will be supported. (lablup/backend.ai-manager#78)
  • The user_id field of the keypairs database table is now a string instead of an integer. You need to provide a manual user_id_map.txt mapping file to run the database schema upgrade using alembic.
  • Upgrade to aiohttp v3.4 series.

Agent

  • Add support for multi-GPU scheduling, where you can allocate multiples of GPU shares to compute sessions so that they can access multiple GPUs. The agent's decimal-based "share" model supports fractional allocations as well, but currently fractional CUDA GPU sharing is highly experimental and only provided to private testers. (lablup/backend.ai-agent#66)
  • Introduces an initial version of accelerator plugins. Currently there is only one plugin: CUDA accelerator. Now you can easily turn on/off CUDA GPU supports by installing/uninstalling this plugin. (lablup/backend.ai-agent#66)
  • Add support for nvidia-docker v2. (lablup/backend.ai-agent#64)
  • Agent restarts now completely preserve the kernel session states. (lablup/backend.ai-agent#35, lablup/backend.ai-agent#73)
  • You may limit an agent's view of available system resources such as CPU cores and GPU devices using a hexadecimal mask, for benchmarks and multi-GPU debugging. (lablup/backend.ai-agent#65)
  • Stability improvements, including that the agent no longer retries killing already-terminated kernel containers but reports them as "terminated", preventing an infinite loop of kernel creation failures in certain usage scenarios.
  • Improve inner beauty for future support of non-dockerized environments.
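The hexadecimal mask mentioned above restricts which devices the agent considers available. A sketch of how such a bitmask could map to device indices; the exact semantics (bit i set meaning device i is visible) are an assumption for illustration, not the documented agent behavior:

```python
def visible_devices(hex_mask: str, num_devices: int) -> list:
    """Return the indices of devices whose bit is set in the given
    hexadecimal mask. Assumes bit 0 (least significant) corresponds
    to device 0 -- an illustrative convention only."""
    mask = int(hex_mask, 16)
    return [i for i in range(num_devices) if mask & (1 << i)]

# Mask 0x5 = 0b0101: only devices 0 and 2 would be visible to the agent.
devices = visible_devices("0x5", 4)
```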

Client for Python (v1.4)

  • Add support for new vfolder subcommands to invite and accept invitation of shared virtual folders.
  • Add support for listing and downloading vfolder files.
  • Client library users should now wrap API invocations in an explicit session, like aiohttp's client APIs. (example)
  • Upgrade to aiohttp v3.4 series.
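The explicit-session requirement follows the same pattern as aiohttp's ClientSession: resources are acquired when the session opens and reliably released when it closes. The toy Session class below sketches that pattern only; it is not the actual client library API:

```python
class Session:
    """Toy stand-in illustrating the explicit-session pattern:
    underlying resources (e.g., a connection pool) live exactly as
    long as the context manager. Not the real Backend.AI client class."""
    def __init__(self):
        self.closed = True

    def __enter__(self):
        self.closed = False  # e.g., open an HTTP connection pool
        return self

    def __exit__(self, exc_type, exc, tb):
        self.closed = True   # always release connections, even on error
        return False

with Session() as sess:
    active = not sess.closed  # API calls would be issued here
```

After the with-block exits, the session is guaranteed closed, which is the point of making it explicit.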

v1.3.0

14 Mar 08:35

Key highlight: Improve dockerization support and add a plugin architecture for future extension

Manager

Agent

  • Fix repeated docker event polling even when there are connection or client-side aiohttp errors.
  • Upgrade aiohttp to v3.0 release.
  • Improve dockerization. (lablup/backend.ai-agent#55)
  • Improve inner beauty.

Client for Python (v1.2.1)

  • Improve exception handling (use Exception instead of BaseException as the base class for BackendError)
  • Upgrade aiohttp to v3.0 release.
  • Fix silent swallowing of asyncio.CancelledError and asyncio.TimeoutError
  • Allow uploading multiple files to a virtual folder in a single command (backend.ai vfolder upload)

v1.2.0

30 Jan 02:55

Key highlight: Improved logging and batch-mode interactions

NOTICE

  • We now have official documentation with an installation guide!
    Check it out here!

  • From this release, the manager and agent versions move together to indicate
    their compatibility, even when one of them has relatively few changes.

Manager and Agent

  • The gateway server now considers per-agent image availability when scheduling a new
    kernel. (lablup/backend.ai-manager#29)

  • The execute API now returns the exit code value of underlying in-kernel
    subprocesses in the batch mode. (lablup/backend.ai-manager#60)

  • The API gateway server is now fully horizontally scalable across multiple cores and
    multiple servers.

  • Improve logging: it now provides multiprocess-safe file-based rotating logs.
    (lablup/backend.ai-manager#10)
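With the batch-mode change above, the execute API surfaces the exit code of the in-kernel subprocess, so a client can tell success from failure. The sketch below classifies such a response; the field names ("status", "exitCode") are assumptions for illustration, not the documented wire format:

```python
def batch_result(response: dict) -> str:
    """Classify a batch-mode execute response as still running,
    succeeded, or failed, based on the subprocess exit code.
    Field names here are illustrative, not the actual API schema."""
    if response.get("status") != "finished":
        return "running"
    return "success" if response.get("exitCode") == 0 else "failure"

outcome = batch_result({"status": "finished", "exitCode": 2})
```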

Manager

  • Fix the Admin API error when filtering agents by their status due to a missing
    method parameter in Agent.batch_load().

Agent

  • Remove the image name prefix when reporting available images. (lablup/backend.ai-agent#51)

  • Improve debug-kernel mode to mount host-side kernel runner source into the kernel
    containers so that they use the latest, editable source clone of the kernel runner.

Client (v1.1.5 to v1.1.7)

  • Apply authentication to websocket-based API requests.

  • Fix a bug in client-side validation of user-provided session ID token.

  • Add missing ai.backend.client.cli.admin module in the distributed package.