Releases: llamastack/llama-stack

v0.7.0

01 Apr 20:52

What's Changed

  • fix: exclude informational checks from ci-status aggregation by @leseb in #5105
  • feat: add Responses API test coverage analyzer and conformance annotations by @leseb in #5101
  • refactor!: remove fine_tuning API by @leseb in #5104
  • fix!: remove duplicate dataset_id parameter in append-rows endpoint by @eoinfennessy in #4849
  • fix: Multi-worker cache synchronization for vector stores by @elinacse in #5076
  • feat: Add integration test for service_tier with openai client by @gyliu513 in #5103
  • feat: test responses API integration tests against Azure AI Foundry by @iamemilio in #5107
  • fix(security): add path traversal and header injection defenses by @rhdedgar in #5086
  • feat!: Part 2 - implement inline neural rerank for RAG by @r3v5 in #4877
  • feat: add provider compatibility matrix for Responses API by @leseb in #5113
  • perf: lazy-load braintrust autoevals to reduce idle memory (~63MB) by @leseb in #5078
  • feat: add provider version tracking to compatibility matrix by @leseb in #5115
  • perf: lazy-load torch in embedding_mixin to reduce startup memory by @leseb in #5116
  • perf: lazy-load torch and transformers in prompt_guard by @leseb in #5117
  • perf: lazy-load numpy, faiss, and sqlite_vec in vector_io providers by @leseb in #5118
  • fix(CI): reduce Mergify PR update frequency by @gyliu513 in #5106
  • feat: Add support for filters in PGVector and replace f-string usage in table name by @franciscojavierarceo in #5111
  • fix: bump pyjwt to 2.12.0 (CVE-2026-32597) by @eoinfennessy in #5127
  • fix(inference): improve chat completions OpenAI conformance by @cdoern in #5108
  • fix(storage): resolve asyncio event loop mismatch via operation deferral by @derekhiggins in #5130
  • fix(ci): use RELEASE_PAT and PRs in post-release workflow by @cdoern in #5132
  • chore: bump fallback_version to 0.6.1.dev0 by @cdoern in #5136
  • fix: remove UV_EXTRA_INDEX_URL from Release branch ci by @cdoern in #5138
  • fix(ci): add uv lock to post-release workflow to update stale lockfile by @cdoern in #5139
  • chore(github-deps): bump stainless-api/upload-openapi-spec-action from 1.11.6 to 1.13.0 by @dependabot[bot] in #5148
  • chore(github-deps): bump docker/setup-buildx-action from 3.12.0 to 4.0.0 by @dependabot[bot] in #5142
  • chore(github-deps): bump astral-sh/setup-uv from 7.3.1 to 7.5.0 by @dependabot[bot] in #5143
  • feat(blog): Agentic flows tutorial by @raghotham in #5035
  • chore(github-deps): bump docker/login-action from 3.7.0 to 4.0.0 by @dependabot[bot] in #5146
  • chore(github-deps): bump llamastack/llama-stack from ce063ac to 2157c09 by @dependabot[bot] in #5145
  • feat: Add OpenAI client integration test for top_logprobs by @gyliu513 in #5124
  • ci(mergify): skip conflict comments on stale PRs by @leseb in #5156
  • feat: Add stream_options parameter support by @gyliu513 in #4815
  • feat: promote connector API from v1alpha to v1beta by @leseb in #5129
  • refactor: replace LiteLLM with OpenAI mixin for WatsonX provider by @leseb in #5133
  • fix: optimize connector listing by @gyliu513 in #5164
  • feat: Add OpenAI client integration test for incomplete_details by @gyliu513 in #5157
  • refactor!: rename meta-reference providers to builtin by @leseb in #5131
  • feat!: eliminate /files/{file_id} GET differences by @r3v5 in #5154
  • feat: Add OpenAI client integration test for reasoning effort by @gyliu513 in #5170
  • fix: replace blocking requests calls with async httpx in remote providers by @gyliu513 in #5162
  • fix: remove references to defunct inline::builtin inference provider by @leseb in #5174
  • fix(vertexai): use SDK-native model names instead of stripping prefixes by @major in #5169
  • docs: add multi-tenant isolation example for conversations and responses by @jaideepr97 in #5176
  • fix: Remove duplicate decode by @gyliu513 in #5177
  • refactor: decouple file_search from legacy knowledge_search tool_groups by @leseb in #5175
  • feat: add configurable asyncpg connection pool settings by @iamemilio in #5160
  • chore: remove unused LiteLLMOpenAIMixin by @mattf in #5159
  • fix: Disable asyncpg OTel auto-instrumentation to prevent duplicate DB spans by @iamemilio in #5158
  • refactor!: rename knowledge_search to file_search across codebase by @leseb in #5186
  • fix: re-enable external provider module test by @cdoern in #5182
  • feat: add WatsonX Responses API integration test recordings by @leseb in #5120
  • feat: Add metrics for vector io by @gyliu513 in #5096
  • refactor: rename rag-runtime provider and builtin::rag toolgroup to file-search by @leseb in #5187
  • feat: auto-record integration tests on PRs with multi-provider support by @cdoern in #5123
  • fix: update recording workflow action SHAs to include skip-commit support by @cdoern in #5199
  • fix: support workflow_dispatch in commit-recordings via PR metadata artifact by @cdoern in #5202
  • fix: bump pyasn1 to 0.6.3 (CVE-2026-30922) by @eoinfennessy in #5207
  • docs: Add post about Responses API in Llama Stack by @jwm4 in #5196
  • fix: support fork PRs in commit-recordings workflow by @cdoern in #5204
  • fix: clean up artifacts before cloning fork PR branch by @cdoern in #5212
  • fix: handle both artifact structures for recordings copy by @cdoern in #5214
  • chore: rename bug template by @leseb in #5210
  • fix: only comment on PR when recordings are actually pushed by @cdoern in #5218
  • fix: prevent OTel context leak in fire-and-forget background tasks by @iamemilio in #5168
  • fix: provider_data_var context leak by @jaideepr97 in #5227
  • chore: Update formatting in CONTRIBUTING.md by @raghotham in #5231
  • chore(github-deps): bump actions/cache from 5.0.3 to 5.0.4 by @dependabot[bot] in #5241
  • chore(github-deps): bump actions/upload-artifact from 4.6.2 to 7.0.0 by @dependabot[bot] in #5235
  • chore(github-deps): bump docker/build-push-action from 6.19.2 to 7.0.0 by @dependabot[bot] in #5236
  • chore(github-deps): update llamastack/llama-stack requirement to 700b202 by @dependabot[bot] in #5239
  • chore(github-deps): bump docker/setup-qemu-action from 3.7.0 to 4.0.0 by @dependabot[bot] in #5234
  • feat!: BREAKING CHANGE: make sentence_transformers trust_remote_code configurable, default to False by @derekhiggins in #4602
  • docs: add architecture documentation and module-level READMEs by @leseb in #5213
  • refactor!: remove tool_groups from public API and auto-register from provider specs by @leseb in #4997
  • docs: add AGENTS.md with gui...

v0.6.1

30 Mar 13:09

What's Changed

Full Changelog: v0.6.0...v0.6.1

v0.6.0

11 Mar 15:01

What's Changed

  • chore: update convert_tooldef_to_openai_tool to match its usage by @mattf in #4837
  • feat!: improve consistency of post-training API endpoints by @eoinfennessy in #4606
  • fix: Arbitrary file write via a non-default configuration by @VaishnaviHire in #4844
  • chore: reduce uses of models.llama.datatypes by @mattf in #4847
  • docs: add technical release steps and improvements to RELEASE_PROCESS.md by @cdoern in #4792
  • chore: bump fallback version to 0.5.1 by @cdoern in #4846
  • fix: Exclude null 'strict' field in function tools to prevent OpenAI … by @gyliu513 in #4795
  • chore(test): add test to verify responses params make it to backend service by @mattf in #4850
  • chore: revert "fix: disable together banner (#4517)" by @mattf in #4856
  • fix: update together to work with latest api.together.xyz service (circa feb 2026) by @mattf in #4857
  • chore(github-deps): bump astral-sh/setup-uv from 7.2.0 to 7.3.0 by @dependabot[bot] in #4867
  • chore(github-deps): bump github/codeql-action from 4.32.0 to 4.32.2 by @dependabot[bot] in #4861
  • chore(github-deps): bump actions/cache from 5.0.2 to 5.0.3 by @dependabot[bot] in #4859
  • chore(github-deps): bump llamastack/llama-stack from 76bcb66 to c518b35 by @dependabot[bot] in #4858
  • fix(ci): ensure oasdiff is available for openai-coverage hook by @EleanorWho in #4835
  • fix: Deprecate items when create conversation by @gyliu513 in #4765
  • chore: refactor chunking to use configurable tiktoken encoding and document tokenizer limits by @mattf in #4870
  • chore: prune unused parts of models packages (checkpoint, tokenizer, prompt templates, datatypes) by @mattf in #4871
  • chore: prune unused utils from utils.memory.vector_store by @mattf in #4873
  • fix: Escape special characters in auto-generated provider documentati… by @gyliu513 in #4822
  • chore(docs): Use starter for opentelemetry integration test by @gyliu513 in #4875
  • fix: kvstore should call shutdown but not close by @gyliu513 in #4872
  • fix: uvicorn log ambiguity by @cdoern in #4522
  • chore(github-deps): bump actions/checkout from 4.2.2 to 6.0.2 by @dependabot[bot] in #4865
  • chore: cleanup mypy excludes by @mattf in #4876
  • feat: add integration test for max_output_tokens by @gyliu513 in #4825
  • chore(test): add test to verify responses params make it to backend s… by @gyliu513 in #4852
  • ci: add Docker image publishing to release workflow by @cdoern in #4882
  • feat: add ProcessFileRequest model to file_processors API by @alinaryan in #4885
  • docs: update responses api known limitations doc by @jaideepr97 in #4845
  • fix(vector_io): align Protocol signatures with request models by @skamenan7 in #4747
  • fix: add _ExceptionTranslatingRoute to prevent keep-alive breakage on Linux by @iamemilio in #4886
  • docs: add release notes for version 0.5 by @rhuss in #4855
  • fix(ci): disable uv cache cleanup when UV_NO_CACHE is set by @cdoern in #4889
  • feat: Add truncation parameter support by @gyliu513 in #4813
  • chore(ci): bump pinned action commit hashes in integration-tests.yml by @cdoern in #4895
  • docs: Add README for running observability test by @gyliu513 in #4884
  • fix: update rerank routing to match params by @mattf in #4900
  • feat: Add prompt_cache_key parameter support by @gyliu513 in #4775
  • chore: add rerank support to recorder by @mattf in #4903
  • feat: add rerank support to vllm inference provider by @mattf in #4902
  • fix(inference): use flat response message model for chat/completions by @cdoern in #4891
  • feat: add llama cpp server remote inference provider by @Bobbins228 in #4382
  • fix: Remove pillow as direct dependency by @VaishnaviHire in #4901
  • fix: pre-commit run -a by @mattf in #4907
  • fix(ci): Removed kotlin from preview builds by @gyliu513 in #4910
  • feat: Add service_tier parameter support by @gyliu513 in #4816
  • chore(github-deps): bump github/codeql-action from 4.32.2 to 4.32.3 by @dependabot[bot] in #4918
  • chore(github-deps): bump docker/login-action from 3.4.0 to 3.7.0 by @dependabot[bot] in #4916
  • chore(github-deps): bump llamastack/llama-stack from c7cdb40 to 4c1b03b by @dependabot[bot] in #4915
  • chore(github-deps): bump stainless-api/upload-openapi-spec-action from 1.10.0 to 1.11.6 by @dependabot[bot] in #4913
  • chore(github-deps): bump docker/build-push-action from 6.15.0 to 6.19.2 by @dependabot[bot] in #4912
  • fix(vertexai): raise descriptive error on auth failure instead of silent empty string by @major in #4909
  • fix: resolve StorageConfig default env vars at construction time by @major in #4897
  • feat: Add incomplete_details response property by @gyliu513 in #4812
  • feat(client-sdks): add OpenAPI Generator tooling by @aegeiger in #4874
  • fix(vector_io): eliminate duplicate call for vector store registration by @r3v5 in #4925
  • test(vertexai): add unit tests for VertexAI inference adapter by @major in #4927
  • feat: introduce new how-to blog by @cdoern in #4794
  • chore: remove reference to non-existent WeaviateRequestProviderData by @mattf in #4937
  • feat: standardized error types with HTTP status codes by @iamemilio in #4878
  • feat: add opentelemetry-distro to core dependencies by @Artemon-line in #4935
  • feat(ci): Add nightly job for doc build by @gyliu513 in #4911
  • fix: Ensure user isolation for stored conversations and responses by @jaideepr97 in #4834
  • fix: align chat completion usage schema with OpenAI spec by @cdoern in #4930
  • fix: allow conversation item type to be omitted by @mattf in #4948
  • feat: Enable inline PyPDF file_processors provider by @alinaryan in #4743
  • feat: add support for /responses background parameter by @cdoern in #4824
  • feat(vector_io): Implement Contextual Retrieval for improved RAG search quality by @r-bit-rry in #4750
  • chore: use SecretStr for x-llamastack-provider-data keys by @mattf in #4939
  • chore: remove unused vector store utils by @mattf in #4961
  • feat: auto-identify embedding models for vllm by @mattf in #4975
  • chore(github-deps): bump llamastack/llama-stack from 4c1b03b to 7d9786b by @dependabot[bot] in #4971
  • chore(github-deps): bump actions/checkout from 6.0.1 to 6.0.2 by @dependabot[bot] in #4969
  • chore(github-deps): bump actions/cache from 4.2.0 to 5.0.3 by @dependabot[bot] in #4963
  • chore(github-deps): bump github/codeql-action from 4.32.3 to 4.32.4 by @dependabot...

v0.5.2

06 Mar 13:21

What's Changed

  • chore: bump llama-stack-client to 0.5.1 by @cdoern in #4957
  • ci: add arm64 image manifest publishing to release workflow by @rhdedgar in #5006
  • feat(ci): automate post-release and pre-release version management (backport #4938) by @mergify[bot] in #5032
  • fix(llama-guard): less strict parsing of safety categories (backport #5045) by @mergify[bot] in #5053
  • fix: OCI26ai sql query patches (backport #5046) by @mergify[bot] in #5054

Full Changelog: v0.5.1...v0.5.2

v0.5.1

19 Feb 19:01
5708a71

What's Changed

  • fix: [release-0.5.x] Arbitrary file write via a non-default configuration (#4844) by @VaishnaviHire in #4869
  • fix(vertexai): raise descriptive error on auth failure instead of silent empty string (backport #4909) by @mergify[bot] in #4923
  • fix: resolve StorageConfig default env vars at construction time (backport #4897) by @mergify[bot] in #4924
  • feat: add opentelemetry-distro to core dependencies (backport #4935) by @mergify[bot] in #4943
  • fix(vector_io): eliminate duplicate call for vector store registration (backport #4925) by @mergify[bot] in #4941
  • chore: bump version to 0.5.1 for release by @cdoern in #4955

Full Changelog: v0.5.0...v0.5.1

v0.4.5

19 Feb 18:55
a381d21

What's Changed

  • chore: bump llama-stack-client to 0.4.4 in UI lockfile by @cdoern in #4791
  • fix: MCP CPU spike by using context manager for session cleanup by @derekhiggins in #4851
  • fix(vector_io): eliminate duplicate call for vector store registration (backport #4925) by @mergify[bot] in #4944
  • chore: bump version to 0.4.5 for release by @cdoern in #4954

Full Changelog: v0.4.4...v0.4.5

v0.5.0

05 Feb 17:20
1195931

What's Changed

  • docs: Added a new oci-llamastack notebook for how to build agents with OCI and llama stack by @omaryashraf5 in #4418
  • docs: Add guide to migrating from Agents to Responses by @jwm4 in #4375
  • feat: convert models API to use a FastAPI router by @nathan-weinberg in #4407
  • chore: update mcp dependency constraint to >=1.23.0 by @derekhiggins in #4457
  • feat(ci): added codeql scanning workflow by @gmatuz in #4462
  • feat: migrate Conversations API to FastAPI router by @leseb in #4342
  • fix(faiss): add backward compatibility for EmbeddedChunk deserialization by @leseb in #4463
  • fix: Removed duplicate parameters from integration test by @gyliu513 in #4461
  • fix: removed scan on push by @gmatuz in #4466
  • chore: Document release process by @raghotham in #4470
  • feat: build ARM64-based UBI starter image by @rhdedgar in #4474
  • chore: add "Discussion" issue template by @nathan-weinberg in #4469
  • chore: Updated test integration guide by @gyliu513 in #4460
  • fix: update CONTRIBUTING.md to reflect pre-commit version used in CI by @eoinfennessy in #4468
  • fix: skip resources with empty IDs from conditional env vars in config processing by @Elbehery in #4455
  • fix: Fix Vector Store Integration Tests by @franciscojavierarceo in #4472
  • chore: Delete CHANGELOG.md by @terrytangyuan in #4480
  • ci: run ARM64 builds on nightly schedule only by @rhdedgar in #4479
  • chore(github-deps): bump actions/checkout from 4.3.1 to 6.0.1 by @dependabot[bot] in #4491
  • chore(github-deps): bump astral-sh/setup-uv from 7.1.6 to 7.2.0 by @dependabot[bot] in #4490
  • chore(github-deps): bump docker/setup-qemu-action from 3.2.0 to 3.7.0 by @dependabot[bot] in #4489
  • chore(github-deps): bump github/codeql-action from 3.31.9 to 4.31.9 by @dependabot[bot] in #4488
  • chore(github-deps): bump stainless-api/upload-openapi-spec-action from 1.9.0 to 1.10.0 by @dependabot[bot] in #4487
  • chore: Add backwards compatibility for Milvus Chunks by @franciscojavierarceo in #4484
  • fix: aiohttp HTTP Parser auto_decompress feature susceptible to zip bomb by @leseb in #4494
  • chore: Add backwards compatibility for qdrant chunks by @Ygnas in #4495
  • chore: Updated CONTRIBUTING guidance for integration test by @gyliu513 in #4459
  • fix: fonttools security advisory by @leseb in #4503
  • chore: Add backwards compatibility for pgvector chunks by @Ygnas in #4506
  • refactor!: change image_name to distro_name in StackConfig by @cdoern in #4396
  • fix: Add backwards compatibility for sqlite-vec, chroma, and weaviate chunks by @ChristianZaccaria in #4502
  • fix: urllib3 vulnerable to decompression-bomb safeguard bypass by @leseb in #4512
  • fix: disable together banner by @cdoern in #4517
  • chore: switch to monthly minor release by @leseb in #4518
  • chore: change discussion template label by @nathan-weinberg in #4525
  • chore: add maintenance policy to release doc by @leseb in #4514
  • docs: fixed outdated links for api overview, routed to the updated links by @lalexandrh in #4524
  • chore: upgrade virtualenv by @raghotham in #4585
  • chore: resync client dep with main by @leseb in #4591
  • fix: llama-stack-api packaging by @cdoern in #4593
  • docs: add guidance for contributing new providers by @leseb in #4478
  • feat: migrate post_training API to FastAPI router by @eoinfennessy in #4496
  • fix(memory/rag): remove file:// uri prefix by @r-bit-rry in #4286
  • fix: benchmark registration via registered_resources config by @leseb in #4600
  • feat(api): migrate Eval API to FastAPI router (#4345) by @r-bit-rry in #4425
  • feat: convert shields API to use a FastAPI router by @nathan-weinberg in #4412
  • feat: convert datasetio API to use a FastAPI router by @nathan-weinberg in #4400
  • feat: Elasticsearch integration for VectorIO by @ezimuel in #4007
  • fix: enable vector store registration from config with OpenAI metadata by @are-ces in #4616
  • docs: Update RAG Agent Documentation using vector_stores by @robinnarsinghranabhat in #4485
  • chore(github-deps): bump github/codeql-action from 4.31.9 to 4.31.10 by @dependabot[bot] in #4640
  • chore(github-deps): bump actions/cache from 5.0.1 to 5.0.2 by @dependabot[bot] in #4639
  • chore(github-deps): bump docker/setup-buildx-action from 3.11.1 to 3.12.0 by @dependabot[bot] in #4638
  • chore(github-deps): bump actions/setup-node from 6.1.0 to 6.2.0 by @dependabot[bot] in #4637
  • fix: update responses limitations doc to track latest state by @iamemilio in #4392
  • feat: Convert scoring API to use a FastAPI router by @gyliu513 in #4521
  • fix: Removed unused para for test score by @gyliu513 in #4645
  • fix: fix list-deps quoting in deps-only output by @gyliu513 in #4653
  • feat(api): Implement connector support via static configuration by @jaideepr97 in #4263
  • fix: default ollama URL in Quickstart was incorrect in 2 places by @damian0815 in #4646
  • fix: unregister function first before register by @gyliu513 in #4473
  • feat!: migrate safety API to FastAPI router by @r-bit-rry in #4643
  • feat: convert prompts API to use a FastAPI router by @nathan-weinberg in #4649
  • feat: add scheduled CI workflow for release branches by @cdoern in #4510
  • fix: Fix redundant MCP tools/list calls by @jwm4 in #4634
  • docs: Move demo script to step 3 for quick start by @gyliu513 in #4661
  • feat: convert scoring_functions API to use FastAPI router. by @EleanorWho in #4599
  • feat(ci): add Bedrock integration tests with record/replay by @skamenan7 in #4292
  • feat: Core Changes for default embedding dims by @rriley99-oci in #4671
  • fix: use SecretStr for AWS credentials by @eoinfennessy in #4681
  • feat: Implemented reasoning.effort parameter in LLS Responses by @Nehanth in #4633
  • fix: file_search_call results missing document attributes/metadata by @are-ces in #4680
  • fix!: usage input_token_details and output_token_details are not optional by @mattf in #4690
  • fix: completed_at is required output by @mattf in #4692
  • fix: store is required output by @mattf in #4693
  • docs: update contrib guidelines on PR reviews by @leseb in #4676
  • docs: require test plan with script and output for API PRs by @leseb in #4659
  • feat: Add OpenAI API conformance coverage analyzer by @leseb in #4668
  • feat!: use global Vertex API endpoint by @ktdreyer in #4674
  • fix: Concurrent calls into SentenceTransformer() cause failures of cl...

v0.4.4

30 Jan 16:25
3b50471

What's Changed

  • fix: Enable session polling during streaming responses (backport #4738) by @mergify[bot] in #4756
  • feat: add scheduled CI workflow for release branches (backport #4510) by @mergify[bot] in #4769
  • fix: make release-branch-scheduled-ci compatible with older branches (backport #4753) by @mergify[bot] in #4767
  • fix: pass branch explicitly to install-llama-stack-client action (backport #4759) by @mergify[bot] in #4763
  • fix: llama-stack-api packaging by @cdoern in #4777
  • feat(ci): unify PyPI/npm release workflow with dry-run support (backport #4774) by @mergify[bot] in #4785
  • fix: install setuptools-scm in CI (backport #4782) by @mergify[bot] in #4786
  • build: bump llama-stack-client to 0.4.4 for release by @cdoern in #4787
  • fix: override version from release tag for all packages (backport #4788) by @mergify[bot] in #4789

Full Changelog: v0.4.3...v0.4.4

v0.4.3

26 Jan 21:51

What's Changed

  • fix: enable vector store registration from config with OpenAI metadata (backport #4616) by @mergify[bot] in #4631
  • fix: Fix redundant MCP tools/list calls (backport #4634) by @mergify[bot] in #4663
  • fix: file_search_call results missing document attributes/metadata (backport #4680) by @mergify[bot] in #4686
  • fix: Concurrent calls into SentenceTransformer() cause failures of client.vector_stores.file_batches.create() (backport #4636) by @mergify[bot] in #4698
  • feat: Add shutdown functionality to LlamaStackAsLibraryClient and AsyncLlamaStackAsLibraryClient (backport #4642) by @mergify[bot] in #4733
  • feat(PGVector): implement automatic creation of vector extension during initialization of PGVectorVectorIOAdapter (backport #4660) by @mergify[bot] in #4740

Full Changelog: v0.4.2...v0.4.3

v0.4.2

16 Jan 14:44

What's Changed

Full Changelog: v0.4.1...v0.4.2