
Commit d8c9c75
feat(bedrock): add AWS SigV4 and STS web identity authentication
The Bedrock inference provider previously required a pre-signed bearer token (`AWS_BEARER_TOKEN_BEDROCK`). This PR adds full AWS credential chain support, so Bedrock works natively in EKS/IRSA, GitHub Actions OIDC, EC2, ECS, and Lambda without managing long-lived credentials.

When no `api_key` is configured, requests are signed using AWS SigV4 via botocore. STS role assumption and web identity federation are supported through `RefreshableBotoSession`, which refreshes credentials automatically. Bearer-token mode is unchanged: if `api_key` is set in config or passed via `x-llamastack-provider-data`, it takes precedence.

This PR also corrects the endpoint URL from `bedrock-mantle` to `bedrock-runtime.<region>.amazonaws.com/openai/v1`, and gates the bedrock model in ci-tests on `AWS_DEFAULT_REGION` (required for both bearer and SigV4 modes) instead of `AWS_BEARER_TOKEN_BEDROCK`.

Closes #4730

Signed-off-by: skamenan7 <skamenan@redhat.com>
1 parent d9de9e3 · 92 files changed · 1,836 additions, 268 deletions


docs/docs/api-openai/provider_matrix.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -32,7 +32,7 @@ Models, endpoints, and versions used during test recordings.
 | Provider | Model(s) | Endpoint | Version Info |
 |----------|----------|----------|--------------|
 | azure | gpt-4o | llama-stack-test.openai.azure.com, lls-test.openai.azure.com | openai sdk: 2.5.0 |
-| bedrock | openai.gpt-oss-20b | bedrock-mantle.us-east-2.api.aws | openai sdk: 2.5.0 |
+| bedrock | openai.gpt-oss-20b | bedrock-runtime.us-east-2.amazonaws.com | openai sdk: 2.5.0 |
 | openai | gpt-4o, o4-mini, text-embedding-3-small | api.openai.com | openai sdk: 2.5.0 |
 | vllm | Qwen/Qwen3-0.6B |||
 | watsonx | meta-llama/llama-3-3-70b-instruct | us-south.ml.cloud.ibm.com | openai sdk: 2.5.0 |
```

docs/docs/providers/inference/remote_bedrock.mdx

Lines changed: 15 additions & 1 deletion
````diff
@@ -34,11 +34,25 @@ AWS Bedrock inference provider using OpenAI compatible endpoint.
 | `network.timeout.connect` | `float \| None` | No | | Connection timeout in seconds. |
 | `network.timeout.read` | `float \| None` | No | | Read timeout in seconds. |
 | `network.headers` | `dict[str, str] \| None` | No | | Additional HTTP headers to include in all requests. |
-| `region_name` | `str` | No | us-east-2 | AWS Region for the Bedrock Runtime endpoint |
+| `aws_access_key_id` | `SecretStr \| None` | No | | The AWS access key to use. Default use environment variable: AWS_ACCESS_KEY_ID |
+| `aws_secret_access_key` | `SecretStr \| None` | No | | The AWS secret access key to use. Default use environment variable: AWS_SECRET_ACCESS_KEY |
+| `aws_session_token` | `SecretStr \| None` | No | | The AWS session token to use. Default use environment variable: AWS_SESSION_TOKEN |
+| `aws_role_arn` | `str \| None` | No | | The AWS role ARN to assume. Default use environment variable: AWS_ROLE_ARN |
+| `aws_web_identity_token_file` | `str \| None` | No | | The path to the web identity token file. Default use environment variable: AWS_WEB_IDENTITY_TOKEN_FILE |
+| `aws_role_session_name` | `str \| None` | No | | The session name to use when assuming a role. Default use environment variable: AWS_ROLE_SESSION_NAME |
+| `region_name` | `str \| None` | No | us-east-2 | AWS Region for the Bedrock Runtime endpoint |
+| `profile_name` | `str \| None` | No | | The profile name that contains credentials to use. Default use environment variable: AWS_PROFILE |
+| `total_max_attempts` | `int \| None` | No | | An integer representing the maximum number of attempts that will be made for a single request, including the initial attempt. Default use environment variable: AWS_MAX_ATTEMPTS |
+| `retry_mode` | `str \| None` | No | | A string representing the type of retries Boto3 will perform. Default use environment variable: AWS_RETRY_MODE |
+| `connect_timeout` | `float \| None` | No | 60.0 | The time in seconds till a timeout exception is thrown when attempting to make a connection. The default is 60 seconds. |
+| `read_timeout` | `float \| None` | No | 60.0 | The time in seconds till a timeout exception is thrown when attempting to read from a connection. The default is 60 seconds. |
+| `session_ttl` | `int \| None` | No | 3600 | The time in seconds till a session expires. The default is 3600 seconds (1 hour). |
 
 ## Sample Configuration
 
 ```yaml
 api_key: ${env.AWS_BEARER_TOKEN_BEDROCK:=}
 region_name: ${env.AWS_DEFAULT_REGION:=us-east-2}
+aws_role_arn: ${env.AWS_ROLE_ARN:=}
+aws_web_identity_token_file: ${env.AWS_WEB_IDENTITY_TOKEN_FILE:=}
 ```
````

docs/docs/providers/safety/remote_bedrock.mdx

Lines changed: 3 additions & 0 deletions
```diff
@@ -36,6 +36,9 @@ AWS Bedrock safety provider for content moderation using AWS's safety services.
 | `aws_access_key_id` | `SecretStr \| None` | No | | The AWS access key to use. Default use environment variable: AWS_ACCESS_KEY_ID |
 | `aws_secret_access_key` | `SecretStr \| None` | No | | The AWS secret access key to use. Default use environment variable: AWS_SECRET_ACCESS_KEY |
 | `aws_session_token` | `SecretStr \| None` | No | | The AWS session token to use. Default use environment variable: AWS_SESSION_TOKEN |
+| `aws_role_arn` | `str \| None` | No | | The AWS role ARN to assume. Default use environment variable: AWS_ROLE_ARN |
+| `aws_web_identity_token_file` | `str \| None` | No | | The path to the web identity token file. Default use environment variable: AWS_WEB_IDENTITY_TOKEN_FILE |
+| `aws_role_session_name` | `str \| None` | No | | The session name to use when assuming a role. Default use environment variable: AWS_ROLE_SESSION_NAME |
 | `region_name` | `str \| None` | No | | The default AWS Region to use, for example, us-west-1 or us-west-2. Default use environment variable: AWS_DEFAULT_REGION |
 | `profile_name` | `str \| None` | No | | The profile name that contains credentials to use. Default use environment variable: AWS_PROFILE |
 | `total_max_attempts` | `int \| None` | No | | An integer representing the maximum number of attempts that will be made for a single request, including the initial attempt. Default use environment variable: AWS_MAX_ATTEMPTS |
```

src/llama_stack/core/request_headers.py

Lines changed: 19 additions & 2 deletions
```diff
@@ -7,13 +7,16 @@
 import contextvars
 import json
 from contextlib import AbstractContextManager
-from typing import Any, cast
+from typing import TYPE_CHECKING, Any, cast
 
 from llama_stack.core.datatypes import User
 from llama_stack.log import get_logger
 
 from .utils.dynamic import instantiate_class_type
 
+if TYPE_CHECKING:
+    from llama_stack_api import ProviderSpec
+
 log = get_logger(name=__name__, category="core")
 
 # Context variable for request provider data and auth attributes
@@ -24,6 +27,9 @@ class RequestProviderDataContext(AbstractContextManager[None]):
     """Context manager for request provider data"""
 
     def __init__(self, provider_data: dict[str, Any] | None = None, user: User | None = None) -> None:
+        if provider_data is not None and not isinstance(provider_data, dict):
+            log.error("Provider data must be a JSON object")
+            provider_data = None
         self.provider_data = provider_data or {}
         if user:
             self.provider_data["__authenticated_user"] = user
@@ -43,6 +49,8 @@ def __exit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
 class NeedsRequestProviderData:
     """Mixin for providers that require per-request provider data from request headers."""
 
+    __provider_spec__: "ProviderSpec"
+
     def get_request_provider_data(self) -> Any:
         spec = getattr(self, "__provider_spec__", None)
         if not spec:
@@ -82,11 +90,20 @@ def parse_request_provider_data(headers: dict[str, str]) -> dict[str, Any] | None:
         return None
 
     try:
-        return cast(dict[str, Any], json.loads(val))
+        parsed = json.loads(val)
     except json.JSONDecodeError:
         log.error("Provider data not encoded as a JSON object!")
         return None
 
+    if parsed is None:
+        return None
+
+    if not isinstance(parsed, dict):
+        log.error("Provider data must be encoded as a JSON object")
+        return None
+
+    return cast(dict[str, Any], parsed)
+
 
 def request_provider_data_context(headers: dict[str, str], user: User | None = None) -> AbstractContextManager[None]:
     """Context manager that sets request provider data from headers and user for the duration of the context"""
```

src/llama_stack/distributions/ci-tests/ci_tests.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -52,9 +52,11 @@ def get_distribution_template() -> DistributionTemplate:
 
     # Bedrock model must be pre-registered because the recording system cannot
     # replay model-list discovery calls against the Bedrock endpoint in CI.
+    # Gate on AWS_DEFAULT_REGION (required for both bearer-token and SigV4 modes)
+    # rather than AWS_BEARER_TOKEN_BEDROCK so the model registers in OIDC/IRSA CI too.
     bedrock_model = ModelInput(
         model_id="bedrock/openai.gpt-oss-20b",
-        provider_id="${env.AWS_BEARER_TOKEN_BEDROCK:+bedrock}",
+        provider_id="${env.AWS_DEFAULT_REGION:+bedrock}",
         provider_model_id="openai.gpt-oss-20b",
         model_type=ModelType.llm,
     )
```
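The `${env.VAR:+value}` gating used here expands to `bedrock` only when the variable is set, so the provider_id collapses to an empty string (and the model is skipped) in environments without AWS configuration. A minimal resolver sketch, assuming semantics matching the `:=` (default) and `:+` (conditional) forms seen in these configs — this is not llama-stack's actual resolver:

```python
import re


def resolve_env(template: str, env: dict[str, str]) -> str:
    """Expand ${env.NAME:=default} and ${env.NAME:+value} placeholders."""

    def substitute(match: re.Match) -> str:
        name, op, value = match.group(1), match.group(2), match.group(3)
        if op == ":=":  # default: env value if set and non-empty, else the literal
            return env.get(name) or value
        if op == ":+":  # conditional: the literal only when the env var is set
            return value if env.get(name) else ""
        return env.get(name, "")  # bare ${env.NAME}

    return re.sub(r"\$\{env\.(\w+)(:=|:\+)?([^}]*)\}", substitute, template)
```

Under these semantics, gating on `AWS_DEFAULT_REGION` registers the bedrock model whenever a region is configured, regardless of whether auth comes from a bearer token or the SigV4 credential chain.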

src/llama_stack/distributions/ci-tests/config.yaml

Lines changed: 3 additions & 1 deletion
```diff
@@ -47,6 +47,8 @@ providers:
       config:
         api_key: ${env.AWS_BEARER_TOKEN_BEDROCK:=}
         region_name: ${env.AWS_DEFAULT_REGION:=us-east-2}
+        aws_role_arn: ${env.AWS_ROLE_ARN:=}
+        aws_web_identity_token_file: ${env.AWS_WEB_IDENTITY_TOKEN_FILE:=}
   - provider_id: ${env.NVIDIA_API_KEY:+nvidia}
     provider_type: remote::nvidia
     config:
@@ -301,7 +303,7 @@ registered_resources:
     model_type: llm
   - metadata: {}
     model_id: bedrock/openai.gpt-oss-20b
-    provider_id: ${env.AWS_BEARER_TOKEN_BEDROCK:+bedrock}
+    provider_id: ${env.AWS_DEFAULT_REGION:+bedrock}
     provider_model_id: openai.gpt-oss-20b
     model_type: llm
 shields:
```

src/llama_stack/distributions/ci-tests/run-with-postgres-store.yaml

Lines changed: 3 additions & 1 deletion
```diff
@@ -47,6 +47,8 @@ providers:
       config:
         api_key: ${env.AWS_BEARER_TOKEN_BEDROCK:=}
         region_name: ${env.AWS_DEFAULT_REGION:=us-east-2}
+        aws_role_arn: ${env.AWS_ROLE_ARN:=}
+        aws_web_identity_token_file: ${env.AWS_WEB_IDENTITY_TOKEN_FILE:=}
   - provider_id: ${env.NVIDIA_API_KEY:+nvidia}
     provider_type: remote::nvidia
     config:
@@ -314,7 +316,7 @@ registered_resources:
     model_type: llm
   - metadata: {}
     model_id: bedrock/openai.gpt-oss-20b
-    provider_id: ${env.AWS_BEARER_TOKEN_BEDROCK:+bedrock}
+    provider_id: ${env.AWS_DEFAULT_REGION:+bedrock}
     provider_model_id: openai.gpt-oss-20b
     model_type: llm
 shields:
```

src/llama_stack/distributions/starter/config.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -47,6 +47,8 @@ providers:
       config:
         api_key: ${env.AWS_BEARER_TOKEN_BEDROCK:=}
         region_name: ${env.AWS_DEFAULT_REGION:=us-east-2}
+        aws_role_arn: ${env.AWS_ROLE_ARN:=}
+        aws_web_identity_token_file: ${env.AWS_WEB_IDENTITY_TOKEN_FILE:=}
   - provider_id: ${env.NVIDIA_API_KEY:+nvidia}
     provider_type: remote::nvidia
     config:
```

src/llama_stack/distributions/starter/run-with-postgres-store.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -47,6 +47,8 @@ providers:
      config:
        api_key: ${env.AWS_BEARER_TOKEN_BEDROCK:=}
        region_name: ${env.AWS_DEFAULT_REGION:=us-east-2}
+        aws_role_arn: ${env.AWS_ROLE_ARN:=}
+        aws_web_identity_token_file: ${env.AWS_WEB_IDENTITY_TOKEN_FILE:=}
   - provider_id: ${env.NVIDIA_API_KEY:+nvidia}
     provider_type: remote::nvidia
     config:
```

src/llama_stack/providers/registry/inference.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -121,7 +121,7 @@ def available_providers() -> list[ProviderSpec]:
             api=Api.inference,
             adapter_type="bedrock",
             provider_type="remote::bedrock",
-            pip_packages=[],
+            pip_packages=["boto3"],
             module="llama_stack.providers.remote.inference.bedrock",
             config_class="llama_stack.providers.remote.inference.bedrock.BedrockConfig",
             provider_data_validator="llama_stack.providers.remote.inference.bedrock.config.BedrockProviderDataValidator",
```
