WIP: Add PUT uploads to object storage client
- DONE: can object metadata like content-type be set? -> yes, in headers
- DONE: Backblaze B2 ignores request for SSE-B2 server-side encryption
  if provided in presigned URL query params -> move to header
- DONE: signature incorrect if `X-Amz-Server-Side-Encryption` in header
  -> keys were out of order because dict key sorting is case-sensitive.
  Use httpx.Headers to lowercase the keys and avoid this issue
  (see the sketch after this list).
- DONE: provide checksum
  - https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html:
    "Currently, Amazon S3 presigned URLs don't support using the
    following data-integrity checksum algorithms (CRC32, CRC32C, SHA-1,
    SHA-256) when you upload objects. To verify the integrity of your
    object after uploading, you can provide an MD5 digest of the object
    when you upload it with a presigned URL. For more information about
    object integrity, see Checking object integrity."
    https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html
  - `Content-MD5` header can be used.
  - DONE: signature incorrect if `Content-MD5` in header -> `PutObject`
    docs say, "The base64-encoded 128-bit MD5 digest of the message
    (without the headers) according to RFC 1864." Had to base64-encode
    the digest. The "Checking object integrity" docs neglect to mention
    the need for base64-encoding.
    https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html
    https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html
  - Does B2 support `Content-MD5`? It looks like B2 usually uses SHA-1
    -> Yes, looks like B2 will pick up `Content-MD5`.
- TODO: update tests for any uncovered or changed code
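
A minimal sketch of the two signature fixes noted above (the bucket host and
file contents here are made up; see the diff for the real implementation):

```python
import base64
import hashlib

import httpx

content = b"EXAMPLE_KEY=example_value\n"  # placeholder file contents

# Python's case-sensitive sort puts "X-Amz-*" keys before "host", but the
# canonical request must order signed headers by their lowercased names.
plain_dict = {"host": "bucket.example.com", "X-Amz-Server-Side-Encryption": "AES256"}
sorted(plain_dict)  # ['X-Amz-Server-Side-Encryption', 'host'] -> wrong order

# httpx.Headers lowercases keys, so sorting gives the expected order.
headers = httpx.Headers(plain_dict)
sorted(headers)  # ['host', 'x-amz-server-side-encryption']

# Content-MD5 is the base64-encoded binary digest, not the hex digest.
headers["Content-MD5"] = base64.b64encode(hashlib.md5(content).digest()).decode()
```
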
br3ndonland committed Jan 22, 2024
1 parent 8492656 commit fc3e417
Showing 3 changed files with 80 additions and 24 deletions.
10 changes: 5 additions & 5 deletions docs/cloud-object-storage.md
@@ -39,14 +39,16 @@ Dotenv files are commonly kept in [cloud object storage](https://en.wikipedia.or

#### Download

Downloads with `GET` can be authenticated by including AWS Signature Version 4 information either with [request headers](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-auth-using-authorization-header.html) or [query parameters](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html). fastenv uses query parameters to generate [presigned URLs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html). The advantage of presigned URLs with query parameters is that the URLs can be used on their own.

The download method generates a presigned URL, uses it to download file contents, and either saves the contents to a file or returns the contents as a string.
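
For example, a presigned download URL can be used with any HTTP client, because the signature travels in the query string. Here is a minimal sketch (the URL is assumed to come from a call like `generate_presigned_url("GET", "/.env", expires=30)` on the client in this repository; error handling is simplified):

```python
import httpx


async def download_with_presigned_url(url: httpx.URL) -> str:
    """Minimal sketch: fetch dotenv file contents with a presigned GET URL."""
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        response.raise_for_status()
        return response.text
```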

A related operation is [`head_object`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.head_object), which can be used to check if an object exists. The request is the same as a `GET`, except the [`HEAD` HTTP request method](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/HEAD) is used. fastenv does not provide an implementation of `head_object` at this time, but it could be considered in the future.
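
If it were added, a rough sketch could reuse the presigned URL generation described above with the `HEAD` method (hypothetical helper, not part of fastenv):

```python
import httpx


async def object_exists(object_storage_client, bucket_path: str) -> bool:
    """Hypothetical sketch: check whether an object exists with a HEAD request."""
    url = object_storage_client.generate_presigned_url("HEAD", bucket_path, expires=30)
    async with httpx.AsyncClient() as client:
        response = await client.head(url)
    # 200 means the object exists; 404 means it does not.
    return response.status_code == httpx.codes.OK
```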

#### Upload

The upload method uploads source contents to an object storage bucket, selecting the appropriate upload strategy based on the cloud platform being used. Uploads can be done with either `POST` or `PUT`.

[Uploads with `POST`](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-UsingHTTPPOST.html) work differently than downloads with `GET`. A typical back-end engineer might ask, "Can't I just `POST` binary data to an API endpoint with a bearer token or something?" To which AWS might respond, "No, not really. Here's how you do it instead: pretend like you're submitting a web form." "What?"

Anyway, here's how it works:
@@ -56,11 +58,9 @@ Dotenv files are commonly kept in [cloud object storage](https://en.wikipedia.or
3. _Calculate a signature_. This step is basically the same as for query string auth. A signing key is derived with HMAC, and then used with the string to sign for another round of HMAC to calculate the signature.
4. _Add the signature to the HTTP request_. For `POST` uploads, the signature is provided with other required information as form data, rather than as URL query parameters. An advantage of this approach is that it can also be used for browser-based uploads, because the form data can be used to populate the fields of an HTML web form. There is some overlap between items in the `POST` policy and fields in the form data, but they are not exactly the same.
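
A rough sketch of how those pieces come together from the caller's side, using `generate_presigned_post` from this repository (the form-field handling is simplified, and the exact keyword arguments may differ):

```python
import httpx


async def upload_with_post(
    object_storage_client, bucket_path: str, content: bytes
) -> httpx.Response:
    """Rough sketch: upload a file with POST form data from a presigned POST policy."""
    # The POST policy, credential, date, and signature come back as form fields.
    url, data = object_storage_client.generate_presigned_post(
        bucket_path, content_length=len(content), content_type="text/plain", expires=30
    )
    async with httpx.AsyncClient() as client:
        # The file itself is sent as a multipart `file` field after the policy fields.
        return await client.post(url, data=data, files={"file": content})
```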

The S3 API does also support [uploads with HTTP `PUT` requests](https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html). fastenv does not use `PUT` requests at this time, but they could be considered in the future.

Backblaze uploads with `POST` are different, though there are [good reasons](https://www.backblaze.com/blog/design-thinking-b2-apis-the-hidden-costs-of-s3-compatibility/) for that (helps keep costs low). fastenv includes an implementation of the Backblaze B2 `POST` upload process.

[Uploads with `PUT` can use presigned URLs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html). Object metadata, like `Content-Type`, can be supplied in request headers.
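
fastenv's upload method assembles these headers automatically (see the source changes below), but a standalone sketch of the same flow might look like this (illustrative values; headers passed to `generate_presigned_url` are signed into the URL, so the same headers must be sent with the request):

```python
import base64
import hashlib

import httpx


async def upload_with_put(
    object_storage_client, bucket_path: str, content: bytes
) -> httpx.Response:
    """Illustrative sketch: upload a file with PUT to a presigned URL."""
    headers = httpx.Headers(
        {
            # Content-MD5 lets the server verify the integrity of the uploaded bytes.
            "Content-MD5": base64.b64encode(hashlib.md5(content).digest()).decode(),
            "Content-Type": "text/plain",
        }
    )
    url = object_storage_client.generate_presigned_url(
        "PUT", bucket_path, expires=30, headers=headers
    )
    async with httpx.AsyncClient() as client:
        return await client.put(url, content=content, headers=headers)
```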

#### List

79 changes: 63 additions & 16 deletions fastenv/cloud/object_storage.py
@@ -189,9 +189,10 @@ def generate_presigned_url(
bucket_path: os.PathLike[str] | str,
*,
expires: int = 3600,
headers: httpx.Headers | dict[str, str] | None = None,
service: str = "s3",
) -> httpx.URL:
"""Generate a presigned URL for downloads from S3-compatible object storage.
"""Generate a presigned URL for S3-compatible object storage.
Requests to S3-compatible object storage can be authenticated either with
request headers or query parameters. Presigned URLs use query parameters.
@@ -207,6 +208,11 @@ def generate_presigned_url(
`expires`: seconds until the URL expires. The default and maximum
expiration times are the same as the AWS CLI and Boto3.
`headers`: HTTP request headers (not including the default HTTP `host` header)
that will be included with the request. These headers may include additional
`x-amz-*` headers, such as `X-Amz-Server-Side-Encryption`, or other headers,
such as `Content-Type`, that are known to be accepted by the API operation.
`service`: cloud service for which to generate the presigned URL.
https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/presign.html
@@ -220,7 +226,7 @@ def generate_presigned_url(
raise ValueError("Expiration time must be between one second and one week.")
key = key if (key := str(bucket_path)).startswith("/") else f"/{key}"
params = self._set_presigned_url_query_params(
method, key, expires=expires, service=service
method, key, expires=expires, headers=headers, service=service
)
return httpx.URL(
scheme="https", host=self._config.bucket_host, path=key, params=params
@@ -232,6 +238,7 @@ def _set_presigned_url_query_params(
key: str,
*,
expires: int,
headers: httpx.Headers | dict[str, str] | None = None,
service: str = "s3",
payload_hash: str = "UNSIGNED-PAYLOAD",
) -> httpx.QueryParams:
@@ -271,13 +278,20 @@ def _set_presigned_url_query_params(
if self._config.session_token:
params["X-Amz-Security-Token"] = self._config.session_token
params["X-Amz-SignedHeaders"] = "host"
headers = {"host": self._config.bucket_host}
default_headers = {"host": self._config.bucket_host}
if headers:
signed_headers = httpx.Headers({**default_headers, **headers})
else:
signed_headers = httpx.Headers(default_headers)
params["X-Amz-SignedHeaders"] = (
";".join(keys) if len(keys := sorted(signed_headers)) > 1 else "host"
)
# 1. create canonical request
canonical_request = self._create_canonical_request(
method=method,
key=key,
params=params,
headers=headers,
headers=signed_headers,
payload_hash=payload_hash,
)
# 2. create string to sign
@@ -297,8 +311,8 @@ def _set_presigned_url_query_params(
def _create_canonical_request(
method: Literal["DELETE", "GET", "HEAD", "POST", "PUT"],
key: str,
params: dict[str, str],
headers: dict[str, str],
params: httpx.QueryParams | dict[str, str],
headers: httpx.Headers | dict[str, str],
payload_hash: str,
) -> str:
"""Create a canonical request for AWS Signature Version 4.
@@ -311,6 +325,7 @@ def _create_canonical_request(
canonical_uri = urllib.parse.quote(key if key.startswith("/") else f"/{key}")
canonical_query_params = httpx.QueryParams(params)
canonical_query_string = str(canonical_query_params)
headers = httpx.Headers(headers)
header_keys = sorted(headers)
canonical_headers = "".join(f"{key}:{headers[key]}\n" for key in header_keys)
signed_headers = ";".join(header_keys)
@@ -392,7 +407,9 @@ async def upload(
source: os.PathLike[str] | str | bytes = ".env",
*,
content_type: str = "text/plain",
method: Literal["POST", "PUT"] = "PUT",
server_side_encryption: Literal["AES256", None] = None,
specify_content_disposition: bool = True,
) -> httpx.Response | None:
"""Upload a file to cloud object storage.
@@ -407,16 +424,43 @@ async def upload(
See Backblaze for a list of supported content types.
https://www.backblaze.com/b2/docs/content-types.html
`server_side_encryption`: optional encryption algorithm to specify,
which the object storage platform will use to encrypt the file for storage.
`method`: HTTP method to use for upload. S3-compatible object storage accepts
uploads with HTTP PUT via the PutObject API and presigned URLs, or POST
with authentication information in form fields.
`server_side_encryption`: optional encryption algorithm to specify for
the object storage platform to use to encrypt the file for storage.
This method supports AES256 encryption with managed keys,
referred to as "SSE-B2" on Backblaze or "SSE-S3" on AWS S3.
https://www.backblaze.com/b2/docs/server_side_encryption.html
https://www.backblaze.com/docs/cloud-storage-server-side-encryption
https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingServerSideEncryption.html
`specify_content_disposition`: the HTTP header `Content-Disposition` indicates
whether the content is expected to be displayed inline (in the browser) or
downloaded to a file (referred to as an "attachment"). Dotenv files are
typically downloaded instead of being displayed in the browser, so by default,
fastenv will add `Content-Disposition: attachment; filename="{filename}"`.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition
"""
try:
content, message = await self._encode_source(source)
if self._config.bucket_host.endswith(".backblazeb2.com"):
content_length = len(content)
if method == "PUT":
content_md5 = base64.b64encode(hashlib.md5(content).digest())
headers = httpx.Headers({b"Content-MD5": content_md5})
headers["Content-Length"] = str(content_length)
headers["Content-Type"] = content_type
if specify_content_disposition:
filename = str(bucket_path).split(sep="/")[-1]
content_disposition = f'attachment; filename="{filename}"'
headers["Content-Disposition"] = content_disposition
if server_side_encryption:
headers["X-Amz-Server-Side-Encryption"] = server_side_encryption
url = self.generate_presigned_url(
method, bucket_path, expires=30, headers=headers
)
response = await self._client.put(url, content=content, headers=headers)
elif self._config.bucket_host.endswith(".backblazeb2.com"):
response = await self.upload_to_backblaze_b2(
bucket_path,
content,
@@ -426,7 +470,7 @@ async def upload(
else:
url, data = self.generate_presigned_post(
bucket_path,
content_length=len(content),
content_length=content_length,
content_type=content_type,
expires=30,
server_side_encryption=server_side_encryption,
@@ -479,11 +523,11 @@ def generate_presigned_post(
See Backblaze for a list of supported content types.
https://www.backblaze.com/b2/docs/content-types.html
`server_side_encryption`: optional encryption algorithm to specify,
which the object storage platform will use to encrypt the file for storage.
`server_side_encryption`: optional encryption algorithm to specify for
the object storage platform to use to encrypt the file for storage.
This method supports AES256 encryption with managed keys,
referred to as "SSE-B2" on Backblaze or "SSE-S3" on AWS S3.
https://www.backblaze.com/b2/docs/server_side_encryption.html
https://www.backblaze.com/docs/cloud-storage-server-side-encryption
https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingServerSideEncryption.html
`specify_content_disposition`: the HTTP header `Content-Disposition` indicates
@@ -776,7 +820,7 @@ async def get_backblaze_b2_upload_url(
"""Get an upload URL from Backblaze B2, using the authorization token
and URL obtained from a call to `b2_authorize_account`.
https://www.backblaze.com/b2/docs/uploading.html
https://www.backblaze.com/apidocs/b2-upload-file
https://www.backblaze.com/b2/docs/b2_get_upload_url.html
"""
authorization_response_json = authorization_response.json()
@@ -801,7 +845,10 @@ async def upload_to_backblaze_b2(
"""Upload a file to Backblaze B2 object storage, using the authorization token
and URL obtained from a call to `b2_get_upload_url`.
https://www.backblaze.com/b2/docs/uploading.html
Backblaze B2 does not currently support single-part uploads with POST
to their S3 API. The B2 native API must be used.
https://www.backblaze.com/apidocs/b2-upload-file
https://www.backblaze.com/b2/docs/b2_upload_file.html
"""
authorization_response = await self.authorize_backblaze_b2_account()
15 changes: 12 additions & 3 deletions tests/cloud/test_object_storage.py
@@ -999,12 +999,14 @@ async def test_download_error(
assert "HTTPStatusError" in logger.error.call_args.args[0]

@pytest.mark.anyio
@pytest.mark.parametrize("method", ("POST", "PUT"))
@pytest.mark.parametrize("server_side_encryption", (None, "AES256"))
async def test_upload_from_file_with_object_storage_config(
self,
object_storage_config: fastenv.cloud.object_storage.ObjectStorageConfig,
object_storage_client_upload_prefix: str,
env_file: anyio.Path,
method: Literal["POST", "PUT"],
mocker: MockerFixture,
server_side_encryption: Literal["AES256", None],
) -> None:
@@ -1022,10 +1024,11 @@ async def test_upload_from_file_with_object_storage_config(
)
bucket_path = (
f"{object_storage_client_upload_prefix}/.env.from-file."
f"{object_storage_config.access_key}"
f"{object_storage_config.access_key}.{method.lower()}"
)
await object_storage_client.upload(
bucket_path=bucket_path,
method=method,
source=env_file,
server_side_encryption=server_side_encryption,
)
@@ -1035,12 +1038,14 @@ async def test_upload_from_file_with_object_storage_config(
)

@pytest.mark.anyio
@pytest.mark.parametrize("method", ("POST", "PUT"))
@pytest.mark.parametrize("server_side_encryption", (None, "AES256"))
async def test_upload_from_string_with_object_storage_config(
self,
object_storage_config: fastenv.cloud.object_storage.ObjectStorageConfig,
object_storage_client_upload_prefix: str,
env_str: str,
method: Literal["POST", "PUT"],
mocker: MockerFixture,
server_side_encryption: Literal["AES256", None],
) -> None:
@@ -1058,10 +1063,11 @@ async def test_upload_from_string_with_object_storage_config(
)
bucket_path = (
f"{object_storage_client_upload_prefix}/.env.from-string."
f"{object_storage_config.access_key}"
f"{object_storage_config.access_key}.{method.lower()}"
)
await object_storage_client.upload(
bucket_path=bucket_path,
method=method,
source=env_str,
server_side_encryption=server_side_encryption,
)
@@ -1071,12 +1077,14 @@ async def test_upload_from_string_with_object_storage_config(
)

@pytest.mark.anyio
@pytest.mark.parametrize("method", ("POST", "PUT"))
@pytest.mark.parametrize("server_side_encryption", (None, "AES256"))
async def test_upload_from_bytes_with_object_storage_config(
self,
object_storage_config: fastenv.cloud.object_storage.ObjectStorageConfig,
object_storage_client_upload_prefix: str,
env_bytes: bytes,
method: Literal["POST", "PUT"],
mocker: MockerFixture,
server_side_encryption: Literal["AES256", None],
) -> None:
@@ -1094,10 +1102,11 @@ async def test_upload_from_bytes_with_object_storage_config(
)
bucket_path = (
f"{object_storage_client_upload_prefix}/.env.from-bytes."
f"{object_storage_config.access_key}"
f"{object_storage_config.access_key}.{method.lower()}"
)
await object_storage_client.upload(
bucket_path=bucket_path,
method=method,
source=env_bytes,
server_side_encryption=server_side_encryption,
)
