Google Cloud Storage Integration #683
Conversation
from model_engine_server.infra.gateways.gcs_filesystem_gateway import GCSFilesystemGateway

def get_gcs_key(owner: str, file_id: str) -> str:
nit: I'd prefix these with an underscore so that no one is tempted to try to import them from outside this file, thus breaking Clean Architecture norms.
btw I think the s3_file_storage_gateway also doesn't have the prefixes
""" | ||
try: | ||
client = self.filesystem_gateway.get_storage_client({}) | ||
bucket = client.bucket(infra_config().gcs_bucket) |
I know this pattern was already there, but I think it'd probably make more sense to pass the bucket into the constructor of this class. This way, there's one less dependency on the old infra_config object. @seanshi-scale @tiffzhao5 thoughts?
Could also make the argument to just pass the bucket as an argument with every get_file call, but that's outside the scope of this change, I'd say.
I'm fine with having the bucket be passed in the constructor (in addition to anything else from any configs); dependencies.py does read from infra_config at times to figure out constructor arguments, so there's precedent already.
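A rough sketch of the constructor-injection idea being discussed, assuming a gateway class and a `get_storage_client` method shaped like the snippets above (class and method names here are illustrative, not the actual code):

```python
class GCSFileStorageGateway:
    """Sketch: the bucket name is injected once at construction time
    (e.g. wired up in dependencies.py), instead of each method
    reading infra_config().gcs_bucket."""

    def __init__(self, filesystem_gateway, bucket_name: str):
        self.filesystem_gateway = filesystem_gateway
        self.bucket_name = bucket_name

    def _bucket(self):
        # The storage client is still fetched per call; only the
        # bucket name is fixed up front.
        client = self.filesystem_gateway.get_storage_client({})
        return client.bucket(self.bucket_name)
```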
blobs = bucket.list_blobs(prefix=prefix)
downloaded_files = []

for blob in blobs:
Looks like this is just sequentially downloading the files? Is this the recommended way? ChatGPT showed me two responses: (1) with a ThreadPoolExecutor, and (2) this one.
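For reference, a hedged sketch of the ThreadPoolExecutor variant mentioned above. `Blob.download_to_filename` is the real google-cloud-storage API, but the helper names and the flat-basename layout are assumptions for illustration:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def _download_one(blob, target_dir: str) -> str:
    # Flattening to the basename is a simplifying assumption;
    # the real gateway derives a path suffix from the prefix.
    local_path = os.path.join(target_dir, os.path.basename(blob.name))
    blob.download_to_filename(local_path)
    return local_path

def download_blobs_parallel(blobs, target_dir: str, max_workers: int = 8):
    # Each download is I/O-bound, so threads let the transfers overlap.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda b: _download_one(b, target_dir), blobs))
```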
import os
note: I think this is only really used for some fine-tuning APIs that aren't really used at this point. Think it's fine to keep, of course, since you'll probably need to initialize dependencies anyway, but this code probably won't really get exercised at all.
Retrieve or create a Google Cloud Storage client. Could optionally
utilize environment variables or passed-in credentials.
"""
project = kwargs.get("gcp_project", os.getenv("GCP_PROJECT"))
where does this env var get set? It seems analogous to AWS_PROFILE, but those changes would need to be baked into any relevant k8s yamls most likely.
files = [blob.name for blob in bucket.list_blobs(prefix=prefix)]
return files

def download_files(self, path: str, target_path: str, overwrite=False, **kwargs) -> List[str]:
IIUC, with the current state of the code, this only gets called if you use TGI, LightLLM, TensorRT-LLM, or Deepspeed as inference frameworks, so I doubt this code ends up getting exercised in practice (it's only used to download a tokenizer to count tokens on the Gateway).
for blob in blobs:
    # Remove prefix and leading slash to derive local name
    file_path_suffix = blob.name.replace(prefix, "").lstrip("/")
do you want to replace(prefix, "", 1)? (or something like that) just in case the prefix appears elsewhere in the string
for that matter is that also a bug in the s3 implementation?
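A small illustration of the bug being flagged: without a count argument, str.replace strips every occurrence of the prefix, not just the leading one. The paths here are made up for the example:

```python
prefix = "models/"
name = "models/checkpoints/models/weights.bin"

# Buggy: removes *every* occurrence of the prefix.
buggy = name.replace(prefix, "").lstrip("/")
# Fixed: the third argument limits replacement to the first occurrence.
fixed = name.replace(prefix, "", 1).lstrip("/")

print(buggy)  # checkpoints/weights.bin -- the inner "models/" is lost too
print(fixed)  # checkpoints/models/weights.bin
```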
looks good, just a few things that looked like bugs (from the s3 implementation)
Hi! Sorry, I should have made this a draft PR, but I still need to test and work on it. I put it up mainly to coordinate with aagam.
Pull Request Summary
What is this PR changing? Why is this change being made? Any caveats you'd like to highlight? Link any relevant documents, links, or screenshots here if applicable.
Test Plan and Usage Guide
How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.