[core] Get cloud provider with ray on kubernetes #51793
Signed-off-by: dayshah <[email protected]>
@@ -81,6 +81,7 @@ class ClusterConfigToReport:
     max_workers: Optional[int] = None
     head_node_instance_type: Optional[str] = None
     worker_node_instance_types: Optional[List[str]] = None
+    cloud_provider_alt: Optional[str] = None
We cannot change the schema here without changing the server, since the server does the schema validation. Let's discuss offline how to change the schema.
Updated with the field added to UsageStatsToReport. Is that all?
And updated the schema in the test.
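The schema concern raised above can be illustrated with a small sketch. It assumes the server rejects any key it does not know about; that strictness, the `SERVER_KNOWN_FIELDS` allow-list, and the `unknown_fields` helper are hypothetical stand-ins, not the real server schema. The dataclass is trimmed to the fields visible in the diff hunk.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ClusterConfigToReport:
    # Trimmed to the fields visible in the diff hunk; the real class has more.
    max_workers: Optional[int] = None
    head_node_instance_type: Optional[str] = None
    worker_node_instance_types: Optional[List[str]] = None
    cloud_provider_alt: Optional[str] = None  # the field proposed in this PR


# Hypothetical allow-list standing in for a server-side schema that has
# not been updated yet (assumption: unknown keys fail validation).
SERVER_KNOWN_FIELDS: Set[str] = {
    "max_workers",
    "head_node_instance_type",
    "worker_node_instance_types",
}


def unknown_fields(payload: Dict[str, object], known: Set[str]) -> List[str]:
    """Return the payload keys that the (hypothetical) server schema lacks."""
    return sorted(key for key in payload if key not in known)


payload = asdict(ClusterConfigToReport(cloud_provider_alt="gcp"))
print(unknown_fields(payload, SERVER_KNOWN_FIELDS))
```

Under that assumption, a client that ships the new field before the server schema changes would have its report rejected, which is why the client and server changes need to be coordinated.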
import requests

# Make internal metadata requests to all 3 clouds
# The requests may be rejected based on pod configuration but if it's a machine on the cloud provider it should at least be reachable.
try:
    gcp_get_res = requests.get(
        "http://metadata.google.internal/computeMetadata/v1",
        headers={"Metadata-Flavor": "Google"},
        timeout=1,
    )
    if gcp_get_res.status_code != 404:
        result.cloud_provider_alt = "gcp"
except requests.exceptions.ConnectionError:
    pass

try:
    aws_get_res = requests.get(
        "http://169.254.169.254/latest/meta-data/", timeout=1
    )
    if aws_get_res.status_code != 404:
        result.cloud_provider_alt = "aws"
except requests.exceptions.ConnectionError:
    pass

try:
    azure_get_res = requests.get(
        "http://169.254.169.254/metadata/instance?api-version=2021-02-01",
        headers={"Metadata": "true"},
        timeout=1,
    )
    if azure_get_res.status_code != 404:
        result.cloud_provider_alt = "azure"
except requests.exceptions.ConnectionError:
    pass
@MortalHappiness could you review this part?
@jjyao Do you mean I need to create a Kubernetes cluster on GCP and AWS and test this manually? By the way, I don't have access to Azure either.
I think we should use if-else here. If the request to http://metadata.google.internal/computeMetadata/v1 succeeds, then we don't need to make requests to the other 2 URLs.
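A minimal sketch of the short-circuit idea suggested here, not the code that was merged. It uses the standard library's `urllib` instead of `requests` so it runs without extra dependencies; `METADATA_ENDPOINTS`, `http_status`, and `detect_cloud_provider` are hypothetical names, and the URLs and headers are copied from the diff. The probe is injectable so the ordering logic can be tested without network access. Unlike the diff, it also treats timeouts as "unreachable" (in `requests`, a `ReadTimeout` is not a `ConnectionError`).

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional, Tuple

# (provider, metadata URL, headers), tried in this order; values from the diff.
METADATA_ENDPOINTS: Tuple[Tuple[str, str, Dict[str, str]], ...] = (
    ("gcp", "http://metadata.google.internal/computeMetadata/v1",
     {"Metadata-Flavor": "Google"}),
    ("aws", "http://169.254.169.254/latest/meta-data/", {}),
    ("azure", "http://169.254.169.254/metadata/instance?api-version=2021-02-01",
     {"Metadata": "true"}),
)

Probe = Callable[[str, Dict[str, str]], Optional[int]]


def http_status(url: str, headers: Dict[str, str], timeout: float = 1.0) -> Optional[int]:
    """Return the HTTP status code, or None if the endpoint is unreachable."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on non-2xx responses; the endpoint was still reachable.
        return err.code
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or a malformed response.
        return None


def detect_cloud_provider(probe: Probe = http_status) -> Optional[str]:
    """Return the first provider whose endpoint answers with a non-404 status."""
    for name, url, headers in METADATA_ENDPOINTS:
        status = probe(url, headers)
        if status is not None and status != 404:
            return name  # stop here; later endpoints are never contacted
    return None
```

Calling `detect_cloud_provider()` returns `"gcp"`, `"aws"`, `"azure"`, or `None`. The worst case is still three sequential timeouts when no endpoint answers.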
Or should we make requests in parallel to ensure the timeout is at most 1 second? In your current implementation, the worst-case timeout is 3 seconds. Not sure if timing is critical here.
- updated to if elif
- Timing isn't critical afaik, it only runs once at the start of UsageStatsHead run. Open to making 3 async requests though
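For reference, the parallel alternative discussed above could look roughly like the sketch below. It uses threads rather than async I/O, which is a swapped-in technique chosen for brevity. All names (`METADATA_ENDPOINTS`, `http_status`, `detect_cloud_provider_parallel`) are hypothetical, and the probe is injectable so the behavior can be checked without network access.

```python
import http.client
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, Tuple

# (provider, metadata URL, headers) in priority order; values from the diff.
METADATA_ENDPOINTS: Tuple[Tuple[str, str, Dict[str, str]], ...] = (
    ("gcp", "http://metadata.google.internal/computeMetadata/v1",
     {"Metadata-Flavor": "Google"}),
    ("aws", "http://169.254.169.254/latest/meta-data/", {}),
    ("azure", "http://169.254.169.254/metadata/instance?api-version=2021-02-01",
     {"Metadata": "true"}),
)

Probe = Callable[[str, Dict[str, str]], Optional[int]]


def http_status(url: str, headers: Dict[str, str], timeout: float = 1.0) -> Optional[int]:
    """Return the HTTP status code, or None if the endpoint is unreachable."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # non-2xx response, but the endpoint was reachable
    except (OSError, http.client.HTTPException):
        return None  # unreachable, timed out, or malformed response


def detect_cloud_provider_parallel(probe: Probe = http_status) -> Optional[str]:
    """Probe all endpoints concurrently; the total wait is about one timeout."""
    with ThreadPoolExecutor(max_workers=len(METADATA_ENDPOINTS)) as pool:
        futures = [
            (name, pool.submit(probe, url, headers))
            for name, url, headers in METADATA_ENDPOINTS
        ]
        # Read results in priority order so the answer is deterministic
        # even if more than one endpoint responds.
        for name, future in futures:
            status = future.result()
            if status is not None and status != 404:
                return name
    return None
```

The trade-off is that all three requests are always sent, even when the first one would have succeeded. Since the check runs once at startup, the simpler sequential version may be good enough.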
Why are these changes needed?
On GKE
On Anyscale on EKS (the Google metadata request results in a ConnectionError)
Note: Untested on Azure
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.