[AQUA] Track md logs for error logging #1219
base: main
Conversation
@@ -728,6 +729,44 @@ def update(

        return self._update_from_oci_model(response)

    def tail_logs(
Why can't we use the existing method instead?
def logs(self, log_type: str = None) -> ConsolidatedLog:
    """Gets the access or predict logs.

    Parameters
    ----------
    log_type: (str, optional). Defaults to None.
        The log type. Can be "access", "predict" or None.

    Returns
    -------
    ConsolidatedLog
        The ConsolidatedLog object containing the logs.
    """
+1. I see that show_logs() within model_deployment.py also covers the logic described below.
@@ -1300,24 +1304,52 @@ def get_deployment_status(
                max_wait_time=DEFAULT_WAIT_TIME,
                poll_interval=DEFAULT_POLL_INTERVAL,
            )
-       except Exception:
+       except Exception as e:
Could you add more test cases to cover this logic?
suggesting minor changes
        if infrastructure.private_endpoint_id:
            if not hasattr(
                oci.data_science.models.InstanceConfiguration, "private_endpoint_id"
            ):
                # TODO: add oci version with private endpoint support.
-               raise EnvironmentError(
+               raise OSError(
just curious, did ruff suggest this change?
Yes
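For context, a small sketch showing why the rewrite is behavior-preserving (in Python 3, `EnvironmentError` is just an alias of `OSError`, which is what ruff's pyupgrade rules flag):

```python
# EnvironmentError and OSError are the same class in Python 3, so the rename
# does not change what gets raised or what callers can catch.
assert EnvironmentError is OSError

try:
    raise OSError("private endpoint is not supported by this oci version")
except EnvironmentError:
    # Still caught: both names refer to the same exception class.
    pass
```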
        if predict_logs and len(predict_logs) > 0:
            status += predict_logs[0]["message"]
        status = re.sub(r"[^a-zA-Z0-9]", " ", status)
nit: any reason why we're removing the non-alphanumeric characters here?
Without removing the non-alphanumeric characters, I was getting a bad request when calling the head_object endpoint.
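For illustration, a minimal sketch of the sanitization (the sample log message is made up):

```python
import re

# Hypothetical raw log line; real messages come from the deployment's
# access/predict logs.
status = 'Error: model artifact not found at "oci://bucket@ns/model.zip" (404)!'

# Replace every non-alphanumeric character with a space so the value can be
# passed along without special characters (which were triggering
# 400 Bad Request on the head_object call).
status = re.sub(r"[^a-zA-Z0-9]", " ", status)
# status now contains only letters, digits and spaces.
```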
            status = access_logs[0]["message"]

        if predict_logs and len(predict_logs) > 0:
            status += predict_logs[0]["message"]
I'm wondering whether we need to append both the predict and access logs here. For AQUA, I think the UI passes the same OCID for the predict and access logs, so we would be duplicating the content. Instead, would it make sense to check the access logs first, and only add the predict logs if the access logs are empty? Better yet, if we know both logs are the same, we can just look at the access logs; see the sketch below.
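Something along these lines, as a sketch (variable names follow the diff; only the precedence logic is being suggested):

```python
status = ""

# Prefer the access logs; fall back to the predict logs only when the access
# logs are empty. The UI may pass the same log OCID for both, so appending
# both would duplicate the message.
if access_logs and len(access_logs) > 0:
    status = access_logs[0]["message"]
elif predict_logs and len(predict_logs) > 0:
    status = predict_logs[0]["message"]

status = re.sub(r"[^a-zA-Z0-9]", " ", status)
```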
        telemetry_kwargs = {
            "ocid": ocid,
            "model_name": model_name,
            "status": error_str + " " + status,
Let's split this into two fields: "status" can be error_str, and a new "log_message" field can carry the status content from the logs. This way we can identify the error messages coming from work requests and from the deployment logs separately.
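For example, a sketch of the suggested shape (field names as proposed here, not verified against the telemetry schema):

```python
# "status" keeps only the work-request error; "log_message" carries whatever
# was recovered from the deployment logs, so the two sources stay separable.
telemetry_kwargs = {
    "ocid": ocid,
    "model_name": model_name,
    "status": error_str,
    "log_message": status,
}
```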
@@ -2369,6 +2369,7 @@ def test_validate_multimodel_deployment_feasibility_positive_single(
        )

    def test_get_deployment_status(self):
        model_deployment = copy.deepcopy(TestDataset.model_deployment_object[0])
can you add a few unit tests to cover the status coming from the logs?
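A rough sketch of what such a test might look like (the patch target, call arguments, and assertion are assumptions about the new code, not the actual helper names):

```python
from unittest.mock import patch

def test_get_deployment_status_reports_log_errors(self):
    """Sketch: an error found in the deployment logs should surface in the status."""
    # copy, TestDataset and self.app come from the existing test module (see diff above).
    model_deployment = copy.deepcopy(TestDataset.model_deployment_object[0])
    fake_logs = [{"message": "Model artifact download failed 404"}]

    # "_get_deployment_logs" is a hypothetical patch target; point it at
    # whichever helper actually fetches the access/predict log messages.
    with patch.object(
        type(self.app), "_get_deployment_logs", create=True, return_value=fake_logs
    ):
        # Arguments are illustrative; match the real get_deployment_status signature.
        status = self.app.get_deployment_status(model_deployment, "test_work_request_id")

    # Expectation (schematic): the sanitized log message ends up in the result
    # / telemetry payload; adjust the assertion to the real return shape.
    assert "Model artifact download failed" in str(status)
```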
Description
In this PR, we have added support to watch the predict and access logs (if available) of a Model Deployment and to pass errors from those logs on to telemetry.
The following changes were made as part of this requirement:
Jira
https://jira.oci.oraclecorp.com/browse/ODSC-73585
Sample error JSON from logs