-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DCJ-503] Gracefully handle DRS resolution failure case for self-hosted snapshots #1806
base: develop
Are you sure you want to change the base?
Conversation
…ed snapshots In production, we are seeing some DRS resolution attempts fail with an NPE: java.lang.NullPointerException: Cannot invoke \"com.google.cloud.storage.Bucket.getLocation()\" because \"bucket\" is null" Reading the corresponding stack trace shows that this is happening for self-hosted GCP snapshots. Gracefully handle the case where TDR cannot obtain the GCS bucket for a file object within a self-hosted GCP snapshot, and provide diagnostic information on the source snapshot, root dataset, and file object in the error details.
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you 👍🏽
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK, but using a record
may be a better way to ensure that all fields in a class are logged.
@Override | ||
public String toString() { | ||
StringBuilder sb = new StringBuilder(); | ||
sb.append("class SnapshotCacheResult {\n"); | ||
sb.append(" snapshot: {\n"); | ||
sb.append(" id: ").append(this.id).append("\n"); | ||
sb.append(" name: ").append(this.name).append("\n"); | ||
sb.append(" isSelfHosted: ").append(this.isSelfHosted).append("\n"); | ||
sb.append(" globalFileIds: ").append(this.globalFileIds).append("\n"); | ||
sb.append(" billingProfileId: ").append(this.snapshotBillingProfileId).append("\n"); | ||
sb.append(" cloudPlatform: ").append(this.cloudPlatform).append("\n"); | ||
sb.append(" googleProjectId: ").append(this.googleProjectId).append("\n"); | ||
sb.append(" }\n"); | ||
sb.append(" rootDataset: {\n"); | ||
sb.append(" id: ").append(this.datasetId).append("\n"); | ||
sb.append(" name: ").append(this.datasetName).append("\n"); | ||
sb.append(" datasetProjectId: ").append(this.datasetProjectId).append("\n"); | ||
sb.append(" }\n"); | ||
sb.append("}"); | ||
return sb.toString(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's longer but I think a text block makes it clearer what the output will look like:
@Override
public String toString() {
return """
class SnapshotCacheResult {
snapshot: {
id: %s
name: %s
isSelfHosted: %s
globalFileIds: %s
billingProfileId: %s
cloudPlatform: %s
googleProjectId: %s
}
rootDataset: {
id: %s
name: %s
datasetProjectId: %s
}
}
"""
.formatted(
id,
name,
isSelfHosted,
globalFileIds,
snapshotBillingProfileId,
cloudPlatform,
googleProjectId,
datasetId,
datasetName,
datasetProjectId);
}
// ...but bill to the snapshot's project | ||
BucketGetOption.userProject(cachedSnapshot.googleProjectId)); | ||
// ...but bill to the snapshot's project. | ||
BucketGetOption userProject = BucketGetOption.userProject(cachedSnapshot.googleProjectId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the constructor for SnapshotCacheResult
, it looks possible to have a null googleProjectId
even if isSelfHosted
is true. If that happens, I think BucketGetOption.userProject()
will throw an NPE.
If the code is correct -- when isSelfHosted
is true, googleProjectId
is not null -- you can make it explicit in the code by using requireNonNull()
BucketGetOption userProject = BucketGetOption.userProject(cachedSnapshot.googleProjectId); | |
BucketGetOption userProject = BucketGetOption.userProject(Objects.requireNonNull(cachedSnapshot.googleProjectId)); |
throw new DrsObjectNotFoundException( | ||
"GCS bucket from %s not found (diagnostics returned in error details)" | ||
.formatted(cloudPath), | ||
List.of(cachedSnapshot.toString(), fsFile.toString())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are some data provided in the initial error message and some are appended at the end? Could all the data be provided in the error message? My understanding was causes
is used in the case where we want to collect the chained exception causes and roll them up to the top level message.
@@ -1069,5 +1074,27 @@ public UUID getSnapshotBillingProfileId() { | |||
public CloudPlatform getCloudPlatform() { | |||
return this.cloudPlatform; | |||
} | |||
|
|||
@Override | |||
public String toString() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a custom toString()
, how about converting this class to a record
? It looks like all the fields are read-only already.
record SnapshotCacheResult(
UUID id,
String name,
boolean isSelfHosted,
boolean globalFileIds,
BillingProfileModel datasetBillingProfileModel,
UUID snapshotBillingProfileId,
CloudPlatform cloudPlatform,
String googleProjectId,
String datasetProjectId,
UUID datasetId,
String datasetName) {
public SnapshotCacheResult(Snapshot snapshot) {
this(
snapshot.getId(),
snapshot.getName(),
snapshot.isSelfHosted(),
snapshot.hasGlobalFileIds(),
snapshot.getSourceDataset().getDatasetSummary().getDefaultBillingProfile(),
snapshot.getProfileId(),
snapshot.getCloudPlatform(),
Optional.ofNullable(snapshot.getProjectResource())
.map(GoogleProjectResource::getGoogleProjectId)
.orElse(null),
Optional.ofNullable(snapshot.getSourceDataset().getProjectResource())
.map(GoogleProjectResource::getGoogleProjectId)
.orElse(null),
snapshot.getSourceDataset().getId(),
snapshot.getSourceDataset().getName());
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea! Working on that now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Phil's comment on string formatting. Otherwise looks good 👍
Jira ticket: https://broadworkbench.atlassian.net/browse/DCJ-503
Addresses
In production, we are seeing some DRS resolution attempts fail with an NPE:
Reading the corresponding stack trace shows that this is happening for self-hosted GCP snapshots (logs).
Summary of changes
An example exception thrown with this change (obtained from unit tests, so may not be fully populated as we've seen in the wild):
Testing Strategy
Expanded unit tests.