Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DCJ-503] Gracefully handle DRS resolution failure case for self-hosted snapshots #1806

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

okotsopoulos
Copy link
Contributor

@okotsopoulos okotsopoulos commented Sep 5, 2024

Jira ticket: https://broadworkbench.atlassian.net/browse/DCJ-503

Addresses

In production, we are seeing some DRS resolution attempts fail with an NPE:

java.lang.NullPointerException: Cannot invoke \"com.google.cloud.storage.Bucket.getLocation()\" because \"bucket\" is null"

Reading the corresponding stack trace shows that this is happening for self-hosted GCP snapshots (logs).

Summary of changes

  • Gracefully handle the case where TDR cannot obtain the GCS bucket for a file object within a self-hosted GCP snapshot.
  • Provide diagnostic information on the source snapshot, root dataset, and file object in the error details. (Note that if we've gotten this far in our attempted DRS resolution, we've already verified that the user is authorized to read the snapshot, so they're allowed to get this diagnostics information back.)

An example exception thrown with this change (obtained from unit tests, so may not be fully populated as we've seen in the wild):

DrsObjectNotFoundException
GCS bucket from gs://path/to/file.txt not found (diagnostics returned in error details)
[class SnapshotCacheResult {
  snapshot: {
    id: e1c97007-d80a-487d-90da-f019ba6a9ebd
    name: null
    isSelfHosted: true
    globalFileIds: false
    billingProfileId: bd9b1387-2fb3-4aad-959f-c6015cdaab8a
    cloudPlatform: gcp
    googleProjectId: snapshot-google-project
  }
  rootDataset: {
    id: null
    name: null
    datasetProjectId: dataset-google-project
  }
}, bio.terra.service.filedata.FSFile@3cb01485[datasetId=<null>,cloudPath=gs://path/to/file.txt,mimeType=<null>,bucketResourceId=1fc4ea5c-fdc7-4262-ba38-7a8dcfbeb89b,loadTag=<null>]]

Testing Strategy

Expanded unit tests.

…ed snapshots

In production, we are seeing some DRS resolution attempts fail with an NPE:

java.lang.NullPointerException: Cannot invoke \"com.google.cloud.storage.Bucket.getLocation()\" because \"bucket\" is null"

Reading the corresponding stack trace shows that this is happening for self-hosted GCP snapshots.

Gracefully handle the case where TDR cannot obtain the GCS bucket for a file object within a self-hosted GCP snapshot, and provide diagnostic information on the source snapshot, root dataset, and file object in the error details.
@okotsopoulos okotsopoulos marked this pull request as ready for review September 5, 2024 18:52
@okotsopoulos okotsopoulos requested review from a team as code owners September 5, 2024 18:52
@okotsopoulos okotsopoulos requested review from rushtong and fboulnois and removed request for a team September 5, 2024 18:52
Copy link

sonarcloud bot commented Sep 5, 2024

Copy link
Contributor

@rushtong rushtong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you 👍🏽

Copy link
Member

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK, but using a record may be a better way to ensure that all fields in a class are logged.

Comment on lines +1078 to +1098
@Override
public String toString() {
StringBuilder sb = new StringBuilder();
sb.append("class SnapshotCacheResult {\n");
sb.append(" snapshot: {\n");
sb.append(" id: ").append(this.id).append("\n");
sb.append(" name: ").append(this.name).append("\n");
sb.append(" isSelfHosted: ").append(this.isSelfHosted).append("\n");
sb.append(" globalFileIds: ").append(this.globalFileIds).append("\n");
sb.append(" billingProfileId: ").append(this.snapshotBillingProfileId).append("\n");
sb.append(" cloudPlatform: ").append(this.cloudPlatform).append("\n");
sb.append(" googleProjectId: ").append(this.googleProjectId).append("\n");
sb.append(" }\n");
sb.append(" rootDataset: {\n");
sb.append(" id: ").append(this.datasetId).append("\n");
sb.append(" name: ").append(this.datasetName).append("\n");
sb.append(" datasetProjectId: ").append(this.datasetProjectId).append("\n");
sb.append(" }\n");
sb.append("}");
return sb.toString();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's longer but I think a text block makes it clearer what the output will look like:

    @Override
    public String toString() {
      return """
          class SnapshotCacheResult {
            snapshot: {
              id: %s
              name: %s
              isSelfHosted: %s
              globalFileIds: %s
              billingProfileId: %s
              cloudPlatform: %s
              googleProjectId: %s
            }
            rootDataset: {
              id: %s
              name: %s
              datasetProjectId: %s
            }
          }
          """
          .formatted(
              id,
              name,
              isSelfHosted,
              globalFileIds,
              snapshotBillingProfileId,
              cloudPlatform,
              googleProjectId,
              datasetId,
              datasetName,
              datasetProjectId);
    }

// ...but bill to the snapshot's project
BucketGetOption.userProject(cachedSnapshot.googleProjectId));
// ...but bill to the snapshot's project.
BucketGetOption userProject = BucketGetOption.userProject(cachedSnapshot.googleProjectId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the constructor for SnapshotCacheResult, it looks possible to have a null googleProjectId even if isSelfHosted is true. If that happens, I think BucketGetOption.userProject() will throw an NPE.

If the code is correct -- when isSelfHosted is true, googleProjectId is not null -- you can make it explicit in the code by using requireNonNull()

Suggested change
BucketGetOption userProject = BucketGetOption.userProject(cachedSnapshot.googleProjectId);
BucketGetOption userProject = BucketGetOption.userProject(Objects.requireNonNull(cachedSnapshot.googleProjectId));

throw new DrsObjectNotFoundException(
"GCS bucket from %s not found (diagnostics returned in error details)"
.formatted(cloudPath),
List.of(cachedSnapshot.toString(), fsFile.toString()));
Copy link
Member

@pshapiro4broad pshapiro4broad Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are some data provided in the initial error message and some are appended at the end? Could all the data be provided in the error message? My understanding was causes is used in the case where we want to collect the chained exception causes and roll them up to the top level message.

@@ -1069,5 +1074,27 @@ public UUID getSnapshotBillingProfileId() {
public CloudPlatform getCloudPlatform() {
return this.cloudPlatform;
}

@Override
public String toString() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a custom toString(), how about converting this class to a record? It looks like all the fields are read-only already.

  record SnapshotCacheResult(
      UUID id,
      String name,
      boolean isSelfHosted,
      boolean globalFileIds,
      BillingProfileModel datasetBillingProfileModel,
      UUID snapshotBillingProfileId,
      CloudPlatform cloudPlatform,
      String googleProjectId,
      String datasetProjectId,
      UUID datasetId,
      String datasetName) {

    public SnapshotCacheResult(Snapshot snapshot) {
      this(
          snapshot.getId(),
          snapshot.getName(),
          snapshot.isSelfHosted(),
          snapshot.hasGlobalFileIds(),
          snapshot.getSourceDataset().getDatasetSummary().getDefaultBillingProfile(),
          snapshot.getProfileId(),
          snapshot.getCloudPlatform(),
          Optional.ofNullable(snapshot.getProjectResource())
              .map(GoogleProjectResource::getGoogleProjectId)
              .orElse(null),
          Optional.ofNullable(snapshot.getSourceDataset().getProjectResource())
              .map(GoogleProjectResource::getGoogleProjectId)
              .orElse(null),
          snapshot.getSourceDataset().getId(),
          snapshot.getSourceDataset().getName());
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea! Working on that now.

Copy link
Contributor

@fboulnois fboulnois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Phil's comment on string formatting. Otherwise looks good 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants