Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposed Change] Skip File Ref Validation Logic; Improve Logs #1310

Closed
wants to merge 5 commits into from

Conversation

snf2ye
Copy link
Contributor

@snf2ye snf2ye commented Jun 22, 2022

Note: There isn't yet a ticket for this work because I'm not sure if we want to make this change. If there is agreement that this change should be made, then can create a ticket!

Background

We were recently trying to figure out what went wrong with a user's ingest request. We encountered a batch of strange-looking logs:
image
While the logs look strange, they're actually okay. Each log indicates that we are performing a check against a column designated as a fileref. This particular ingest was only ingesting files into a single column. We were left with a wall of unhelpful logs.
Relevant code:
image

Issues

  1. Over-noisy and uninformative logs
  2. Revealed that we're running through a bunch of unnecessary validation code/steps for empty lists of refIds

Proposal

  1. Provide more meaningful logs by logging which fileref columns we're checking vs. skipping
  2. Add check so that we only proceed with validation steps if there are filerefs to check

@@ -41,8 +41,10 @@ public StepResult doStep(FlightContext context) throws InterruptedException {
for (Column column : table.getColumns()) {
if (column.isFileOrDirRef()) {
List<String> refIdArray = bigQueryDatasetPdao.getRefIds(dataset, stagingTableName, column);
List<String> badRefIds = fileDao.validateRefIds(dataset, refIdArray);
badRefIds.forEach(id -> invalidRefIds.add(new InvalidRefId(id, column.getName())));
if (!refIdArray.isEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check should be done inside FireStoreDao.validateRefIds() instead of in the caller, since it's an optimization that method can do regardless of who calls validateRefIds(). It could even be pushed one layer down into FireStoreDirectoryDao.validateRefIds().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - I've moved it to the directory dao.

@snf2ye snf2ye changed the title skip validation if no ref ids in column [Proposed Change] Skip File Ref Validation Logic; Improve Logs Jun 30, 2022
Copy link
Member

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me. I wonder if some of our log.infos can become log.debugs as we have less need to see what the code is doing.

@snf2ye snf2ye closed this Oct 21, 2024
@snf2ye snf2ye deleted the sh-skip-validation branch October 21, 2024 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants