Skip to content

feat: Add deletion warning notifications for expiring files#59

Merged
ttak-apphelix merged 9 commits intomasterfrom
ttak-apphelix/BOMS-398
Apr 13, 2026
Merged

feat: Add deletion warning notifications for expiring files#59
ttak-apphelix merged 9 commits intomasterfrom
ttak-apphelix/BOMS-398

Conversation

@ttak-apphelix
Copy link
Copy Markdown
Member

@ttak-apphelix ttak-apphelix commented Mar 4, 2026

Overview

This PR adds a “deletion warning” notification path to the partner retirement reporting workflow, intended to alert external POCs when previously-uploaded GDPR report CSVs are approaching their retention-based deletion date.

Changes:

  • Adds deletion_warning_days to config scaffolding.
  • Extends retirement_partner_report.py to optionally scan Drive for near-expiry report files and post warning comments.
  • Adds a new Drive API helper list_comments_for_file to support idempotency checks before posting warnings.

JIRA ticket

Related PR:

Copilot AI review requested due to automatic review settings March 4, 2026 06:36
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a “deletion warning” notification path to the partner retirement reporting workflow, intended to alert external POCs when previously-uploaded GDPR report CSVs are approaching their retention-based deletion date.

Changes:

  • Adds deletion_warning_days to test config scaffolding.
  • Extends retirement_partner_report.py to optionally scan Drive for near-expiry report files and post warning comments.
  • Adds a new Drive API helper list_comments_for_file to support idempotency checks before posting warnings.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
tubular/tests/retirement_helpers.py Adds deletion_warning_days to the fake YAML config used by tests.
tubular/scripts/retirement_partner_report.py Introduces expiration scanning + warning comment creation and new CLI options to control/override retention and warning windows.
tubular/google_api.py Adds DriveApi.list_comments_for_file for retrieving comments on a Drive file.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tubular/scripts/retirement_partner_report.py Outdated
Comment thread tubular/scripts/retirement_partner_report.py Outdated
Comment thread tubular/scripts/retirement_partner_report.py Outdated
Comment thread tubular/google_api.py
Comment thread tubular/google_api.py
Comment thread tubular/google_api.py Outdated
Comment thread tubular/scripts/retirement_partner_report.py Outdated
Comment thread tubular/scripts/retirement_partner_report.py
Copilot AI review requested due to automatic review settings March 4, 2026 07:08
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tubular/scripts/retirement_partner_report.py
Copilot AI review requested due to automatic review settings March 31, 2026 04:23
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tubular/scripts/retirement_partner_report.py Outdated
Comment thread tubular/scripts/retirement_partner_report.py Outdated
@ttak-apphelix ttak-apphelix changed the title [DO NOT MERGE] feat: Add deletion warning notifications for expiring files feat: Add deletion warning notifications for expiring files Mar 31, 2026
Copy link
Copy Markdown
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't gotten through this yet, but is this ready for review, and has it been through internal review?

I also would need to think through things again, but when I originally thought about this I thought we would need 2 toggles, one for when we drop the special prefix. Is that something that will be coming later?

Comment thread tubular/scripts/retirement_partner_report.py Outdated
@ttak-apphelix
Copy link
Copy Markdown
Member Author

I haven't gotten through this yet, but is this ready for review, and has it been through internal review?

I also would need to think through things again, but when I originally thought about this I thought we would need 2 toggles, one for when we drop the special prefix. Is that something that will be coming later?

Internal review is in progress. For prefix related change, yes it is part of separate PR.

Copilot AI review requested due to automatic review settings April 2, 2026 12:31
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tubular/scripts/retirement_partner_report.py
@ktyagiapphelix2u
Copy link
Copy Markdown
Contributor

I haven't gotten through this yet, but is this ready for review, and has it been through internal review?
I also would need to think through things again, but when I originally thought about this I thought we would need 2 toggles, one for when we drop the special prefix. Is that something that will be coming later?

Internal review is in progress. For prefix related change, yes it is part of separate PR.

I reviewed the PR Its okay to go

Copy link
Copy Markdown
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ttak-apphelix: Thank you. Here are some high-level comments:

  1. I want you to get to the P2 work sooner rather than later, so please update whatever you wish from my comments. Mostly, I want something that isn't broken, and I didn't review tests or code carefully enough to ensure that. I'm hoping that was covered internally.
  2. I can get very picky over language, and I got caught up in that. Apologies. Again - pick what you want, and try to move quickly.
  3. If you need me to review certain logic more carefully, let me know. And again, I really want to get to the more challenging follow-up PR about the files we are trying to delete that don't match prefix.

Comment on lines 293 to +294
"""
Add comments to the uploaded csv files, triggering email notification.

Extract external email addresses from partner folder permissions.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I'm trying to separate what it does (first line) from how it does it (second line)
  2. Is "external" address meaningful to you?
  • Are there "internal" email addresses that are different? Does this word help? Maybe external_emails below should just be partner_emails?
  • Also, since the files contain learner emails as PII, I think calling these "partner email addresses" in the docs adds some clarity.
Suggested change
"""
Add comments to the uploaded csv files, triggering email notification.
Extract external email addresses from partner folder permissions.
Returns partner email addresses for provided partner names.
Partner email addresses are extracted from the partner folder permissions.

Comment on lines +296 to +297
This shared helper ensures consistent handling of permissions and denied domains
across different notification paths (upload comments and deletion warnings).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
This shared helper ensures consistent handling of permissions and denied domains
across different notification paths (upload comments and deletion warnings).
This shared helper ensures consistent handling of permissions and denied domains
across different notification paths (e.g. upload comments, deletion warnings, etc.).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated



def _add_comments_to_files(config, file_ids):
def _get_external_emails_for_partners(drive, config, partners=None):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment for more details. Does this simpler name make sense?

Suggested change
def _get_external_emails_for_partners(drive, config, partners=None):
def _get_partner_emails(drive, config, partners=None):

Add comments to the uploaded csv files, triggering email notification.

Args:
file_ids (dict): Mapping of partner names to Drive file IDs corresponding to the newly uploaded csv files.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In place of file_ids, how do you like:

  • partner_to_file_ids_map, or
  • partner_file_ids_dict, or
  • `partner_file_ids?

I don't like that the current name does not seem like it would be a dict, and gives no indication of what the keys would be.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to partner_file_ids_dict

Args:
config (dict): Configuration dictionary containing retention settings and Drive folder mappings.
"""
retention_days = config.get('age_in_days', 60)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved comment out of docstring:

Suggested change
retention_days = config.get('age_in_days', 60)
# CLI parameter "--age_in_days" represents the retention period.
retention_days = config.get('age_in_days', 60)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

if file_created <= warning_threshold:
file_id = file_info.get('id')

# Check for filename in existing comments (stable even if message wording changes)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not completely clear on this comment. The first-half seems reversed? The second-half I'm unsure about. Is this mostly what you are trying to say?

Suggested change
# Check for filename in existing comments (stable even if message wording changes)
# First check if the file already has the deletion warning to avoid
# duplicate comments on re-run.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. Updated


# Check for filename in existing comments (stable even if message wording changes)
existing_comments = drive.list_comments_for_file(file_id, fields='content')
has_warning = any(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Remind me (and we should probably document here) the relationship between file comments and email notifications. Are these two different things, or one and the same?
  • How do you know that the file couldn't have comments that are not deletion warning comments? Can someone add a comment to a file manually and would this mess things up? I'm not clear, and we should document this once I am more clear.

I am not suggesting the following yet, but we could protect against wording changes, but also ensure we are looking at deletion warning comments only:

    # NOTE: The following could be added above DELETION_WARNING_MESSAGE_TEMPLATE

    # Used to verify deletion comments; must exist in DELETION_WARNING_MESSAGE_TEMPLATE
    DELETION_WARNING_PHRASE = "will be automatically deleted"

   # Note: Then here, you could...
  • Ensure the template still includes DELETION_WARNING_PHRASE. If not, maybe fail hard.
  • Check only for comments that include DELETION_WARNING_PHRASE.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. File comments and email notification are same thing. POC will get the email from comments-noreply@docs.google.com with same content what it got posted for File comments
  2. Idea was to keep some unique in the comments that was filename but yes, somebody can reply on the comments with filename and this would have skipped the notification. Updated with DELETION_WARNING_PHRASE logic.

if days_until_deletion > warning_days:
# File is older than warning_threshold but not yet in the warning window;
# this shouldn't normally happen but guard against clock skew or config changes.
LOG('File {} has {} days until deletion, outside warning window, skipping'.format(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we be ok with a log message for every file, or will we be flooding the log?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep this as this job run weekly but let me know if you think otherwise.

seconds_until_deletion = (deletion_datetime - now).total_seconds()
days_until_deletion = int(seconds_until_deletion / 86400)

if days_until_deletion <= 0:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes:

  1. My understanding is that you are implementing in parts to keep PR size under control. I think that is fine, as long as:
  • We test the combination in Stage once all parts are complete, and we don't roll out any individual parts to Prod until the whole things is complete.
  • We make sure that we are focused primarily on the minimum number of changes to resolve the P2, which is handling these files: not filename.startswith(file_prefix).
  1. We discussed having a deletion notification (as opposed to this deletion warning notification), but maybe we do not need that for this P2 work (and possibly ever)? It's something for us to consider, but especially to unblock the P2 work. Also, I remember there was some complication around timing of notifying on a file that is going to be deleted, and how to ensure that the notifications actually get sent before the deletion happens.
  2. I'd love for the next PR to be around sending deletion warning notifications for the files that we care about (i.e. not filename.startswith(file_prefix)). I think this will get tricky and needs some thought around how you'll handle the fact that they are already past due for deletion, and you want to give a 7 days warning, and you want to not delete for at least 7 days. Since most of this is a unique rollout situation, with temporary toggles, I'd like you to be able to get to this as soon as possible.

'--age_in_days',
type=int,
default=None,
help='Number of days before files are deleted (matches cleanup job AGE_IN_DAYS).'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of having to keep parameters in sync across jobs. What are the jobs, and do they still make sense to keep separate? Do we need linting somewhere where this is configured to ensure they stay in sync?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is retirement-partner-report-cleanup job where a CSV file gets deleted after AGE_IN_DAYS is finished. This has been updated in RetirementJobs.groovy in PR edx/jenkins-job-dsl#1830

@ttak-apphelix ttak-apphelix merged commit 6f40a4c into master Apr 13, 2026
3 checks passed
@ttak-apphelix ttak-apphelix deleted the ttak-apphelix/BOMS-398 branch April 13, 2026 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants