Skip to content

Conversation

antkryt
Copy link
Contributor

@antkryt antkryt commented Jun 5, 2025

Purpose

verify if spammy domains are detected

Changes

  • merge check_resource_for_domains_postcommit and check_resource_with_spam_services tasks to avoid race condition
  • compare note to value not enum
  • log detected domains to sentry

QA Notes

You can test it with domain xakw1.com on staging3. Currently project won't be banned with this domain, regardless of whether it's public or not.

Documentation

Side Effects

Ticket

https://openscience.atlassian.net/browse/ENG-5862

@antkryt
Copy link
Contributor Author

antkryt commented Jun 5, 2025

The issue described in the ticket is not entirely accurate. I found the domain "xawk1.com" on staging, and content containing this domain will not be banned even if the project is public (here is an example). And on the other side, if you repeat the steps from the ticket using some other domains, everything works as expected.

However, the spam check will always be triggered because the DomainReference is always created and you can verify it in django admin (the only place it’s created is in the function _check_resource_for_domains). Therefore, only two possibilities remain:

if notable_domain.note == NotableDomain.Note.EXCLUDE_FROM_ACCOUNT_CREATION_AND_CONTENT is not being triggered
or
resource.confirm_spam(save=True, domains=list(spammy_domains)) silently fails (changes are not saved in the database or racing condition and changes are overwritten with some other process).

Probably changes made by check_resource_for_domains_postcommit are overwritten by check_resource_with_spam_services. Both of them start almost at the same time and load same (non-spam) version of the resource at the beginning. Which task is completed last, such changes will be saved in the database (typically check_resource_with_spam_services ends last because we make request to external service)

Copy link
Contributor

@Johnetordoff Johnetordoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antkryt I'm a little confused exactly what this accomplishes, could you write some test cases to illustrate how you are changing the current behavior. I see that the tests are changed, but I don't see new test cases. A good test case here should show the spam content or spam domain of a wiki is checked when a project is made public.

@antkryt
Copy link
Contributor Author

antkryt commented Jun 10, 2025

@Johnetordoff I'm not sure that it's possible to write a test case to illustrate what I'm changing here, so I'll try to explain better.

Both check_resource_for_domains_postcommit and check_resource_with_spam_services are doing the same thing: start after response is sent to client (run_postcommit decorator) -> load node with spam_status=UNKNOWN -> process -> save changes to db.

As you can see, tasks don't know anything about each other and changes that are made (it's two different parallel processes). So, if check_resource_with_spam_services finishes after check_resource_for_domains_postcommit, then changes made by check_resource_for_domains_postcommit will be overwritten (and vice versa). Locally everything works because SPAM_SERVICES_ENABLED is false and only check_resource_for_domains_postcommit is running.

Also, I've changed signature of the def _check_resource_for_domains(guid, content) to def _check_resource_for_domains(resource, content) and removed confirm_spam() call from it, that's why I've updated some tests to make them work as before

As for checking spam during privacy change tests, we have bunch of those (see osf_tests/test_node.py::TestNodeSpam). I'll add a few tests to check if new check_resource_for_spam_postcommit function works properly

Copy link
Contributor

@Johnetordoff Johnetordoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not quite the behavior we are looking for, but correct techniques.


@mock.patch.object(settings, 'SPAM_SERVICES_ENABLED', True)
@mock.patch('osf.external.spam.tasks._check_resource_for_domains')
def test_check_resource_for_spam_postcommit_with_spammy_domains(self, mock_check_domains, project, user):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THe ticket instructions to reproduce the bug are:

  1. Create project
  2. Update wiki with spammy domain
  3. Make project public

These are the steps the tests should follow, you are testing behavior that is too specific we know checking for spam domans works, when a project is saved when public, but we need to check a project's wiki content too, not just it's desciption etc.

Copy link
Contributor

@Johnetordoff Johnetordoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good new tests, thanks!

@brianjgeiger brianjgeiger changed the base branch from feature/pbs-25-10 to feature/pbs-25-13 June 30, 2025 17:22
@brianjgeiger brianjgeiger merged commit 5af4c80 into CenterForOpenScience:feature/pbs-25-13 Jun 30, 2025
6 checks passed
Johnetordoff added a commit to Johnetordoff/osf.io that referenced this pull request Jul 16, 2025
…cience/osf.io into add-brand-to-collection-provider

* 'feature/pbs-25-13' of https://github.com/CenterForOpenScience/osf.io: (25 commits)
  [ENG-8224] Fixed force archive template with registration addons (CenterForOpenScience#11210)
  API: Allow /v2/users/me/preprints list view to filter by title
  [ENG-8325] Public column does not display the visibility status of child nodes on the Nodes page in the Admin App
  [ENG-8246] Fixed deletion of maintenance alerts in admin (CenterForOpenScience#11226)
  add exception handling to /review_actions/ endpoint
  added a route to download node metadata (CenterForOpenScience#11215)
  [ENG-7929] Ability to move registrations to draft state (CenterForOpenScience#11153)
  switch to new UI when user views draft registration file (CenterForOpenScience#11144)
  [ENG-5862] SPAM - Fix Wiki Spamming (CenterForOpenScience#11171)
  improved displaying of stashed urls and approval state in admin (CenterForOpenScience#11193)
  [ENG-7962] Fix User Setting Response Payload async mailchimp perference change issues (CenterForOpenScience#11136)
  fixed children/parent fields in admin templates (CenterForOpenScience#11156)
  don't add multiple group perms for preprint provider (CenterForOpenScience#11159)
  fix content overflow for node page (CenterForOpenScience#11182)
  [ENG-8096] Admins on projects are unable to reject user access requests (CenterForOpenScience#11163)
  added retry to avoid race condition (CenterForOpenScience#11179)
  upgrade django to 4.2.17 (CenterForOpenScience#11173)
  add additional information to user admin (CenterForOpenScience#11184)
  [ENG-8192] Ability to force archive registrations when OSFS Folders have changed (CenterForOpenScience#11194)
  [ENG-8193] Fix issues with Preprint submission via API (CenterForOpenScience#11185)
  ...
Johnetordoff added a commit to Johnetordoff/osf.io that referenced this pull request Jul 29, 2025
…cience/osf.io into refactor-notifications

* 'feature/pbs-25-13' of https://github.com/CenterForOpenScience/osf.io: (33 commits)
  added academiaInstitution in social-schema, fixed True value of 'ongoing', fixed/added tests (CenterForOpenScience#11239)
  [ENG-7979] Registrations pending moderation that have components also pending moderation do not display the children (CenterForOpenScience#11222)
  [ENG-8216] Fixed children deletion on a node page in admin (CenterForOpenScience#11237)
  [ENG-8401] Fixed preprint downloading (CenterForOpenScience#11238)
  fix categories for sendgrid emails (CenterForOpenScience#11236)
  [ENG-8936] API: Allow /v2/users/me/preprints list view to filter by tags (CenterForOpenScience#11232)
  Add collections scopes to FULL_READ and FULL_WRITE
  add brand relationship to collectionprovider
  [ENG-8224] Fixed force archive template with registration addons (CenterForOpenScience#11210)
  API: Allow /v2/users/me/preprints list view to filter by title
  [ENG-8325] Public column does not display the visibility status of child nodes on the Nodes page in the Admin App
  [ENG-8246] Fixed deletion of maintenance alerts in admin (CenterForOpenScience#11226)
  add exception handling to /review_actions/ endpoint
  added a route to download node metadata (CenterForOpenScience#11215)
  [ENG-7929] Ability to move registrations to draft state (CenterForOpenScience#11153)
  switch to new UI when user views draft registration file (CenterForOpenScience#11144)
  [ENG-5862] SPAM - Fix Wiki Spamming (CenterForOpenScience#11171)
  improved displaying of stashed urls and approval state in admin (CenterForOpenScience#11193)
  [ENG-7962] Fix User Setting Response Payload async mailchimp perference change issues (CenterForOpenScience#11136)
  fixed children/parent fields in admin templates (CenterForOpenScience#11156)
  ...

# Conflicts:
#	tests/test_user_profile_view.py
Johnetordoff added a commit to Johnetordoff/osf.io that referenced this pull request Aug 4, 2025
…cience/osf.io into add-new-notifications-data-model

* 'feature/pbs-25-13' of https://github.com/CenterForOpenScience/osf.io: (35 commits)
  [ENG-8462]  Institution setup fixes (CenterForOpenScience#11241)
  [ENG-8401] Earlier preprint versions download the current file (CenterForOpenScience#11245)
  added academiaInstitution in social-schema, fixed True value of 'ongoing', fixed/added tests (CenterForOpenScience#11239)
  [ENG-7979] Registrations pending moderation that have components also pending moderation do not display the children (CenterForOpenScience#11222)
  [ENG-8216] Fixed children deletion on a node page in admin (CenterForOpenScience#11237)
  [ENG-8401] Fixed preprint downloading (CenterForOpenScience#11238)
  fix categories for sendgrid emails (CenterForOpenScience#11236)
  [ENG-8936] API: Allow /v2/users/me/preprints list view to filter by tags (CenterForOpenScience#11232)
  Add collections scopes to FULL_READ and FULL_WRITE
  add brand relationship to collectionprovider
  [ENG-8224] Fixed force archive template with registration addons (CenterForOpenScience#11210)
  API: Allow /v2/users/me/preprints list view to filter by title
  [ENG-8325] Public column does not display the visibility status of child nodes on the Nodes page in the Admin App
  [ENG-8246] Fixed deletion of maintenance alerts in admin (CenterForOpenScience#11226)
  add exception handling to /review_actions/ endpoint
  added a route to download node metadata (CenterForOpenScience#11215)
  [ENG-7929] Ability to move registrations to draft state (CenterForOpenScience#11153)
  switch to new UI when user views draft registration file (CenterForOpenScience#11144)
  [ENG-5862] SPAM - Fix Wiki Spamming (CenterForOpenScience#11171)
  improved displaying of stashed urls and approval state in admin (CenterForOpenScience#11193)
  ...

# Conflicts:
#	admin/nodes/views.py
#	api_tests/users/views/test_user_settings_detail.py
#	framework/email/tasks.py
#	tests/framework_tests/test_email.py
#	tests/test_user_profile_view.py
adlius added a commit that referenced this pull request Aug 13, 2025
* handle django core validation error in drf view

* add unit tests

* add registration_word for registration provider

* [ENG-8290] Allow collection search POST with token scope (#11201)

Purpose

Add scope for POST collection search

* [ENG-8247] Ability to delete draft preprints from database (#11191)

## Purpose
User should be able to delete draft preprints, so that we don't save them in database

## Changes
Added `delete` method.
User can delete a preprint only if it's in initial state, so it means this preprint is a draft

## Ticket
https://openscience.atlassian.net/browse/ENG-8247

* [ENG-8193] Fix issues with Preprint submission via API (#11185)

## Purpose
handle django core ValidationError in drf view

## Ticket
https://openscience.atlassian.net/browse/ENG-8193

* [ENG-8192] Ability to force archive registrations when OSFS Folders have changed (#11194)

## Purpose
fix force archive for some actions

## Changes
- divide revert_log_actions into separate functions
- include trashed children when build a file tree
- add additional retrieval logic for osf_storage_folder_created and osf_storage_file_removed
- add generic retrieval fallback

## Ticket
https://openscience.atlassian.net/browse/ENG-8192

* add additional information to user admin (#11184)

## Purpose
add more information to user admin

## Changes
Added fields:
- id
- Deactivation request
- Change password last attempt
- Change password invalid attempts
- Socials
- User is staff
- Groups

Added dates to:
- Disabled
- Registered
- Confirmed

## Ticket
https://openscience.atlassian.net/browse/ENG-8190

* upgrade django to 4.2.17 (#11173)

## Purpose
Because Django 4.2.15 version has a vulnerability, it was upgraded to 4.2.17

## Changes
Updated pyproject.toml and lock file

## Ticket
https://openscience.atlassian.net/browse/ENG-8176

* added retry to avoid race condition (#11179)

## Purpose

User received an email of archive failure regardless of it was successful. It may happen because `archive_success` is run asynchronously

https://github.com/CenterForOpenScience/osf.io/blob/2328dd60f55e9c1281dcb29dcd45a78a7fd2cc5f/website/archiver/listeners.py#L33-L49

and may be finished before the main thread finishes archiving process. So at first `archive_success` is processed and no files found, thus email is sent, then the main thread finishes files processing and this archiving is successful actually.
Also it's possible that the celery queue had too many tasks to process and when the main thread finishes archiving, user sees his registration and when celery processes `archive_success` tasks that fails, user receives this email.

## Changes
Added a one-time retry.

## Ticket
https://openscience.atlassian.net/browse/ENG-8175?atlOrigin=eyJpIjoiMjg4MWM1YWI1ZTE3NDMyZmEyODk2Y2QxZjlhNjFlOGQiLCJwIjoiaiJ9

* [ENG-8096] Admins on projects are unable to reject user access requests (#11163)

## Purpose
Add default value when reject access request

## Ticket
https://openscience.atlassian.net/browse/ENG-8096

* fix content overflow for node page (#11182)

## Purpose
fix content overflow on the node admin page

## Ticket
https://openscience.atlassian.net/browse/ENG-8063

* don't add multiple group perms for preprint provider (#11159)

## Purpose
don't add multiple group perms for preprint provider

## Changes
- check if user already belongs to some provider group
- change checkbox to radio select

## Ticket
https://openscience.atlassian.net/browse/ENG-8016

* fixed children/parent fields in admin templates (#11156)

## Purpose

Admin templates used nonexistent fields to display children and parent of a node (potentially new fields were added but the old fields weren't replaced by new ones). The templates used `parent` and `children` fields. However an endpoint that adds children and parent to a node uses fields `descendants` (or through `NodeRelation` using `get_nodes` method) and `parent_node` property through `_parents` field

## Changes

Use correct fields

## Notes

Deletion of children nodes that are displayed now is broken. Together with Mark decided to create a separate ticket for this issue

## Ticket

https://openscience.atlassian.net/browse/ENG-7969

* [ENG-7962] Fix User Setting Response Payload async mailchimp perference change issues (#11136)

## Purpose
Correct issues where PATCH responses wouldn't show preferences as updated due to async mailchimp task execution. Overrides instance model to return changed value, even though db value is unchanged until mailchimp confirms.

## Changes
- adds special case for returning desired value synchronously 
- removes `update_osf_help_mails_subscription` because it's not used, but tested.
- adds test cases

## Ticket
https://openscience.atlassian.net/browse/ENG-7962

* improved displaying of stashed urls and approval state in admin (#11193)

## Purpose
Admins feel discomfort while looking for tokens in stashed urls and approval state fields on a registration page in admin

## Changes
Format fields in a json-like way

## Ticket
https://openscience.atlassian.net/browse/ENG-6614?atlOrigin=eyJpIjoiMTdkMTM0YzBmNjljNDcxYTkxNjYzZDRjYTY3NjEyODMiLCJwIjoiaiJ9

* [ENG-5862] SPAM - Fix Wiki Spamming (#11171)

<!-- Before submit your Pull Request, make sure you picked
     the right branch:

     - For hotfixes, select "master" as the target branch
     - For new features, select "develop" as the target branch
     - For release feature fixes, select the relevant release branch (release/X.Y.Z) as the target branch -->

## Purpose
verify if spammy domains are detected

## Changes
- merge check_resource_for_domains_postcommit and check_resource_with_spam_services tasks to avoid race condition
- compare note to value not enum
- log detected domains to sentry

## QA Notes
You can test it with domain [xakw1.com](https://admin.staging3.osf.io/admin/osf/notabledomain/1180/change/?_changelist_filters=q%3Dxakw1.com) on staging3. Currently project won't be banned with this domain, regardless of whether it's public or not.

## Ticket
https://openscience.atlassian.net/browse/ENG-5862

* switch to new UI when user views draft registration file (#11144)

## Purpose
Throughout registration creation user can view attached files. However `resolve_guid` doesn't handle this case and renders the legacy UI that shows error because `get_rdf_type` raises `NotImplementedError`. So this case has no be handled by ember. However when we switch to the new UI, we get `draftnode is not a supported target type`. Also there will be some work for FE because for now ember relies on `node` relationship that draft node/registration don't have. Taking into account Futa's words, it's an unusual flow for ember

## Changes
Added redirect to ember, updated `view_map` that referenced to the non-existing view `draft_nodes:node-detail` to use `draft_nodes:draft-node-detail` view

## Notes
1. I'm not confident that `draft-node` key is still used in `view_map`. From the `resolve` method of `TargetField` I see that keys are either model name in lowercase or `referent._name` that I couldn't find how is created (maybe automatically). However if it existed, there would be some errors that this view does not exist. So I left both keys but with the correct view name. In case you know the answer, I can remove the original key and leave only the correct one

## Ticket
https://openscience.atlassian.net/jira/software/c/projects/ENG/boards/145?selectedIssue=ENG-5810

* [ENG-7929] Ability to move registrations to draft state (#11153)

## Purpose
Admins should be able to revert registration to draft state for the support purposes

## Changes
Added functionality to revert registration back to draft state
Added a button in admin to revert a registration
Added tests

## Notes:
Currently we are discussing with Mark what to do with a registration after admin reverts it, so this ticket cannot be merged. The issue is that when admin reverts registration, it's still displayed for admin, just with deletion date and when user re-registers a draft, the system creates a new registration and the draft is linked to this newest version and admin sees both registrations. So if admin tries to revert the previous version, he'll get an error because the version doesn't have the linked draft. What I've suggested:
1. Fully delete previous registered version so that you won't be able to see it in admin. In this way if the registered version had guid 123ab and user registered the restored draft again, the draft would have another registration with another guid, like 999ss . Basically for user it'll be the same registration, just with another url to access it
2. Add another (or the same) hint for Revert to Draft button that you can't revert this previous version (the easiest solution)
3. One more solution, that I think will be the most complicated between these three, is that we can try to save the guid of an original registered version and whenever admin reverts registration and user registers it again from the same draft, we assign the saved guid to a new registration

**DECISION**:
Option 2 is the most reliable. In case we want #1 or #3 behavior, a new ticket will be created

## Ticket
https://openscience.atlassian.net/browse/ENG-7929

* added a route to download node metadata (#11215)

## Purpose
So right now, our firewall rules aren't advanced enough to be able to let requests from `/<guid>/metadata` go through to osf on staging4. Instead, we have to start with a stable string, and do the dynamic part afterward, hence doing `/metadata/<guid>/`  which we can match with `/metadata/*` in the firewall.

## Changes
Added a corresponding route

## Ticket
https://openscience.atlassian.net/browse/ENG-8261

* add exception handling to /review_actions/ endpoint

* [ENG-8246] Fixed deletion of maintenance alerts in admin (#11226)

What

When you try to delete the active maintenance banner from the admin app, it 502s. The traceback says that there is no url to redirect to and to provide a success url. 

Acceptance Criteria

A user will be able to add a maintenance banner and then delete it. When the banner is added, it will show on the OSF dashboard page (among other places). When deleted, it will no longer show.

* [ENG-8325] Public column does not display the visibility status of child nodes on the Nodes page in the Admin App

Public column does not display the visibility status of child nodes on the Nodes page in the Admin App

* API: Allow /v2/users/me/preprints list view to filter by title

Allow the User Preprints endpoint (/v2/users/me/preprints) to be filterable by title (?filter[title]=example), similar to how the User Nodes endpoint (/v2/users/me/nodes) supports this filter capability

* [ENG-8224] Fixed force archive template with registration addons (#11210)

Enable Product Team to Force Archive Registrations in the Admin App

* add brand relationship to collectionprovider

* Add collections scopes to FULL_READ and FULL_WRITE

* [ENG-8936] API: Allow /v2/users/me/preprints list view to filter by tags (#11232)

* fix categories for sendgrid emails (#11236)

* [ENG-8401] Fixed preprint downloading (#11238)

* fixed preprint downloading

* fixed nonetype

* [ENG-8216] Fixed children deletion on a node page in admin (#11237)

* field children deletion in admin

* improved performance and removed unused attributes

* [ENG-7979] Registrations pending moderation that have components also pending moderation do not display the children (#11222)

* show only public nodes for non-authorized users in the node queryset

* add custom filters to can_view(); include pending nodes for moderators in get_node_count()

* add test

* added academiaInstitution in social-schema, fixed True value of 'ongoing', fixed/added tests (#11239)

## Purpose
V2 API doesn't allow setting `True` value for `ongoing` property in employment/education tabs and set `academiaInstitution` property in social tab

## Changes
Added `academiaInstitution` field in social-schema
Fixed ignored required properties for `ongoing` property in employment/education-schema files
Added new tests and fixed the old ones

## QA Notes
Can be tested only via API
Updates can be viewed in user settings

## Ticket
https://openscience.atlassian.net/browse/ENG-8455

* [ENG-8401] Earlier preprint versions download the current file (#11245)

* fixed earlier version download the newest version file

* fixed tests

* [ENG-8462]  Institution setup fixes (#11241)

* create monthly reports when institution is created

* better exception handling; async generate report

* fix test

* handle deactivated institutions; minor fixes

* [Reopen] [ENG-8192] Ability to force archive registrations when OSF Folders have changed (#11247)

* break force_archive into logical parts; support trashed children

* add unit tests; minor fixes

* fix name collisions issue and general fallback

* add test

* minor fixes

---------

Co-authored-by: Anton Krytskyi <[email protected]>
Co-authored-by: ihorsokhanexoft <[email protected]>
Co-authored-by: John Tordoff <[email protected]>
Co-authored-by: futa-ikeda <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants