Skip to content

Clean up old CI errors#420

Merged
bsclifton merged 23 commits intomainfrom
kdh/fix-ci-errors
Mar 2, 2026
Merged

Clean up old CI errors#420
bsclifton merged 23 commits intomainfrom
kdh/fix-ci-errors

Conversation

@kdenhartog
Copy link
Copy Markdown
Member

No description provided.

This resets state for each tests so they aren't interacting with each other
additionally it extends the wait times to allow enough time for tests to complete.
this is useful so that we can run inside many versions of linux
with multiple different CPU architectures without having to
reimplement the logic here.
Root cause analysis revealed that after 3+ failed fix attempts, the
fundamental approach was flawed. The tests were fighting against the
normal URL lifecycle instead of verifying it works correctly.

Changes:
1. Removed testMode bypass in allowDoublefetch() that was preventing
   normal URL filtering from occurring
2. Removed testMode bypass in isAlreadyMarkedPrivate() that was forcing
   all URLs to appear public regardless of bloom filter state
3. Removed testMode conditionals around setAsPrivate() calls that were
   preventing URLs from being properly marked and removed

4. Restored original test assertions:
   - URLs should be removed from database after doubleFetch (length == 0)
   - URLs should NOT be marked as private (private == 0)

The tests now verify actual production behavior: news URLs go through
the full double-fetch process and are correctly identified as public
(removed from database but NOT added to bloom filter).

Kept valid improvements from previous commits:
- Increased test timeouts (30s/60s)
- Bloom filter cleanup in beforeEach/afterEach

Systematic debugging showed this was an architectural problem, not a
test flakiness issue. The solution is to test the real behavior, not
create a special test-only execution path.
@kdenhartog kdenhartog marked this pull request as draft February 9, 2026 13:58
@yshym yshym force-pushed the kdh/fix-ci-errors branch 2 times, most recently from 5e5b5c2 to 875e242 Compare February 9, 2026 20:07
@kdenhartog kdenhartog force-pushed the kdh/fix-ci-errors branch 3 times, most recently from 56bfe7a to ef1a22f Compare February 10, 2026 10:03
@kdenhartog
Copy link
Copy Markdown
Member Author

The only way we're going to be able to get this running is if we perform npm audit fixes. It looks like I've got most of those issues taken care of, but that's going to turn this fix into a larger amount of work.

This one likely will need to be split into multiple issues. From what I can tell we still need to address:

  1. Build issues related to webpack change + the recent changes to the webext bump which is causing the npm audit issues
  2. Regression test workflow still needs to get the keyring fixed it seems
  3. A regression test is still failing when I run this locally

@yshym yshym requested a review from remusao February 12, 2026 13:18
@remusao remusao marked this pull request as ready for review February 20, 2026 13:52
@remusao remusao requested a review from mihaiplesa as a code owner February 20, 2026 13:52
@yshym yshym force-pushed the kdh/fix-ci-errors branch from 72459fb to ab9a35c Compare February 23, 2026 12:21
@yshym yshym force-pushed the kdh/fix-ci-errors branch from ab9a35c to 122007d Compare February 23, 2026 12:26
remusao
remusao previously approved these changes Feb 25, 2026
@remusao remusao enabled auto-merge (squash) March 2, 2026 23:30
@bsclifton bsclifton disabled auto-merge March 2, 2026 23:34
@bsclifton
Copy link
Copy Markdown
Member

Admin merging - we may have unintentionally made things too strict with #439

@bsclifton bsclifton force-pushed the kdh/fix-ci-errors branch from 311cde9 to 44ddc2a Compare March 2, 2026 23:44
kdenhartog and others added 17 commits March 2, 2026 16:46
The hash detection threshold of 0.015 was too strict, causing legitimate
hash strings to be missed. This was causing 3 integration tests to fail:
- isHash('04C2EAD03B') returning false instead of true
- isHash('54f5095c96e') returning false instead of true
- isSuspiciousTitle with hash 4f0849709c511232fe72059d5a1d3344a668035a

Root cause analysis:
- These hashes have probability values just above 0.015 (0.01519, 0.01811, 0.02618)
- The threshold was too strict to catch these edge cases
- These tests have been failing since PR #443 (parser separation in Dec 2025)

Fix:
- Increased default threshold from 0.015 to 0.027
- Updated explicit thresholds in sanitizer.js for consistency
- This allows all legitimate hash test cases to pass while maintaining
  sufficient discrimination against non-hash strings

Impact:
- Integration tests now pass
- Better privacy protection as more hash-like identifiers in URLs will be detected
- Update `isHash` implementation
- Tweak `isHash` thresholds to prevent false-negatives
- Add more tests for `isHash` and `dropLongURL`
- Fix some of the "is allowed" and "is private" test cases
Resolves npm audit vulnerabilities:
- GHSA-p8p7-x288-28g6 (SSRF in request package)
- Eliminates deprecated request dependency chain

Breaking changes review:
- Config file format change does not affect us (no config files)
- Node.js 18+ requirement satisfied (using v24)
- Programmatic API usage unchanged
…ling

Dockerfile.ci:
- Revert to manual deb download approach for Brave installation
- Download brave-keyring from S3 (v1.13-1)
- Download brave-browser from GitHub releases (v1.86.148)
- Install with curl and dpkg instead of install script

web-discovery-project.es:
- Add telegraph.co.uk/news to allowlist for long URLs
- Protect allowlisted URLs from dropLongURL rejection when no canonical URL exists

The install script approach failed because brave-keyring dependency was not
properly resolved in the Docker base image. Manual deb installation worked
previously and is pinned to specific versions for reproducible CI builds.

Verified locally with act: integration tests pass, regression test partially
fixed (telegraph URL now passes initial checks but may need additional work
on the double-fetch allowlist protection).
Will be implemented separately
@bsclifton bsclifton force-pushed the kdh/fix-ci-errors branch from 44ddc2a to 9188cc9 Compare March 2, 2026 23:46
@bsclifton bsclifton merged commit ddb1ca0 into main Mar 2, 2026
8 of 10 checks passed
@bsclifton bsclifton deleted the kdh/fix-ci-errors branch March 2, 2026 23:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants