feature/handle-similar-messages-as-scam #1292

Open
wants to merge 2 commits into
base: develop
from

Conversation

Alathreon
Contributor

@Alathreon Alathreon commented Jul 31, 2025

Based on #1283, with these differences:

  • Whole feature uses its own class
  • I used a local cache
  • I didn't use Caffeine
  • 4 new config variables
  • A few fail safes
  • Hash is based on SHA

@Alathreon Alathreon requested a review from a team as a code owner July 31, 2025 21:32
@Alathreon Alathreon linked an issue Jul 31, 2025 that may be closed by this pull request
@Alathreon Alathreon force-pushed the feature/handle-similar-messages-as-scam branch from 2842442 to c160a4d Compare July 31, 2025 21:33
@SquidXTV SquidXTV added the enhancement (New feature or request) and config-changes (if your PR contains any changes related to config file) labels Aug 1, 2025
Member

@Zabuzard Zabuzard left a comment

Good stuff, thanks! I got some remarks regarding readability mostly 👍

@Alathreon Alathreon force-pushed the feature/handle-similar-messages-as-scam branch from c160a4d to 70375f7 Compare August 3, 2025 21:52

sonarqubecloud bot commented Aug 3, 2025

Comment on lines +88 to +99
    if (alreadyFlaggedUsers.contains(userId)) {
        return true;
    }
    if (shouldIgnore(event.getMessage())) {
        return false;
    }
    String hash = addToMessageCache(event).messageHash();
    if (hasPostedTooManySimilarMessages(userId, hash)) {
        alreadyFlaggedUsers.add(userId);
        return true;
    } else {
        return false;
Member

i am not sure if that alreadyFlaggedUsers user thing really works.. the bot has a really long uptime usually, sometimes many months.
the way i read the code is that if a user got flagged once, they will then be immune to this check until the bot is restarted again. you could argue that it's probably not an issue in practice but it smells a bit. you would need a time-based cleanup for alreadyFlaggedUsers as well.

at which point you have like 30 lines of code dealing with what Caffeine gets done in one line, i guess.

private final Cache<String, Instant> userIdToAskedAtCache = Caffeine.newBuilder()
  .maximumSize(1_000)
  .expireAfterWrite(Duration.of(10, ChronoUnit.SECONDS))
  .build();

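the interface is essentially that of a Map, so you have various get and put variants. internally it is size-bounded: when it reaches the max limit, it kicks out the entries it considers least likely to be used again (Caffeine's eviction policy is Window TinyLFU, which looks at both recency and frequency, not plain LRU).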
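For comparison, here is a rough sketch of what the hand-rolled, time-based cleanup for alreadyFlaggedUsers could look like without a cache library. The class name ExpiringFlaggedUsers, its methods and the time-to-live parameter are assumptions made for illustration only; none of them come from the PR.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not from the PR): remembers flagged users for a limited time only. */
final class ExpiringFlaggedUsers {
    private final Map<Long, Instant> flaggedAtByUserId = new ConcurrentHashMap<>();
    private final Duration timeToLive;

    ExpiringFlaggedUsers(Duration timeToLive) {
        this.timeToLive = timeToLive;
    }

    /** Marks the user as flagged, starting (or restarting) the expiry timer. */
    void flag(long userId, Instant now) {
        flaggedAtByUserId.put(userId, now);
    }

    /** A user counts as flagged only while the entry is younger than the time-to-live. */
    boolean isFlagged(long userId, Instant now) {
        Instant flaggedAt = flaggedAtByUserId.get(userId);
        return flaggedAt != null && flaggedAt.plus(timeToLive).isAfter(now);
    }

    /** Has to be called periodically, otherwise stale entries pile up for the whole uptime. */
    void cleanup(Instant now) {
        flaggedAtByUserId.values().removeIf(flaggedAt -> !flaggedAt.plus(timeToLive).isAfter(now));
    }

    int size() {
        return flaggedAtByUserId.size();
    }
}
```

Even this small version needs a caller that runs cleanup on a schedule, which is the bookkeeping an expiring cache would take over.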

Contributor Author

@Alathreon Alathreon Aug 5, 2025

how do i make it a set? if the cache already handles the cleanup, then storing the instant is pointless.
edit: i saw your other comment, i'll see that

Member

caffeine only has a Map. sometimes in the past we also only needed a Set and then mapped it to something. for example to itself, i.e. key == value.
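A small stdlib-only sketch of the two usual ways to get Set semantics out of a Map. FlaggedUserSets and its method names are made up for illustration. Variant 1 is the "key == value" trick described above; variant 2 wraps an empty Map<Long, Boolean> with Collections.newSetFromMap. Whether variant 2 is a good fit for the map view of a Caffeine cache is an assumption that would need checking against the Caffeine docs.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not from the PR) showing Set semantics on top of a Map. */
final class FlaggedUserSets {
    private FlaggedUserSets() {}

    /** Variant 1: map each key to itself and treat "has an entry" as membership. */
    static void flag(Map<Long, Long> userIdToUserId, long userId) {
        userIdToUserId.put(userId, userId);
    }

    static boolean isFlagged(Map<Long, Long> userIdToUserId, long userId) {
        return userIdToUserId.containsKey(userId);
    }

    /**
     * Variant 2: a Set view backed by the given map.
     * Collections.newSetFromMap requires the backing map to be empty when it is wrapped.
     */
    static Set<Long> newBackedSet(Map<Long, Boolean> emptyBackingMap) {
        return Collections.newSetFromMap(emptyBackingMap);
    }
}
```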

* @param text the UTF 8 text to hash
* @return the computed hash
*/
public static byte[] hashUTF8(String method, String text) {
Member

NIT: should probably be renamed to hashUtf8 for better camelCase readability
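The body of the method is not part of the quoted diff. A minimal sketch of what a helper with this signature might do, assuming the method parameter is a java.security.MessageDigest algorithm name such as "SHA-256" (the wrapping class Hashing and the exception handling are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical utility class; only the method signature is taken from the diff. */
final class Hashing {
    private Hashing() {}

    /**
     * Hashes the UTF-8 bytes of the given text.
     *
     * @param method assumed to be a MessageDigest algorithm name, e.g. "SHA-256"
     * @param text the text to hash
     * @return the computed hash
     */
    static byte[] hashUtf8(String method, String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance(method);
            return digest.digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Turn the checked exception into an unchecked one so callers stay simple.
            throw new IllegalArgumentException("Unknown hash method: " + method, e);
        }
    }
}
```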

Comment on lines +47 to +54
private MessageInfo addToMessageCache(MessageReceivedEvent event) {
    long userId = event.getAuthor().getIdLong();
    long channelId = event.getChannel().getIdLong();
    String messageHash = getHash(event.getMessage());
    Instant timestamp = event.getMessage().getTimeCreated().toInstant();
    MessageInfo messageInfo = new MessageInfo(userId, channelId, messageHash, timestamp);
    messageCache.add(messageInfo);
    return messageInfo;
Member

i'd still refactor this and move all the stuff dealing with MessageInfo into the record. so:

record MessageInfo(...) {
  static MessageInfo fromMessageEvent(MessageReceivedEvent event) {...}
}

then this addToMessageCache method becomes obsolete and can be replaced with just messageCache.add(MessageInfo.fromMessageEvent(event)); at the call-site

the hash creation would then also move into the record's fromMessageEvent method (or as a private static helper in the record)
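A self-contained sketch of the suggested shape. To keep it runnable without JDA on the classpath, a hypothetical IncomingMessage record stands in for the few values the original code reads from MessageReceivedEvent, and the factory is called fromMessage instead of fromMessageEvent. The SHA-256 plus Base64 hashing is likewise an assumption, since the PR description only says the hash is SHA-based.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.util.Base64;

/** Hypothetical stand-in for the values taken from MessageReceivedEvent. */
record IncomingMessage(long authorId, long channelId, String content, Instant timeCreated) {}

record MessageInfo(long userId, long channelId, String messageHash, Instant timestamp) {

    /** Factory: everything needed to build a MessageInfo lives next to the record itself. */
    static MessageInfo fromMessage(IncomingMessage message) {
        return new MessageInfo(message.authorId(), message.channelId(),
                hash(message.content()), message.timeCreated());
    }

    /** Private helper, assumed to be SHA-256 over the UTF-8 bytes, encoded as Base64. */
    private static String hash(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(bytes);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

With this shape the call site reduces to something like messageCache.add(MessageInfo.fromMessage(...)), at the cost of the record knowing about the event type and the hashing, which is the trade-off debated in this thread.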

Contributor Author

I don't quite like adding unnecessary dependencies to a record, and even less adding logic such as how to create a hash or how to build the record from an event, imo.

Member

mh... but it's a factory to create instances of this record. so imo this is exactly the place it should be. what do others here think?

* @param event the message event
* @return true if the user spammed the message in several channels, false otherwise
*/
public boolean doSimilarMessageCheck(MessageReceivedEvent event) {
Member

could u do some reordering so the flow reads top to bottom. this method would then be at the top, as entry-point. and the private methods then below it in the order they are used. the runRoutine can stay at the very bottom, i guess, since its a detached flow.

"suspiciousAttachmentNamePattern": "(image|\\d{1,2})\\.[^.]{0,5}"
"suspiciousAttachmentNamePattern": "(image|\\d{1,2})\\.[^.]{0,5}",
"maxAllowedSimilarMessages": 2,
"similarMessagesWindow": 1,
Member

missing time unit. please add it to the name.

"maxAllowedSimilarMessages": 2,
"similarMessagesWindow": 1,
"similarMessageLengthIgnore": 10,
"similarMessagesWhitelist": []
Member

i dont think we need that, please remove.

the UX for it for maintainers would be really weird, having to bloat the config with 100 character long messages and whatnot. and then it didnt work bc of a smiley or other stuff that ends up slightly different.

Contributor Author

All 4? Or just the whitelist?

Member

sorry, the preview didnt turn out the way i wanted. i was referring to the message whitelist.

@@ -141,6 +137,10 @@ public void onMessageReceived(MessageReceivedEvent event) {
isSafe = false;
}

if (isSafe && similarMessagesDetector.doSimilarMessageCheck(event)) {
Member

doSimilarMessageCheck should be renamed. maybe hasPostedSimilarMessageTooOften(event)

if you dont want it to start with a question-word bc of the side effects, maybe handleHasPostedSimilarMessagesTooOften(event).

but the name should clarify what the boolean it returns means and how it fits into the logic of ScamBlocker

/**
* Has to be called often to clear the cache.
*/
public void runRoutine() {
Member

rename to clearCache or cleanupCache or something.

return similarMessageCount > scamBlockerConfig.getMaxAllowedSimilarMessages();
}

private boolean isObsolete(MessageInfo messageInfo) {
Member

rename to isOutdated or isExpired
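To show how the renamed pieces could fit together, here is a hedged sketch of a window-based detector. Only the names isExpired, cleanupCache, hasPostedTooManySimilarMessages and the "count > max allowed" comparison are taken from this thread. The class SimilarMessagesSketch, the list-based cache, the explicit now parameter and the Duration-typed window are assumptions, and the sketch is not thread-safe.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, single-threaded sketch; not the implementation from the PR. */
final class SimilarMessagesSketch {
    record MessageInfo(long userId, long channelId, String messageHash, Instant timestamp) {}

    private final List<MessageInfo> messageCache = new ArrayList<>();
    private final Duration window;
    private final int maxAllowedSimilarMessages;

    SimilarMessagesSketch(Duration window, int maxAllowedSimilarMessages) {
        this.window = window;
        this.maxAllowedSimilarMessages = maxAllowedSimilarMessages;
    }

    void add(MessageInfo messageInfo) {
        messageCache.add(messageInfo);
    }

    /** An entry is expired once it is older than the configured window. */
    boolean isExpired(MessageInfo messageInfo, Instant now) {
        return messageInfo.timestamp().plus(window).isBefore(now);
    }

    /** Has to be called regularly so the cache does not grow without bound. */
    void cleanupCache(Instant now) {
        messageCache.removeIf(messageInfo -> isExpired(messageInfo, now));
    }

    /** Counts non-expired messages of the same user with the same hash. */
    boolean hasPostedTooManySimilarMessages(long userId, String messageHash, Instant now) {
        long similarMessageCount = messageCache.stream()
            .filter(messageInfo -> !isExpired(messageInfo, now))
            .filter(messageInfo -> messageInfo.userId() == userId)
            .filter(messageInfo -> messageInfo.messageHash().equals(messageHash))
            .count();
        return similarMessageCount > maxAllowedSimilarMessages;
    }

    int cacheSize() {
        return messageCache.size();
    }
}
```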

return true;
}
return scamBlockerConfig.getSimilarMessagesWhitelist()
.contains(message.getContentRaw().toLowerCase());
Member

.toLowerCase needs Locale.US
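A short illustration of why the explicit locale matters: without an argument, toLowerCase() uses the JVM's default locale, and under a Turkish locale an uppercase "I" lowercases to the dotless "ı", so a comparison against a plain-ASCII list entry would silently fail. The class and method names below are hypothetical.

```java
import java.util.Locale;

/** Hypothetical demo class (not from the PR). */
final class LowerCaseLocaleDemo {
    private LowerCaseLocaleDemo() {}

    /** Locale-independent normalization, as requested in the review. */
    static String normalize(String content) {
        return content.toLowerCase(Locale.US);
    }

    /** What the locale-sensitive variant would produce on a machine with a Turkish default locale. */
    static String lowerCaseTurkish(String content) {
        return content.toLowerCase(Locale.forLanguageTag("tr"));
    }
}
```

For the input "FREE NITRO", normalize yields "free nitro", whereas lowerCaseTurkish yields "free nıtro" (with U+0131 instead of "i").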

Labels
config-changes (if your PR contains any changes related to config file), enhancement (New feature or request)
Development

Successfully merging this pull request may close these issues.

Auto ban/quarantine based on similar messages
3 participants