-
-
Notifications
You must be signed in to change notification settings - Fork 93
feature/handle-similar-messages-as-scam #1292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
2842442
to
c160a4d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff, thanks! I got some remarks regarding readability mostly 👍
application/src/main/java/org/togetherjava/tjbot/config/MessageInfo.java
Outdated
Show resolved
Hide resolved
application/src/main/java/org/togetherjava/tjbot/features/moderation/scam/ScamBlocker.java
Outdated
Show resolved
Hide resolved
application/src/main/java/org/togetherjava/tjbot/features/moderation/scam/ScamBlocker.java
Outdated
Show resolved
Hide resolved
application/src/main/java/org/togetherjava/tjbot/features/moderation/scam/ScamBlocker.java
Outdated
Show resolved
Hide resolved
application/src/main/java/org/togetherjava/tjbot/features/moderation/scam/ScamBlocker.java
Outdated
Show resolved
Hide resolved
application/src/main/java/org/togetherjava/tjbot/config/MessageInfo.java
Outdated
Show resolved
Hide resolved
application/src/main/java/org/togetherjava/tjbot/features/moderation/scam/ScamBlocker.java
Outdated
Show resolved
Hide resolved
application/src/main/java/org/togetherjava/tjbot/config/ScamBlockerConfig.java
Outdated
Show resolved
Hide resolved
application/src/main/java/org/togetherjava/tjbot/features/moderation/scam/ScamBlocker.java
Outdated
Show resolved
Hide resolved
…ngs and refactored existing code
c160a4d
to
70375f7
Compare
|
if (alreadyFlaggedUsers.contains(userId)) { | ||
return true; | ||
} | ||
if (shouldIgnore(event.getMessage())) { | ||
return false; | ||
} | ||
String hash = addToMessageCache(event).messageHash(); | ||
if (hasPostedTooManySimilarMessages(userId, hash)) { | ||
alreadyFlaggedUsers.add(userId); | ||
return true; | ||
} else { | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not sure if that alreadyFlaggedUsers
user thing really works.. the bot has a really long uptime usually, sometimes many months.
the way i read the code is that if a user got flagged once, they will then be immune to this check until the bot is restarted again. u could argue that its probably not an issue in practice but it smells a bit. you would need a time based cleanup for alreadyFlaggedUsers
as well.
at which point u have like 30 lines of code dealing with what caffeeine gets done in one line, i guess.
private final Cache<String, Instant> userIdToAskedAtCache = Caffeine.newBuilder()
.maximumSize(1_000)
.expireAfterWrite(Duration.of(10, ChronoUnit.SECONDS))
.build();
the interface is essentially that of a Map
, so you have various get
and put
variants. internally its a LRU map, so it kicks out the entries that werent touched the longest when it reaches the max limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do i make it a set ? if the cache already handles he cleanup, then storing the instant is pointless.
edit: i saw your other comment, i'll see that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
caffeine only has a Map
. sometimes in the past we also only needed a Set
and then mapped it to something. for example to itself, i.e. key == value.
* @param text the UTF 8 text to hash | ||
* @return the computed hash | ||
*/ | ||
public static byte[] hashUTF8(String method, String text) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: should probably be renamed into hashUtf8
for better camelCase readability
private MessageInfo addToMessageCache(MessageReceivedEvent event) { | ||
long userId = event.getAuthor().getIdLong(); | ||
long channelId = event.getChannel().getIdLong(); | ||
String messageHash = getHash(event.getMessage()); | ||
Instant timestamp = event.getMessage().getTimeCreated().toInstant(); | ||
MessageInfo messageInfo = new MessageInfo(userId, channelId, messageHash, timestamp); | ||
messageCache.add(messageInfo); | ||
return messageInfo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id still refactor this and move all the stuff dealing with MessageInfo
into the record
. so:
record MessageInfo(...) {
static MessageInfo fromMessageEvent(MessageReceivedEvent event) {...}
}
then this addToMessageCache
method becomes obsolete and can be replaced with just messageCache.add(MessageInfo.fromMessageEvent(event));
at the call-site
the hash creation would then also move into the records fromMessageEvent
method (or as private static
helper in the record)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite like adding a unnecessary dependencies to a record, and even less logic like how to create a hash or how to create it from an event imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mh... but its a factory to create instances of this record. so imo this is exactly the place it should be. what do others here think?
* @param event the message event | ||
* @return true if the user spammed the message in several channels, false otherwise | ||
*/ | ||
public boolean doSimilarMessageCheck(MessageReceivedEvent event) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could u do some reordering so the flow reads top to bottom. this method would then be at the top, as entry-point. and the private methods then below it in the order they are used. the runRoutine
can stay at the very bottom, i guess, since its a detached flow.
"suspiciousAttachmentNamePattern": "(image|\\d{1,2})\\.[^.]{0,5}" | ||
"suspiciousAttachmentNamePattern": "(image|\\d{1,2})\\.[^.]{0,5}", | ||
"maxAllowedSimilarMessages": 2, | ||
"similarMessagesWindow": 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing time unit. please add it to the name.
"maxAllowedSimilarMessages": 2, | ||
"similarMessagesWindow": 1, | ||
"similarMessageLengthIgnore": 10, | ||
"similarMessagesWhitelist": [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont think we need that, please remove.
the UX for it for maintainers would be really weird, having to bloat the config with 100 character long messages and whatnot. and then it didnt work bc of a smiley or other stuff that ends up slightly different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All 4? or just the whitelist ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, the preview didnt turn out the way i wanted. i was referring to the message whitelist.
@@ -141,6 +137,10 @@ public void onMessageReceived(MessageReceivedEvent event) { | |||
isSafe = false; | |||
} | |||
|
|||
if (isSafe && similarMessagesDetector.doSimilarMessageCheck(event)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doSimilarMessageCheck
should be renamed. maybe hasPostedSimilarMessageTooOften(event)
if you dont want it to start with a question-word bc of the side effects, maybe handleHasPostedSimilarMessagesTooOften(event)
.
but the name should clarify what the boolean
it returns means and how it fits into the logic of ScamBlocker
/** | ||
* Has to be called often to clear the cache. | ||
*/ | ||
public void runRoutine() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to clearCache
or cleanupCache
or something.
return similarMessageCount > scamBlockerConfig.getMaxAllowedSimilarMessages(); | ||
} | ||
|
||
private boolean isObsolete(MessageInfo messageInfo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to isOutdated
or isExpired
return true; | ||
} | ||
return scamBlockerConfig.getSimilarMessagesWhitelist() | ||
.contains(message.getContentRaw().toLowerCase()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.toLowerCase
needs Locale.US
From in #1283 with those differences: