[smtp_batch] Additional field for grouping #2586

Lukas-Heindl · 2025-03-20T17:20:08Z

Hi,

I have a usecase where I'd like to send batched E-Mails to someone with collected events for which I want to use the smtp_batch output bot. Then in this case it would be nice to break down the events for one contact further into multiple E-Mails. Would it be possible to add some option such that the group-by is not only done by source.abuse_contact but also by a field determined by the bot configuration?
Eventually it would also be nice to use this field also included in the group-by in e.g. the subject, but that's just for the future.

I'm willing to submit this as a PR myself, I just wanted to create a place for input and also most important to check beforehand whether this has any chance of being merged.

So far smtp_batch inserts each event as value into redis with the bot_id + source.abuse_contact as key:

intelmq/intelmq/bots/outputs/smtp_batch/output.py

Lines 77 to 78 in a5f2f9c

    
           for mail in (field if isinstance(field, list) else [field]): 
        
               self.cache.redis.rpush(f"{self.key}{mail}", message.to_json())

It then later uses redis.keys() to retrieve all keys with a certain prefix (bot_id):

intelmq/intelmq/bots/outputs/smtp_batch/output.py

Line 224 in a5f2f9c

    
           for mail_record in self.cache.redis.keys(f"{self.key}*")[slice(self.limit_results)]:

So the most straight forward way would be to simply extend the redis-key with the value of the field that should be included in the logical group-by. But we'd need to be careful, the retrieved redis-key (here mail_record) is still used to obtain the destination E-Mail address:

intelmq/intelmq/bots/outputs/smtp_batch/output.py

Line 282 in a5f2f9c

email_to = str(mail_record[len(self.key):], encoding="utf-8")

In order to solve this I see two options

separate the values used to derive the redis-key with a special character like | -> in this case we'd need to make sure this character doesn't occur in any of the values -> replacing (but probably modifying the values is a bad idea) or escaping (a bit more complex) needed
we could switch to not using the retrieved redis-key anymore. Instead we know all retrieved events must have the same source.abuse_contact so we could simply get the value from an arbitrary event we retrieved from redis (same thing if we later want to gain access to the other field we included in the group-by). Decoupling the redis-key from the rest of the logic also should make it possible to simply hash the redis-key which avoids all trouble with the redis-key getting much longer with more keys involved in the group-by logic. So in this case the redis-key would look like this f"{self.key}{hash(source.abuse_contact, other_field1, other_field2, ...)}"

As I mentioned above I've got two questions

are you open generally for such additions to the smtp_batch bot?
if so, I'm happy about any input you have regarding my sketch/draft of an implementation (I think, I'd probably go for the 2nd approach described above)

The text was updated successfully, but these errors were encountered:

sebix · 2025-03-20T20:01:31Z

whether this has any chance of being merged.

Sure!

@e3rd is the original author of the bot and likely a user of it as well so I assume he can give you some feedback on your questions.

e3rd · 2025-03-20T22:49:15Z

Thanks, I ll definitely take a look at it, thanks!

e3rd · 2025-04-07T16:45:10Z

The 2nd approach really seems great! You make the group-by field defaulted to source.abuse_contact to keep it backwards compatible, right?

I am really ashamed I slowed down your PR by three weeks! Looking forward to the PR, please tag me there so that I can review.

Lukas-Heindl · 2025-04-09T14:19:13Z

All good.

Yes, the solution should be backwards compatible (also we need to force source.abuse_contact always being in the list of fields for which the events are grouped-by, since this field is always needed to send the mail so it has to be the same for all events).
Then I'll go and try approach 2.

Probably this will be a while because of time constraints and I need to setup my development environment first (so I can test all this).

e3rd · 2025-04-09T14:20:36Z

Fingers crossed!

e3rd · 2025-04-09T16:38:10Z

If you don't mind, I'd suggest to add a configuration option too. What do you think?

include_into_body: Optional[int]
""" Insert the first N characters of the attachment to the body. """

If bigger than 0, the part of the attachment contents would be put into the body apart from being attached.

Lukas-Heindl · 2025-04-09T16:48:20Z

Sounds interesting. Especially with the idea of introducing jinja2 to this bot (I know there already is the templated mail bot, but it does not allow for batching the events).
As I think about your suggestion the open issue would be how to display the start for the attachment into the body. But this might also be solved with jinja.

Also I'm not decided on whether it should be something like, only include the start of the attachment if it is smaller than N (so it fully fits into the body) or unconditionally the first N chars of the attachment (acting as some sort of preview).

Lukas-Heindl · 2025-04-09T16:50:32Z

Talking more generally about this bot, I am curious. Do you know why the attachment always is zipped? Is is common the attachment is that large and compression really brings a benefit (or is necessary due to the size and limitations in E-Mail servers rejecting large attachments)?

e3rd · 2025-04-09T16:58:54Z

how to display the start

just the plain text

full if small VS preview

I have no opinion, both fit :) The preview seems slightly better as it is consistent, but I don't know.
Also, it might be better to be the number of events, not number of chars.

Do you know why the attachment always is zipped?

Unfortunately, according to our experiences, mail servers are mad. Sometimes, 10 MB caused the e-mail to vanish if I remember well. To be consistent (so that the partners could automate), we've enforced the zip for all.

But if you don't like it, just add a flag for that.

Lukas-Heindl · 2025-04-17T12:58:12Z

Adding onto the ideas regarding the smtp_batch bot:

According to my test intelmq.bots.outputs.smtp_batch.output --cli --send <bot-id> does not exit with exit-code != 0 if sending the E-Mail failed (e.g. wrong password) -> in my opinion this should be changed allowing the user to trigger an alert (to the server-admin) something in the operation of intelmq has gone wrong

e3rd · 2025-04-17T13:58:20Z

Great enhancement, of course!

Sending mails with the smtp_batch output might fail e.g. due to wrong settings regarding the SMTP-Server. So far this is not reflected in the exit code when running the bot via cli to trigger sending the mails. Fixing this (at least for the non-interactive call) enables users to integrate this into their monitoring properly.

Lukas-Heindl · 2025-04-24T16:35:12Z

I am a bit confused

intelmq/intelmq/bots/outputs/smtp_batch/output.py

Lines 283 to 286 in ec37d0c

    
           count = len(rows_output) 
        
           if not count: 
        
               path = None 
        
           else:

intelmq/intelmq/bots/outputs/smtp_batch/output.py

Lines 301 to 304 in ec37d0c

    
           mail = Mail(mail_record, email_to, path, count) 
        
           self.build_mail(mail, send=False) 
        
           if count: 
        
               yield mail

@e3rd do you still remember what the idea here was? Why building the mail if count is None, particularly as the mail is not yielded anyhow? (can it even be None? https://docs.python.org/3/library/functions.html#len is not too specific about it) So the control flow looks a bit odd to me.
Also do we really want to send mail in case there are no events captured?

(And if you already want to have a look, I'm drafting at https://github.com/Lukas-Heindl/intelmq/tree/smtpBatchGroup)

e3rd · 2025-04-24T17:27:35Z

Good point, a design flaw, get rid of it. The draft looks nice! :)

Sending mails with the smtp_batch output might fail e.g. due to wrong settings regarding the SMTP-Server. So far this is not reflected in the exit code when running the bot via cli to trigger sending the mails. Fixing this (at least for the non-interactive call) enables users to integrate this into their monitoring properly.

Lukas-Heindl · 2025-04-28T14:53:51Z

Ok this is getting more like a (minor) refactoring of the bot.

I noticed the parameters allowed_fieldnames and fieldnames_translation are not documented yet. Writing the documentation for these parameters i noticed I need to include something like if you are inconsistent here, the bot will crash

Reason:

intelmq/intelmq/bots/outputs/smtp_batch/output.py

Line 273 in 38b9304

    
           rows_output.append(OrderedDict({self.fieldnames_translation[k]: row[k] for k in ordered_keys}))

will crash in case the ordered_keys (intersection of allowed_fieldnames and the keys defined in the event) contains a key which is not defined in fieldnames_translation.

Personally I'd be okay with this (shifting the responsibility to the user/admin to do the configuration right). Still I wanted to raise this for discussion whether the bot should be more robust in this regard. We could use dict.get to set a default (maybe the name of the key in allowed_fieldnames) when doing the translation.

Or we could remove the allowed_fieldnames option and use fieldnames_translation.keys() as a replacement. Though, just modifying the allowed_fieldnames option to remove a column from the csv without needing to toucht the translation might be an ease of use we want to keep.

Lukas-Heindl · 2025-04-29T15:23:24Z

Except for the include_into_body / preview, I think everything discussed here is implemented in #2610

Lukas-Heindl · 2025-05-08T17:00:20Z

Addition to the collectio of ideas:

optionally use paging to retrieve the data from redis (I experienced some OOM with too less RAM+swap -- paging the data should mitigate a bit)
same reason like for 1.: Currently we first collect all data for all mails just for the nice display what mails are in the pipeline. Maybe some way of "streaming" the data from redis directly to mail (so obtain all the data for one mail -> send the mail and loop this)

sebix added feature Indicates new feature requests or new features help wanted Indicates that a maintainer wants help on an issue or pull request labels Mar 20, 2025

e3rd self-assigned this Mar 20, 2025

e3rd assigned Lukas-Heindl and unassigned e3rd Apr 7, 2025

Lukas-Heindl mentioned this issue Apr 24, 2025

smtp_batch: proper exit code when non-interactive #2608

Merged

Lukas-Heindl mentioned this issue Apr 29, 2025

smtp_batch: add feature for grouping and templating #2610

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[smtp_batch] Additional field for grouping #2586

[smtp_batch] Additional field for grouping #2586

Lukas-Heindl commented Mar 20, 2025

sebix commented Mar 20, 2025

Uh oh!

e3rd commented Mar 20, 2025

Uh oh!

e3rd commented Apr 7, 2025

Uh oh!

Lukas-Heindl commented Apr 9, 2025

Uh oh!

e3rd commented Apr 9, 2025

Uh oh!

e3rd commented Apr 9, 2025

Uh oh!

Lukas-Heindl commented Apr 9, 2025

Uh oh!

Lukas-Heindl commented Apr 9, 2025

Uh oh!

e3rd commented Apr 9, 2025

Uh oh!

Lukas-Heindl commented Apr 17, 2025

Uh oh!

e3rd commented Apr 17, 2025

Uh oh!

Lukas-Heindl commented Apr 24, 2025

Uh oh!

e3rd commented Apr 24, 2025

Uh oh!

Lukas-Heindl commented Apr 28, 2025

Uh oh!

Lukas-Heindl commented Apr 29, 2025

Uh oh!

Lukas-Heindl commented May 8, 2025

Uh oh!

[smtp_batch] Additional field for grouping #2586

[smtp_batch] Additional field for grouping #2586

Comments

Lukas-Heindl commented Mar 20, 2025

sebix commented Mar 20, 2025

Uh oh!

e3rd commented Mar 20, 2025

Uh oh!

e3rd commented Apr 7, 2025

Uh oh!

Lukas-Heindl commented Apr 9, 2025

Uh oh!

e3rd commented Apr 9, 2025

Uh oh!

e3rd commented Apr 9, 2025

Uh oh!

Lukas-Heindl commented Apr 9, 2025

Uh oh!

Lukas-Heindl commented Apr 9, 2025

Uh oh!

e3rd commented Apr 9, 2025

Uh oh!

Lukas-Heindl commented Apr 17, 2025

Uh oh!

e3rd commented Apr 17, 2025

Uh oh!

Lukas-Heindl commented Apr 24, 2025

Uh oh!

e3rd commented Apr 24, 2025

Uh oh!

Lukas-Heindl commented Apr 28, 2025

Uh oh!

Lukas-Heindl commented Apr 29, 2025

Uh oh!

Lukas-Heindl commented May 8, 2025

Uh oh!