
Scheduled queues are not being popped evenly in all scenarios #321

Open
lianatech-jutaky opened this issue Dec 5, 2024 · 2 comments
Labels: bug (Something isn't working)

@lianatech-jutaky

What Operating System are you seeing this problem on?

Docker with image ghcr.io/kumocorp/kumomta:2024.11.08-d383b033-amd64 on kernel 5.15.0-125-generic #135-Ubuntu

What Hardware is this system running?

Docker on an average x86_64 laptop

KumoMTA version

2024.11.08-d383b033

Did you try the latest release to see if the issue is better (or worse!) than your current version?

Yes, and I updated the version box above to show the version that I tried

Describe the bug

KumoMTA's design philosophy is to not support explicit email priorities; scheduled queues are used instead.

https://kumomta.com/blog/setting-the-priority-flag
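
For context, the blog post's approach boils down to giving priority mail its own campaign so that it lands in its own scheduled queue. A minimal policy sketch of that idea (illustrative only; the X-Campaign header name is my assumption, not a KumoMTA builtin):

local kumo = require 'kumo'

kumo.on('smtp_server_message_received', function(msg)
  -- Scheduled queues are named campaign:tenant@domain, so a distinct
  -- campaign gives high-priority mail its own scheduled queue.
  local campaign = msg:get_first_named_header_value 'X-Campaign'
  if campaign then
    msg:set_meta('campaign', campaign)
  end
end)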

We had a scenario where we began using MXSuffix in production and the scheduled queue behavior broke down: small queues got buried in the system and mixed in with bulk queues.

I started building a reproduction with the ready-made Docker image and I believe I managed to reproduce the scenario.

However, I didn't end up using MXSuffix at all, so that feature is not to blame. Maybe the following Docker setup helps show what is going on here.

In my opinion this is a bug: there are two separate scheduled queues, and they should be popped evenly for the queue-based prioritization to work.

The idea of this Docker setup is to send a few thousand emails to a bulk queue, and a couple of high-priority emails after that into a separate scheduled queue, as per the blog post's description of how to achieve priority.

I also had a separate container with Mailpit running in it, just consuming the emails.

To Reproduce

Reproduce using the following Dockerfile. There is a swaks-demo command within it which I used to send the emails from inside the container.

Configuration

The included settings are quite extreme (some might say unrealistic), but they demonstrate the bug, or at least the unwanted behavior, quite effectively.

Dockerfile
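
The Dockerfile itself is attached above. To give a sense of how extreme the settings are: the delivery side is throttled to roughly one message per second over a single connection, i.e. something along the lines of this sketch (illustrative values, not the attachment's exact contents):

kumo.on('get_egress_path_config', function(domain, egress_source, site_name)
  -- Throttle sketch: one connection delivering at most one message
  -- per second, so the queues drain far slower than injection.
  return kumo.make_egress_path {
    connection_limit = 1,
    max_message_rate = '1/s',
  }
end)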

Expected Behavior

The demo has been configured to send one email per second. I'd expect the hiprio queue emails to go out within a few seconds, since there are two queues and they should be popped fairly evenly for delivery.

In the demo the bulk queue clearly has precedence over the hiprio queue:

Queue summary right after injecting the hiprio emails:

root@fd397afb16ca:/# date && /opt/kumomta/sbin/kcli --endpoint http://127.0.0.1:8000 queue-summary
Thu Dec  5 09:58:10 UTC 2024
SITE                     SOURCE   PROTO         D T C Q
mailpit.127.0.0.1.nip.io dummysrc smtp_client 199 0 0 1

SCHEDULED QUEUE                  COUNT
hiprio:[email protected]      2
lowprio:[email protected] 2,799

Five minutes later the small queue has not been touched at all, and it will go on like this for a while:

root@fd397afb16ca:/# date && /opt/kumomta/sbin/kcli --endpoint http://127.0.0.1:8000 queue-summary
Thu Dec  5 10:03:27 UTC 2024
SITE                     SOURCE   PROTO         D T C Q
mailpit.127.0.0.1.nip.io dummysrc smtp_client 506 0 0 1

SCHEDULED QUEUE                  COUNT
hiprio:[email protected]      2
lowprio:[email protected] 2,493

Given a large enough bulk queue, the hiprio emails will essentially never go out. We experienced delays of up to a few hours in production.

Anything else?

No response

@wez (Collaborator) commented Dec 5, 2024

@lianatech-jutaky (Author)

Can you provide an example assuming the campaign ids are dynamic GUIDs which come and go as customers create campaigns, and Kumo has a single tenant?

To me, queue-based max_message_rate is kind of a hack to simulate priorities.
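
For reference, a minimal sketch (mine, not from wez's comment) of what that per-queue rate cap would look like in policy. Note that it has to recognize the bulk campaign by name up front, which is exactly what dynamic GUID campaign ids make impossible:

kumo.on('get_queue_config', function(domain, tenant, campaign, routing_domain)
  -- Workaround sketch: cap a known bulk campaign so other scheduled
  -- queues get a share of dispatch capacity. 'lowprio' is the demo's
  -- campaign name; with dynamic GUIDs there is no stable name to match.
  if campaign == 'lowprio' then
    return kumo.make_queue_config {
      max_message_rate = '1800/hr',
    }
  end
  return kumo.make_queue_config {}
end)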

Maybe weight-based popping, where each scheduled queue has a default weight of 1, like the source weight used for ready queue dispatching?

That should solve the problem of only one queue being popped, without any rate hacks.
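
To illustrate, a toy version of what I mean (plain Lua, not KumoMTA code): smooth weighted round-robin over the non-empty scheduled queues, every queue defaulting to weight 1.

local queues = {
  { name = 'hiprio',  weight = 1, pending = 2 },
  { name = 'lowprio', weight = 1, pending = 2799 },
}

-- Each round every non-empty queue earns its weight as credit; the
-- queue with the most credit pops one message and pays back the total
-- weight. With equal weights the two queues simply alternate.
local function pop_next()
  local total, best = 0, nil
  for _, q in ipairs(queues) do
    if q.pending > 0 then
      q.credit = (q.credit or 0) + q.weight
      total = total + q.weight
      if not best or q.credit > best.credit then
        best = q
      end
    end
  end
  if best then
    best.credit = best.credit - total
    best.pending = best.pending - 1
    return best.name
  end
end

With that scheme the two hiprio messages go out within the first few pops instead of waiting behind thousands of lowprio messages.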
