Introduce MPTCP #1811

Merged: 11 commits into valkey-io:unstable on Apr 15, 2025

Conversation

@pizhenwei (Contributor) commented Mar 3, 2025

Multipath TCP (MPTCP) is an extension of the standard TCP protocol that allows a single transport connection to use multiple network interfaces or paths. MPTCP is useful for applications like bandwidth aggregation, failover, and more resilient connections.

The Linux kernel has supported MPTCP since v5.6, so it's time to support it here.

The test report shows that MPTCP reduces latency by ~25% in an environment with 1% network packet drop.

Thanks to Matthieu Baerts [email protected] for lots of review suggestions.

Proposed-by: Geliang Tang [email protected]
Tested-by: Gang Yan [email protected]
Signed-off-by: zhenwei pi [email protected]
Signed-off-by: zhenwei pi [email protected]

Cc Linux kernel MPTCP maintainer @matttbe

@pizhenwei (Contributor, Author)

This PR can be tested with https://github.com/pizhenwei/valkey/tree/mptcp-with-hiredis


codecov bot commented Mar 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.98%. Comparing base (0cc0bf7) to head (5302507).
Report is 46 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1811      +/-   ##
============================================
+ Coverage     70.87%   70.98%   +0.11%     
============================================
  Files           123      123              
  Lines         65651    65704      +53     
============================================
+ Hits          46529    46642     +113     
+ Misses        19122    19062      -60     
Files with missing lines   Coverage Δ
src/anet.c                 72.46% <100.00%> (+0.16%) ⬆️
src/config.c               78.43% <100.00%> (+0.04%) ⬆️
src/server.c               87.57% <100.00%> (+0.02%) ⬆️
src/server.h              100.00% <ø> (ø)

... and 23 files with indirect coverage changes


@xbasel (Member) commented Mar 3, 2025

Not sure how useful MPTCP is for low-latency internal data center networks. It was built for mobile devices to switch between WiFi and cellular and to handle packet loss.
How does MPTCP perform when there is no packet drop? Does it introduce any overhead or impact latency in such stable network conditions?

I do see the benefit of bandwidth aggregation, though (which can be handled externally, e.g. openmptcprouter).

@pizhenwei (Contributor, Author)

In my tests on Linux 6.11 over a stable network with no packet drop, MPTCP adds roughly 10% latency, so it is NOT suitable for every client/replication setup. If the server listens on MPTCP and the client side uses plain TCP, the connection will be TCP only. This means the client/replication side gets to decide between TCP and MPTCP.

Hi @matttbe, is it possible to make MPTCP perform as fast as TCP?
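
A minimal client-side sketch of that choice (illustrative, not code from this PR): the caller decides whether to request MPTCP, and if the kernel cannot create an MPTCP socket the code falls back to plain TCP. The helper name is hypothetical.

#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262 /* not defined by older libc headers */
#endif

/* Connect to host:port, requesting MPTCP only when use_mptcp is non-zero. */
static int connect_tcp_or_mptcp(const char *host, const char *port, int use_mptcp) {
    struct addrinfo hints, *res, *p;
    int s = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0) return -1;

    for (p = res; p != NULL; p = p->ai_next) {
        s = socket(p->ai_family, p->ai_socktype, use_mptcp ? IPPROTO_MPTCP : p->ai_protocol);
        if (s == -1 && use_mptcp) {
            /* Kernel without MPTCP support: retry with plain TCP. */
            s = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        }
        if (s == -1) continue;
        if (connect(s, p->ai_addr, p->ai_addrlen) == 0) break; /* connected */
        close(s);
        s = -1;
    }
    freeaddrinfo(res);
    return s; /* -1 on failure */
}

If the server only listens on plain TCP, a connection made through an MPTCP socket still falls back to regular TCP, matching the behaviour described above.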

@matttbe commented Mar 3, 2025

Hello,

Not sure how useful MPTCP is for low-latency data center inner networks. It was built for mobile devices to switch between WiFi and cellular, handling packet loss. How does MPTCP perform when there is no packet drop? Does it introduce any overhead or impact latency in such stable network conditions?

@xbasel: MPTCP can also be useful in data centres for fast recovery when some links fail, to use the path with the lowest latency, or even to combine multiple paths. There have been a few scientific papers on the subject (using MPTCP in data centres). I guess with Valkey the focus will be more on the latency aspect.

Please note that on the server side, supporting MPTCP is "cheap": when clients don't request to use MPTCP, server applications will receive a "plain" TCP socket from the kernel when connections are accepted, making the performance impact minimal.

is it possible to make MPTCP perform as fast as TCP?

To be able to use multiple paths for the same connection, a few bytes need to be added to the header of each TCP packet. More details here. It means that if only one path is used per connection, MPTCP cannot be as fast as TCP. But that's normal; it's the small cost to pay to leverage multiple network paths, potentially increasing throughput and reliability. Of course, if only one path is used per MPTCP connection, there is no need to use Multipath TCP ;)

Still, we are working on reducing the gap between TCP and single-path MPTCP, e.g. this recent modification here.
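
To illustrate the "cheap on the server side" point, a minimal listening sketch (illustrative, not this PR's code; assumes Linux >= 5.6 with net.mptcp.enabled=1): the listener is created with IPPROTO_MPTCP, and clients that never ask for MPTCP are still accepted and behave like regular TCP connections.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/* Create an MPTCP listening socket on the given port (hypothetical helper). */
static int listen_mptcp(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
    if (fd == -1) return -1; /* e.g. kernel without MPTCP: caller may retry with IPPROTO_TCP */

    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 || listen(fd, 128) == -1) {
        close(fd);
        return -1;
    }
    return fd; /* accept(2) on this fd serves both MPTCP and plain-TCP clients */
}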

@xbasel (Member) commented Mar 3, 2025

Unrelated to this PR: any idea if it's possible to create subflows on the same interface?

@matttbe commented Mar 3, 2025

Unrelated to this PR: any idea if it's possible to create subflows on the same interface?

Yes it is. This page should explain what is possible with the default path-manager: https://www.mptcp.dev/pm.html

Some ideas:

  • the server can announce multiple addresses (v4/v6), even with a port, so it can announce the same IP with different ports
  • the client can use multiple addresses (v4/v6); they can be assigned to the same interface or not. The fullmesh mode can help create even more paths than "supposedly" needed.
  • technically, the client could establish multiple subflows from the same source address to the same destination using different source ports, but that is a very particular case, and only the userspace PM will let you do that.

Don't hesitate to share the use-case if something is not supported ;-)

@xbasel (Member) commented Mar 3, 2025

Some cloud providers and some internet routers enforce per-flow bandwidth limits.
MPTCP can "fix" this issue; basically, it can be used as a "download accelerator".

@pizhenwei requested review from zuiderkwast and PingXie on March 4, 2025 00:31
@pizhenwei (Contributor, Author)

After renaming anetTcpSetMptcp to anetTcpGetProtocol, the function is clearer.

Thanks to @matttbe for this suggestion!

Hi @xbasel
Do you have any suggestions about this change?

@xbasel (Member) commented Mar 4, 2025

Hi @xbasel Do you have any suggestions about this change?

No. LGTM now.

@pizhenwei (Contributor, Author)

Hi @PingXie @zuiderkwast
This change has been reviewed by @xbasel and by Linux kernel MPTCP maintainer @matttbe, and tested by the team of another Linux kernel MPTCP maintainer, @geliangtang.

Would you please take a look?

@zuiderkwast (Contributor)

I like this because it can be done without additional libraries.

Currently we're busy with the 8.1 release and I will be away for the next week, so this one will have to wait for the next release. I have some questions, though:

  1. Is there any harm in enabling MPTCP by default? I.e. can we skip the mptcp on/off config and just allow it to be used on supported platforms?

    As @matttbe mentioned, it falls back to plain TCP when the client doesn't explicitly request MPTCP.

    Please note that on the server side, supporting MPTCP is "cheap": when clients don't request to use MPTCP, server applications will receive a "plain" TCP socket from the kernel when connections are accepted, making the performance impact minimal.

  2. I'm wondering how clients and servers can announce and find the peer's additional IPs and/or ports. Do we need to add more configs for this?

  3. If we need to expose multiple IPs and ports, how can we report this in commands like CLUSTER SLOTS?

@matttbe commented Mar 5, 2025

  1. Is there any harm in enabling MPTCP by default? I.e. can we skip the mptcp on/off config and just allow it to be used on supported platforms?
    As @matttbe mentioned, it falls back to plain TCP when the client doesn't explicitly request MPTCP.

We do recommend this; that's what GoLang has been doing since v1.24, for example. But then the way the socket creation is handled in this PR needs to change: at the moment, there is an error if MPTCP is not supported by the kernel. Instead, if it is not possible to create an MPTCP socket, I suggest simply retrying with a plain TCP socket and then continuing as before, e.g.:

#ifdef IPPROTO_MPTCP
    /* Try an MPTCP socket first when the headers define IPPROTO_MPTCP. */
    if ((s = socket(p->ai_family, p->ai_socktype, IPPROTO_MPTCP)) == -1)
#endif
    {
        /* Fall back to a plain TCP socket if MPTCP is unavailable. */
        if ((s = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) == -1) continue;
    }

Note that I think it would still make sense to have an option to easily disable MPTCP support, just in case. Disabling MPTCP globally is not difficult, simply with sysctl net.mptcp.enabled=0, but it might require different permissions.

@matttbe commented Mar 5, 2025

@zuiderkwast: Oops, I forgot about the two other items.

  2. I'm wondering how clients and servers can announce and find the peer's additional IPs and/or ports. Do we need to add more configs for this?

No need to change anything on the application side for that, the kernel will do that automatically once it has been configured, see here.

There are tools that can automatically configure the MPTCP endpoints for the kernel, e.g. NetworkManager >=1.40. If these tools are not available, this can be easily done manually thanks to the ip mptcp endpoint add <IP address> dev <interface> <type> command.

  3. If we need to expose multiple IPs and ports, how can we report this in commands like CLUSTER SLOTS?

This should not change anything compared to before. If the bind() command restricts the socket to a specific IP, the restriction will still be the same. If there are no such restrictions, the service can be reached from different IPs, like before.

If an MPTCP endpoint has been configured to accept connections on a different port, then this port will only be used for additional subflows (paths).

Of course, there are ways for the application to get info about the connection and which subflows are being used, see here.
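
As a hedged sketch of that last point (assuming a kernel recent enough to expose MPTCP_INFO through getsockopt(2) at the SOL_MPTCP level, roughly 5.16+): the call fails on a connection that is not actually using MPTCP, e.g. after a fallback to plain TCP, and otherwise returns counters such as the number of subflows.

#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <linux/mptcp.h> /* MPTCP_INFO, struct mptcp_info */

#ifndef SOL_MPTCP
#define SOL_MPTCP 284
#endif

/* Report whether a connected socket is really using MPTCP (illustrative helper). */
static void print_mptcp_state(int fd) {
    struct mptcp_info info;
    socklen_t len = sizeof(info);

    if (getsockopt(fd, SOL_MPTCP, MPTCP_INFO, &info, &len) == -1) {
        /* Not an MPTCP connection, or it fell back to plain TCP. */
        printf("fd %d: plain TCP (errno=%d)\n", fd, errno);
        return;
    }
    printf("fd %d: MPTCP, subflows reported by the kernel: %u\n", fd, (unsigned)info.mptcpi_subflows);
}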

@pizhenwei (Contributor, Author)

We do recommend this; that's what GoLang has been doing since v1.24, for example. [...] if it is not possible to create an MPTCP socket, I suggest simply retrying with a plain TCP socket and then continuing as before. [...]

Hi @matttbe

If we change valkey to this style, valkey will implicitly listen on MPTCP by default on newer Linux kernels, and clients compiled with golang v1.24 (or higher) will see slightly higher latency. Lots of golang users might then need extra troubleshooting work to find the root cause. So I prefer an explicit mptcp config for the server side, and an explicit option on the client side as well. Valkey clients usually access the server over a good network, but a replica may access it across data centers for data backup; enabling/disabling the global MPTCP kernel config is not the key idea of this PR.

+------+                     +-------+ 
|valkey| --MPTCP across DC-- |replica| 
+------+                     +-------+
   |
  TCP
   |
 client

@matttbe commented Mar 5, 2025

If we change valkey to this style, valkey will implicitly listen on MPTCP by default on newer Linux kernels, and clients compiled with golang v1.24 (or higher) will see slightly higher latency. [...] So I prefer an explicit mptcp config for the server side, and an explicit option on the client side as well. [...]

With GoLang 1.24, only the server side uses MPTCP by default, not the client side. It's good to have servers supporting MPTCP by default so that clients, who usually benefit the most from MPTCP, can use it if they want to.

Latency is not supposed to increase with MPTCP; it should even decrease when it is possible to pick a path with lower latency or less load. If latency increases significantly with single-path MPTCP compared to TCP, that's not normal. In that case, please report a new issue on our side.

@Dwyane-Yan

Based on the test results, if an MPTCP connection is used by default, it may incur some performance loss.
Below are some simple test results using valkey-benchmark:

                Origin                  MPTCP
             rps  |  avg_msec        rps  |  avg_msec
ping      132363.6|  1.494        110492.4|  1.793
set       11013.0 |  18.152       10747.1 |  18.592
get       11371.9 |  17.568       11090.4 |  18.021

The above test results were obtained in a virtual environment set up with a script.
Link: https://github.com/Dwyane-Yan/mptcp_net-next/blob/export/tools/testing/selftests/net/mptcp/mptcp_valkey.sh.
The configuration of this script is a little complex:

        1. Requires the Linux tools/testing/selftests environment.
        2. cd to tools/testing/selftests/net/mptcp
        3. make
        4. cp [valkey-server/valkey-benchmark/valkey-cli (with mptcp)] ./
        5. sudo ./mptcp_valkey.sh

valkey-server, valkey-benchmark, and valkey-cli are compiled from https://github.com/pizhenwei/valkey/tree/mptcp-with-hiredis

This script has parameters:
-l represents a lossy network environment, and -m represents MPTCP subflows.

Usage example:
sudo ./mptcp_valkey.sh -m 2 # represents an environment with 2 network interfaces
The above test results were obtained using:

sudo ./mptcp_valkey.sh
sudo ./mptcp_valkey.sh -m 1

which simulates a scenario where only one network interface exists using MPTCP.

By the way, there is a simpler testing method: using 'mptcpize run'.
mptcpize is a tool that forces TCP sockets to become MPTCP sockets.
This eliminates the need to set up additional environments. You can easily obtain test results by simply
running 'mptcpize run ./valkey-server' and 'mptcpize run ./valkey-benchmark -h XXX -d 1024 -c 200 -n 10000000 -t ping'.
Similarly, this testing in a real-world environment also shows that there is a slight
performance decrease when using MPTCP with only one network interface.

Thanks to the community for the attention to MPTCP. I hope we can work together to make this happen.

@matttbe commented Mar 6, 2025

Similarly, based on this testing in a real-world environment, the results also prove that there is a slight performance decrease when using MPTCP with only one network interface.

@Dwyane-Yan Thank you for the tests and sharing the results! Do you have more details about the benchmark tool? It looks like it will create many "small" requests (1KB), but will it create one connection per request, or one (or a few) connections doing many requests?

Because you have already contributed to MPTCP in the Linux kernel, maybe you would like to analyse how the kernel behaves with TCP and MPTCP? It would be really good to analyse the performance in this use case (e.g. with perf, maybe with some flame graphs) and identify where the major differences between TCP and single-path MPTCP are.

If that's OK, can we move this discussion to a new issue on MPTCP side?

@pizhenwei requested a review from matttbe on March 10, 2025 08:45
@pizhenwei (Contributor, Author)

I'm also fine with server.mptcp.

@xbasel When you said "Why not use server.mptcp?", were you referring to the variable in the code, not the config name? Configs with dots in their names are for modules, i.e. module-name.config-name. I guess you meant to reuse the config mptcp for incoming client connections and for outgoing replication connections.

I believe it should be two separate configs. I expect users will want replication without MPTCP for replicas within a data center with very reliable networking, while still accepting incoming MPTCP connections. You'd enable repl-mptcp on replicas that are located in a different data center, on a different continent, etc.

We had a high-level consensus to add the feature; only the details about the config name and the default value were postponed. @madolson @PingXie, do you think we can settle on the current state of this PR, mptcp defaulting to no?

Future follow-ups:

  • repl-mptcp to enable MPTCP for outgoing replication connections
  • Support MPTCP in valkey-cli and valkey-benchmark, when we have replaced hiredis with libvalkey

I created an MPTCP support PR for libvalkey, please take a look.
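
For reference, a minimal configuration sketch of how the options discussed here fit together; the exact lines below are illustrative (the repl-mptcp option landed later as a follow-up, see the commits referenced at the end of this thread).

# Primary: accept MPTCP from clients and replicas that request it (default: no)
mptcp yes

# Replica in another data center: use MPTCP for the outgoing replication link
# (follow-up config, default: no)
repl-mptcp yes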

@xbasel (Member) commented Apr 8, 2025

Unless Valkey is running on hosts with multi-homing, MPTCP behaves just like regular TCP - only one subflow gets used. So in many setups, there's no downside.

Wrong. Each TCP header gets some extra bytes. Please read the comments in this issue such as this one: #1811 (comment)

Yeah, I was wrong: I overlooked the protocol-level overhead (and for some reason assumed MPTCP would magically disable itself if only one path/interface exists).
I ran tests and saw non-negligible latency and higher sys CPU even with a single flow.

I think it should be disabled by default.

@matttbe commented Apr 8, 2025

Yeah, I was wrong - I overlooked the protocol level overhead (and for some reason assumed MPTCP would magically disable itself if only one path/interface exists). I ran tests and saw non-negligible latency and higher sys CPU even with a single flow...

Out of curiosity, what kind of test was it? Only small connections? With which kernel version?

@madolson (Member) commented Apr 8, 2025

We had a high-level consensus to add the feature; only the details about the config name and the default value were postponed. @madolson @PingXie, do you think we can settle on the current state of this PR, mptcp defaulting to no?

Yes, all my follow-up questions have been answered in this thread so far. The default of no does seem correct.

@hwware (Member) commented Apr 8, 2025

Has the performance issue (#1811 (comment)) been solved? @Dwyane-Yan @matttbe I'm a bit worried about it, or did I miss some threads?

@matttbe commented Apr 9, 2025

Just to make sure we don't mix up things here:

  • It can be interesting to enable MPTCP by default on the server side, to let the clients decide whether to use MPTCP or not. In terms of performance, the impact should not be noticeable: if the clients don't request to use MPTCP, the application on the server side will receive a "plain" TCP socket like before. More details here.
    • If MPTCP is enabled by default, it makes sense not to fail if it is not possible to create an MPTCP listening socket, and to fall back to TCP automatically.
    • Of course, it might make sense to first introduce the option, then change the default later to play it safe. That's what they did in Go, where MPTCP is now enabled on the server side by default.
  • On the client side (or for replication in a DC), it might be better not to enable MPTCP by default, but to have an option letting users enable it when needed. It makes sense to use MPTCP when ... multiple paths are available :) (and the kernel has been told which IP addresses can be used with MPTCP)
  • Regarding performance, using multiple paths typically makes less sense with short connections; otherwise the overhead is probably not worth it. From what I understood, the benchmarks were done with small connections. Also, performance is getting better in newer kernel versions, and this is something that can be improved further if someone takes the time to look at it (I don't know if @Dwyane-Yan will). I don't think it should block this PR. In some cases, a small performance impact can be justified by an improvement in resilience.

@xbasel (Member) commented Apr 10, 2025

Out of curiosity, what kind of test was it? Only small connections? With which kernel version?

@matttbe

I used sockperf (TCP) with the default message size of 65507 bytes.

Summary:

Metric              TCP       MPTCP (1 flow)   % Slower (MPTCP)
Avg Latency (µs)    50.896    57.124           +12.25%
Max Latency (µs)    134.017   187.902          +40.17%
p50 (Median)        50.491    56.836           +12.56%
p99                 57.549    64.794           +12.59%
Messages/sec        9819.3    8749.1           -10.91%

Perf suggests these may contribute to higher latency:

__lock_sock_fast
 lock_sock_nested
mptcp_sendmsg
mptcp_recvmsg

The tests were run on kernel 6.5.0-1018, on ARM.

Do these numbers make sense?

Output:

**TCP**
sockperf: == version #3.7-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 172.31.22.130   PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.000 sec; Warm up time=400 msec; SentMessages=98194; ReceivedMessages=98193
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=10.000 sec; SentMessages=98193; ReceivedMessages=98193
sockperf: ====> avg-latency=50.896 (std-dev=2.003)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 50.896 usec
sockperf: Total 98193 observations; each percentile contains 981.93 observations
sockperf: ---> <MAX> observation =  134.017
sockperf: ---> percentile 99.999 =  128.781
sockperf: ---> percentile 99.990 =  105.104
sockperf: ---> percentile 99.900 =   71.896
sockperf: ---> percentile 99.000 =   57.549
sockperf: ---> percentile 90.000 =   52.452
sockperf: ---> percentile 75.000 =   51.316
sockperf: ---> percentile 50.000 =   50.491
sockperf: ---> percentile 25.000 =   49.925
sockperf: ---> <MIN> observation =   48.137


**MPTCP with single flow**
sockperf: == version #3.7-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 172.31.22.130   PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.000 sec; Warm up time=400 msec; SentMessages=87492; ReceivedMessages=87491
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=10.000 sec; SentMessages=87491; ReceivedMessages=87491
sockperf: ====> avg-latency=57.124 (std-dev=2.674)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 57.124 usec
sockperf: Total 87491 observations; each percentile contains 874.91 observations
sockperf: ---> <MAX> observation =  187.902
sockperf: ---> percentile 99.999 =  167.123
sockperf: ---> percentile 99.990 =  138.633
sockperf: ---> percentile 99.900 =   88.516
sockperf: ---> percentile 99.000 =   64.794
sockperf: ---> percentile 90.000 =   58.810
sockperf: ---> percentile 75.000 =   57.703
sockperf: ---> percentile 50.000 =   56.836
sockperf: ---> percentile 25.000 =   56.025
sockperf: ---> <MIN> observation =   53.679

@matttbe commented Apr 10, 2025

@xbasel thank you for sharing this. I quickly tried in a VM using a v6.15-rc0 kernel with localhost, and I don't see such differences when running the ping-pong test a few times, with and without MPTCP (single path).

Going by what perf told you, the improvement might come from recent work on the MPTCP side to improve single-path performance. That's getting better :)

root@mptcpdev# mptcpize run sockperf ping-pong -i 127.0.0.1 -p 5001 --tcp -m 65507 -t 10
sockperf: == version #3.7-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 127.0.0.1       PORT =  5001 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.000 sec; Warm up time=400 msec; SentMessages=145841; ReceivedMessages=145840
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=9.550 sec; SentMessages=138507; ReceivedMessages=138507
sockperf: ====> avg-latency=34.443 (std-dev=4.053)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 34.443 usec
sockperf: Total 138507 observations; each percentile contains 1385.07 observations
sockperf: ---> <MAX> observation =   71.974
sockperf: ---> percentile 99.999 =   71.212
sockperf: ---> percentile 99.990 =   59.751
sockperf: ---> percentile 99.900 =   46.406
sockperf: ---> percentile 99.000 =   41.442
sockperf: ---> percentile 90.000 =   37.610
sockperf: ---> percentile 75.000 =   35.265
sockperf: ---> percentile 50.000 =   34.659
sockperf: ---> percentile 25.000 =   34.108
sockperf: ---> <MIN> observation =   14.016
root@mptcpdev# mptcpize run sockperf ping-pong -i 127.0.0.1 -p 5001 --tcp -m 65507 -t 10
sockperf: == version #3.7-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 127.0.0.1       PORT =  5001 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.000 sec; Warm up time=400 msec; SentMessages=156938; ReceivedMessages=156937
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=9.550 sec; SentMessages=148166; ReceivedMessages=148166
sockperf: ====> avg-latency=32.180 (std-dev=5.986)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 32.180 usec
sockperf: Total 148166 observations; each percentile contains 1481.66 observations
sockperf: ---> <MAX> observation =  131.405
sockperf: ---> percentile 99.999 =   77.499
sockperf: ---> percentile 99.990 =   62.962
sockperf: ---> percentile 99.900 =   48.165
sockperf: ---> percentile 99.000 =   43.806
sockperf: ---> percentile 90.000 =   36.322
sockperf: ---> percentile 75.000 =   34.925
sockperf: ---> percentile 50.000 =   33.667
sockperf: ---> percentile 25.000 =   31.428
sockperf: ---> <MIN> observation =   14.071
root@mptcpdev# sockperf ping-pong -i 127.0.0.1 -p 5001 --tcp -m 65507 -t 10
sockperf: == version #3.7-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 127.0.0.1       PORT =  5001 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.000 sec; Warm up time=400 msec; SentMessages=143223; ReceivedMessages=143222
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=9.550 sec; SentMessages=133718; ReceivedMessages=133718
sockperf: ====> avg-latency=35.676 (std-dev=1.916)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 35.676 usec
sockperf: Total 133718 observations; each percentile contains 1337.18 observations
sockperf: ---> <MAX> observation =   80.125
sockperf: ---> percentile 99.999 =   79.564
sockperf: ---> percentile 99.990 =   57.668
sockperf: ---> percentile 99.900 =   45.900
sockperf: ---> percentile 99.000 =   41.863
sockperf: ---> percentile 90.000 =   38.276
sockperf: ---> percentile 75.000 =   35.772
sockperf: ---> percentile 50.000 =   35.246
sockperf: ---> percentile 25.000 =   34.810
sockperf: ---> <MIN> observation =   29.435
root@mptcpdev# sockperf ping-pong -i 127.0.0.1 -p 5001 --tcp -m 65507 -t 10
sockperf: == version #3.7-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 127.0.0.1       PORT =  5001 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.000 sec; Warm up time=400 msec; SentMessages=148342; ReceivedMessages=148341
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=9.550 sec; SentMessages=139163; ReceivedMessages=139163
sockperf: ====> avg-latency=34.278 (std-dev=5.977)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 34.278 usec
sockperf: Total 139163 observations; each percentile contains 1391.63 observations
sockperf: ---> <MAX> observation =  227.885
sockperf: ---> percentile 99.999 =  103.828
sockperf: ---> percentile 99.990 =   60.973
sockperf: ---> percentile 99.900 =   46.296
sockperf: ---> percentile 99.000 =   42.253
sockperf: ---> percentile 90.000 =   38.456
sockperf: ---> percentile 75.000 =   36.052
sockperf: ---> percentile 50.000 =   35.491
sockperf: ---> percentile 25.000 =   34.890
sockperf: ---> <MIN> observation =   12.513

@PingXie (Member) commented Apr 14, 2025

Nice to see the single-subflow optimization, @matttbe!

  1. Does the payload size still matter with the single-subflow optimization? In other words, will we see higher overhead with a smaller payload size? 64KB is quite big in the Valkey world.

  2. Given that the patch was submitted very recently, and given @xbasel's test results, I think it is still reasonable to disable MPTCP by default for now. We can always revisit this decision in the future.

Thoughts?

@madolson added the major-decision-approved (Major decision approved by TSC team) label and removed the major-decision-pending (Major decision pending by TSC team) label on Apr 14, 2025
@zuiderkwast added the release-notes (This issue should get a line item in the release notes) label on Apr 15, 2025
@zuiderkwast merged commit 4a92db9 into valkey-io:unstable on Apr 15, 2025 (50 of 51 checks passed)
The github-project-automation bot moved this from In Progress to Done in Valkey 9.0 on Apr 15, 2025
pizhenwei added a commit to pizhenwei/valkey that referenced this pull request Apr 15, 2025
Since commit 4a92db9("Introduce MPTCP (valkey-io#1811)"), valkey server
starts to support MPTCP. Support MPTCP for replica as client side.

Signed-off-by: zhenwei pi <[email protected]>
@pizhenwei deleted the mptcp branch on April 15, 2025 09:45
nitaicaro pushed a commit to nitaicaro/valkey that referenced this pull request Apr 22, 2025
nitaicaro pushed a commit to nitaicaro/valkey that referenced this pull request Apr 22, 2025
nitaicaro pushed a commit to nitaicaro/valkey that referenced this pull request Apr 22, 2025
pizhenwei added a commit to pizhenwei/valkey that referenced this pull request Apr 23, 2025
hwware pushed a commit to wuranxx/valkey that referenced this pull request Apr 24, 2025
zuiderkwast pushed a commit that referenced this pull request Apr 27, 2025
Allow replicas to use MPTCP in the outgoing replication connection.

A new yes/no config is introduced `repl-mptcp`, default `no`.

For MPTCP to be used in replication, the primary needs to be configured
with `mptcp yes` and the replica with `repl-mptcp yes`. Otherwise, the
connection falls back to regular TCP.

Follow-up of #1811.

---------

Signed-off-by: zhenwei pi <[email protected]>
SoftlyRaining pushed a commit to SoftlyRaining/valkey that referenced this pull request May 14, 2025
Labels: major-decision-approved (Major decision approved by TSC team), release-notes (This issue should get a line item in the release notes)
Projects: Valkey 9.0 (Status: Done)
8 participants