Re: [PATCH net-next v3] inet: add ip_local_port_step_width sysctl to improve port usage distribution
From: Fernando Fernandez Mancera
Date: Wed Mar 04 2026 - 04:58:11 EST
On 3/4/26 8:05 AM, Eric Dumazet wrote:
On Tue, Mar 3, 2026 at 6:30 PM Fernando Fernandez Mancera
<fmancera@xxxxxxx> wrote:
With the current port selection algorithm, ports after a reserved port
range or a long-used port are used more often than others [1]. This
causes an uneven port usage distribution. This combines with cloud
environments blocking connections between the application server and the
database server if there was a previous connection with the same source
port, leading to connectivity problems between applications on cloud
environments.
The real issue here is that these firewalls cannot cope with
standards-compliant port reuse. This is a workaround for such situations
and an improvement on the distribution of ports selected.
The proposed solution is to implement a variant of RFC 6056 Algorithm 5.
The step size is selected randomly on every connect() call, ensuring it
is coprime with the size of the range of ports we want to
scan. This way, we can ensure that all ports within the range are
scanned before returning an error. To enable this algorithm, the user
must configure the new sysctl option "net.ipv4.ip_local_port_step_width".
In addition, on graphs generated we can observe that the distribution of
source ports is more even with the proposed approach. [2]
[1] https://0xffsoftware.com/port_graph_current_alg.html
[2] https://0xffsoftware.com/port_graph_random_step_alg.html
Signed-off-by: Fernando Fernandez Mancera <fmancera@xxxxxxx>
---
v2: used step to calculate remaining as (remaining / step) and avoid
calculating gcd when scan_step and range are both even
v3: xmas tree formatting and break the gcd() loop once scan_step is 1
---
Documentation/networking/ip-sysctl.rst | 9 ++++++
.../net_cachelines/netns_ipv4_sysctl.rst | 1 +
include/net/netns/ipv4.h | 1 +
net/ipv4/inet_hashtables.c | 28 +++++++++++++++++--
net/ipv4/sysctl_net_ipv4.c | 7 +++++
5 files changed, 43 insertions(+), 3 deletions(-)
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 265158534cda..da29806700e9 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1630,6 +1630,15 @@ ip_local_reserved_ports - list of comma separated ranges
Default: Empty
+ip_local_port_step_width - INTEGER
+ Defines the numerical maximum increment between successive port
+ allocations within the ephemeral port range when an unavailable port is
+ reached. This can be used to mitigate accumulated nodes in port
+ distribution when reserved ports have been configured. Please note that
+ port collisions may be more frequent in a system with a very high load.
+
Patch SGTM, but I find this documentation obscure.
Some guidance would be nice. What values have you tested/tried?
As I am working on a patch series with improvements to the ip-sysctl.rst documentation, I will handle that there.
FTR, I tested multiple scenarios and numbers. If the value is >= the whole range, the issue is always mitigated, but of course this will have a hit on performance under port exhaustion situations. The value that works best in my experience is 2x, 3x or even 4x the size of the largest reserved block. If only a couple of ports are marked as reserved, 128 is usually enough to avoid clustering.
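For readers who want to see why the step has to be coprime with the range, here is a minimal userspace sketch. It is not the code from the patch: gcd_u32(), coprime_step() and count_visited() are hypothetical helpers written for illustration, and the "decrement until coprime" strategy is only an assumption about one way to obtain such a step.

```c
#include <assert.h>
#include <stdlib.h>

/* Euclid's algorithm for the greatest common divisor. */
static unsigned int gcd_u32(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Hypothetical helper (not from the patch): lower a candidate step until
 * it is coprime with the range size. The loop always terminates because
 * a step of 1 is coprime with every range.
 */
static unsigned int coprime_step(unsigned int candidate, unsigned int range)
{
	unsigned int step = candidate ? candidate : 1;

	assert(range > 0);
	while (step > 1 && gcd_u32(step, range) != 1)
		step--;
	return step;
}

/*
 * Walk `range` offsets starting at `start`, advancing by `step` modulo
 * `range`, and count how many distinct offsets were visited. The result
 * equals `range` exactly when step and range are coprime.
 * Returns 0 if the scratch buffer cannot be allocated.
 */
static unsigned int count_visited(unsigned int start, unsigned int step,
				  unsigned int range)
{
	unsigned char *seen = calloc(range, 1);
	unsigned int offset = start % range;
	unsigned int visited = 0;
	unsigned int i;

	if (!seen)
		return 0;

	for (i = 0; i < range; i++) {
		if (!seen[offset]) {
			seen[offset] = 1;
			visited++;
		}
		offset = (offset + step) % range;
	}
	free(seen);
	return visited;
}
```

With the default ip_local_port_range of 32768-60999 the range size is 28232 = 8 * 3529. A raw step of 128 shares the factor 8 with it, so count_visited(0, 128, 28232) only reaches 3529 offsets, while coprime_step(128, 28232) yields 127, which covers all 28232 before repeating.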
Thank you all for the reviews!
Reviewed-by: Eric Dumazet <edumazet@xxxxxxxxxx>