[PATCH rdma-next 0/9] Rework retry algorithm used when sending MADs
From: Leon Romanovsky
Date: Thu Dec 05 2024 - 08:50:44 EST
From Vlad,
This series aims to improve behaviour of a MAD sender under congestion
and/or receiver overload. We've seen significant drops in goodput when
MAD receivers are overloaded. This typically happens with SA requests,
which are served by a single node (SM), but can also happen with CM.
Patch 7 introduces the main change: exponential backoff. This new retry
algorithm is applied to all MADs, except RMPP and OPA. To avoid
reductions in recovery speed under transient failures, the exponential
backoff algorithm only engages after a certain number of linear timeouts
have been experienced. The backoff algorithm resets to its initial state
after a CM MRA, assuming the remote is no longer overloaded.
Because a trade-off between speed of recovery under transient failure
and reducing load from unnecessary retries under persistent failure must
be made, and this trade-off depends on the network scale, patch 8 makes
mad-linear-timeouts configurable.
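As a rough illustration of the retry schedule described above, the
sketch below computes the interval to wait before the next retry. It is
not the code from patch 7: the function name, the doubling factor and
the max_ms cap are assumptions made for illustration. The only
behaviour taken from this series is that backoff starts after a
configurable number of linear timeouts.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel implementation.
 *
 * base_ms:          retry interval used during the linear phase
 * timeouts_so_far:  number of timeouts already experienced for this send
 * linear_timeouts:  timeouts served at base_ms before backoff engages
 * max_ms:           assumed upper bound on a single interval
 */
static uint32_t next_retry_interval_ms(uint32_t base_ms,
				       uint32_t timeouts_so_far,
				       uint32_t linear_timeouts,
				       uint32_t max_ms)
{
	uint64_t interval = base_ms;
	uint32_t doublings, i;

	assert(base_ms > 0 && max_ms >= base_ms);

	/* Linear phase: keep retrying at the base interval. */
	if (timeouts_so_far < linear_timeouts)
		return base_ms;

	/* Backoff phase: double once per timeout beyond the linear phase. */
	doublings = timeouts_so_far - linear_timeouts + 1;
	for (i = 0; i < doublings && interval < max_ms; i++)
		interval *= 2;

	return interval > max_ms ? max_ms : (uint32_t)interval;
}
```

With base_ms = 100 and linear_timeouts = 4, the first four intervals are
100 ms, followed by 200, 400, 800 and so on up to the cap. In this
sketch, a reset after a CM MRA would correspond to restarting
timeouts_so_far at zero.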
Patch 1 makes CM MRA apply only once, to prevent entering an excessive
delay condition, even when the receiver is likely no longer overloaded.
The exponential backoff algorithm (a) increases the time until a send
MAD reaches the final timeout, and (b) makes that time hard for callers
to predict. Since certain callers appear to care about this, patch 2
introduces a new option, deadline, which can be used to enforce when
the final timeout is reached. SA, UMAD and CM are updated to use this
new parameter (patches 3, 5, 6).
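One way to picture how a deadline can bound the total time regardless
of the retry schedule is sketched below. This is only an assumption
about the general technique, not the interface added by patch 2: the
function name, the millisecond units and the boolean return convention
are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel implementation.
 *
 * Shortens *interval_ms so that the next retry timer never fires after
 * deadline_ms. Returns false when the deadline has already been
 * reached, meaning the caller should report the final timeout instead
 * of arming another retry.
 */
static bool clamp_retry_to_deadline(uint64_t now_ms, uint64_t deadline_ms,
				    uint32_t *interval_ms)
{
	uint64_t remaining;

	assert(interval_ms != NULL);

	if (now_ms >= deadline_ms)
		return false;

	remaining = deadline_ms - now_ms;
	if (*interval_ms > remaining)
		*interval_ms = (uint32_t)remaining;

	return true;
}
```

Under these assumptions, a caller that needs a predictable upper bound
sets the deadline once, and the last retry interval is truncated to fit
however far the backoff has progressed.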
Patch 3 also solves a related issue in SA, which in certain cases
configures the MAD layer with extremely aggressive retry intervals.
Because the current aggressive retry was introduced to solve another
issue, patch 4 makes sa-min-timeout configurable.
Patch 9 resolves another related issue in CM, which uses a retry
interval that is far too high for (low latency) RDMA networks.
In summary:
1) IB/mad: Apply timeout modification (CM MRA) only once
2) IB/mad: Add deadline for send MADs
3) RDMA/sa_query: Enforce min retry interval and deadline
4) RDMA/nldev: Add sa-min-timeout management attribute
5) IB/umad: Set deadline when sending non-RMPP MADs
6) IB/cm: Set deadline when sending MADs
7) IB/mad: Exponential backoff when retrying sends
8) RDMA/nldev: Add mad-linear-timeouts management attribute
9) IB/cma: Lower response timeout to roughly 1s
Two tunables will be added to the rdma tool (iproute2), under the
'management' namespace, as a follow-up:
mad-linear-timeouts
sa-min-timeout
Thanks
Vlad Dumitrescu (9):
IB/mad: Apply timeout modification (CM MRA) only once
IB/mad: Add deadline for send MADs
RDMA/sa_query: Enforce min retry interval and deadline
RDMA/nldev: Add sa-min-timeout management attribute
IB/umad: Set deadline when sending non-RMPP MADs
IB/cm: Set deadline when sending MADs
IB/mad: Exponential backoff when retrying sends
RDMA/nldev: Add mad-linear-timeouts management attribute
IB/cma: Lower response timeout to roughly 1s
drivers/infiniband/core/cm.c | 13 +++
drivers/infiniband/core/cma.c | 2 +-
drivers/infiniband/core/core_priv.h | 4 +
drivers/infiniband/core/mad.c | 141 ++++++++++++++++++++++++++--
drivers/infiniband/core/mad_priv.h | 8 ++
drivers/infiniband/core/nldev.c | 133 ++++++++++++++++++++++++++
drivers/infiniband/core/sa_query.c | 81 +++++++++++++---
drivers/infiniband/core/user_mad.c | 8 ++
include/rdma/ib_mad.h | 29 ++++++
include/uapi/rdma/ib_user_mad.h | 12 ++-
include/uapi/rdma/rdma_netlink.h | 7 ++
11 files changed, 416 insertions(+), 22 deletions(-)
--
2.47.0