Re: [PATCH net-next 2/2] net/smc: reduce TX slot contention with exclusive wait

From: Paolo Abeni

Date: Tue May 12 2026 - 04:28:20 EST


From: AI Reviewer <ai@xxxxxxxxxxx>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
net/smc: reduce TX slot contention with exclusive wait

This patch replaces the non-exclusive wait_event_interruptible_timeout()
in smc_wr_tx_get_free_slot() with an open-coded exclusive wait loop via
prepare_to_wait_exclusive(), so wake_up() on a freed TX slot wakes only
a single waiter instead of the entire wait queue.

> diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
> index 48037a3d97a3..0a6f2befb0e2 100644
> --- a/net/smc/smc_wr.c
> +++ b/net/smc/smc_wr.c
> @@ -171,17 +173,27 @@ int smc_wr_tx_get_free_slot(struct smc_link *link,
>  		if (rc)
>  			return rc;
>  	} else {
> -		rc = wait_event_interruptible_timeout(
> -			link->wr_tx_wait,
> -			!smc_link_sendable(link) ||
> -			lgr->terminating ||
> -			(smc_wr_tx_get_free_slot_index(link, &idx) != -EBUSY),
> -			SMC_WR_TX_WAIT_FREE_SLOT_TIME);
> -		if (!rc) {
> -			/* timeout - terminate link */
> -			smcr_link_down_cond_sched(link);
> -			return -EPIPE;
> +		rc = 0;
> +		for (;;) {
> +			prepare_to_wait_exclusive(&link->wr_tx_wait, &wait,
> +						  TASK_INTERRUPTIBLE);
> +			if (!smc_link_sendable(link) || lgr->terminating ||
> +			    smc_wr_tx_get_free_slot_index(link, &idx) != -EBUSY)
> +				break;
> +			timeout = schedule_timeout(timeout);
> +			if (!timeout) {
> +				/* timeout - terminate link */
> +				smcr_link_down_cond_sched(link);
> +				break;
> +			}
> +			if (signal_pending(current)) {
> +				rc = -ERESTARTSYS;
> +				break;
> +			}

[High]
With exclusive waiters, wake_up() on link->wr_tx_wait wakes only one
task per freed slot. If that waiter is about to exit via the
signal_pending() branch, can the freed slot be lost to the remaining
sleepers?

Consider two exclusive waiters A and B asleep in schedule_timeout():

1. smc_wr_tx_put_slot() -> wake_up(&link->wr_tx_wait) is effectively
__wake_up(..., nr_exclusive=1). default_wake_function makes A
runnable, autoremove_wake_function pulls A off the queue, and
__wake_up_common stops -- B is not woken.

2. A signal is delivered to A before it actually runs. Because A is
already TASK_RUNNING, signal_wake_up only sets TIF_SIGPENDING.

3. A returns from schedule_timeout() with timeout > 0, so the
!timeout branch is skipped. signal_pending(current) is true, so
the loop sets rc = -ERESTARTSYS and breaks without re-testing
smc_wr_tx_get_free_slot_index() and without issuing any wake_up()
on the way out.

4. B stays asleep on the wait queue even though the wr_tx_mask bit
for the freed slot is still available. With no further
completions, B sleeps until the full SMC_WR_TX_WAIT_FREE_SLOT_TIME
expires, at which point smcr_link_down_cond_sched() tears down an
otherwise healthy link.

The previous wait_event_interruptible_timeout() was non-exclusive, so
every waiter was woken on each wake_up() and re-evaluated the
condition, which avoided this class of lost-wakeup.

Should the signal path re-check the slot condition before returning,
or issue a wake_up() on exit so another exclusive waiter gets a chance
at the slot?
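
The four steps above can be replayed in a small userspace toy model. This is
only a sketch under stated assumptions: every toy_* name is hypothetical and
does not exist in the kernel, the "wait queue" is a two-entry array, and the
wake_on_exit flag stands in for one possible fix (passing the wakeup on when
leaving via the signal branch), not for anything in the posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these names exist in the kernel. */
enum toy_state { ASLEEP, RUNNABLE, DONE };

struct toy_waiter {
	enum toy_state state;
	bool queued;		/* still on the modelled wait queue */
	bool signal_pending;	/* models TIF_SIGPENDING */
	bool got_slot;
};

struct toy_link {
	struct toy_waiter *queue[2];	/* FIFO of exclusive waiters */
	size_t nr_waiters;
	int free_slots;			/* models clear bits in wr_tx_mask */
};

/* Models wake_up() with nr_exclusive = 1: wake and dequeue one sleeper. */
static void toy_wake_one(struct toy_link *l)
{
	for (size_t i = 0; i < l->nr_waiters; i++) {
		struct toy_waiter *w = l->queue[i];

		if (w->queued && w->state == ASLEEP) {
			w->state = RUNNABLE;
			w->queued = false;
			return;
		}
	}
}

/* Models smc_wr_tx_put_slot(): free a slot, then wake a single waiter. */
static void toy_put_slot(struct toy_link *l)
{
	l->free_slots++;
	toy_wake_one(l);
}

/* Models the loop body after schedule_timeout() returns with time left. */
static void toy_resume(struct toy_link *l, struct toy_waiter *w,
		       bool wake_on_exit)
{
	assert(w->state == RUNNABLE);

	if (w->signal_pending) {
		/* rc = -ERESTARTSYS; break; (slot condition is not re-tested) */
		w->state = DONE;
		if (wake_on_exit && l->free_slots > 0)
			toy_wake_one(l);	/* hypothetical fix */
		return;
	}
	if (l->free_slots > 0) {
		l->free_slots--;
		w->got_slot = true;
		w->state = DONE;
	} else {
		w->state = ASLEEP;
		w->queued = true;
	}
}

/* Returns true if B is left asleep while a slot is free. */
static bool toy_b_stranded(bool wake_on_exit)
{
	struct toy_waiter a = { .state = ASLEEP, .queued = true };
	struct toy_waiter b = { .state = ASLEEP, .queued = true };
	struct toy_link l = { .queue = { &a, &b }, .nr_waiters = 2 };

	toy_put_slot(&l);			/* step 1: only A is woken */
	a.signal_pending = true;		/* step 2: signal before A runs */
	toy_resume(&l, &a, wake_on_exit);	/* step 3: A exits on the signal */
	if (b.state == RUNNABLE)
		toy_resume(&l, &b, wake_on_exit);

	return b.state == ASLEEP && l.free_slots > 0;	/* step 4 */
}
```

In this model toy_b_stranded(false) reports B stranded with a free slot, while
toy_b_stranded(true) lets B claim it. Whether a wake_up() on exit or a
condition re-check is the better fit for the real code is a separate question.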

[Medium]
The commit message says this is a contention optimization and that
"teardown behavior is unchanged", but does it also change the errno
returned to callers on signal delivery?

Before this patch, wait_event_interruptible_timeout() returned
-ERESTARTSYS on signal; that value is non-zero, so the old if (!rc)
branch was skipped, control fell through to the trailing
if (idx == link->wr_tx_cnt) return -EPIPE, and callers saw -EPIPE.

After the patch, the signal path does:

	if (signal_pending(current)) {
		rc = -ERESTARTSYS;
		break;
	}
	...
	finish_wait(&link->wr_tx_wait, &wait);
	if (rc)
		return rc;

so -ERESTARTSYS propagates up through smc_cdc_get_free_slot() (which
only remaps rc to -EPIPE when conn->killed is set),
smcr_cdc_get_slot_and_msg_send(), smc_cdc_get_slot_and_msg_send() and
smc_tx_sendmsg(), meaning userspace sees a restarted syscall or -EINTR
instead of -EPIPE.

Note that the sibling smc_wr_reg_send() in the same file explicitly
converts -ERESTARTSYS to -EINTR. Should this return-code change be
called out in the commit message (and possibly split out), given that
the stated intent of the patch is purely to reduce contention?
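
The before/after difference can be written down as a small userspace model of
the two return paths. This is a sketch, not the kernel code: the toy_* names
and the outcome enum are hypothetical, and TOY_ERESTARTSYS is a local stand-in
because ERESTARTSYS is kernel-internal and not exported by userspace <errno.h>
(512 is the value the kernel uses).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Local stand-in for the kernel-internal ERESTARTSYS (512). */
#define TOY_ERESTARTSYS 512

/* Hypothetical summary of how the wait ended. */
enum toy_outcome { TOY_GOT_SLOT, TOY_TIMED_OUT, TOY_SIGNALLED };

/* Old flow: wait_event_interruptible_timeout() result, then the idx check. */
static int toy_old_rc(enum toy_outcome o)
{
	bool got_slot = (o == TOY_GOT_SLOT);
	long rc;

	if (o == TOY_TIMED_OUT)
		rc = 0;
	else if (o == TOY_SIGNALLED)
		rc = -TOY_ERESTARTSYS;
	else
		rc = 1;			/* some jiffies were left */

	if (!rc)
		return -EPIPE;		/* timeout: link is torn down */
	if (!got_slot)
		return -EPIPE;		/* idx == link->wr_tx_cnt */
	return 0;
}

/* New flow: rc is only set on the signal branch and is returned as-is. */
static int toy_new_rc(enum toy_outcome o)
{
	bool got_slot = (o == TOY_GOT_SLOT);
	int rc = (o == TOY_SIGNALLED) ? -TOY_ERESTARTSYS : 0;

	if (rc)
		return rc;		/* if (rc) return rc; after finish_wait() */
	if (!got_slot)
		return -EPIPE;		/* idx == link->wr_tx_cnt */
	return 0;
}
```

Under these assumptions the two functions agree for the slot-acquired and
timeout cases and differ only for the signal case, which is the change the
commit message does not mention.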

[Low]
On timeout expiry the new loop takes the teardown path without
re-testing whether a slot became available:

> +			timeout = schedule_timeout(timeout);
> +			if (!timeout) {
> +				/* timeout - terminate link */
> +				smcr_link_down_cond_sched(link);
> +				break;
> +			}

The old wait_event_interruptible_timeout() expansion re-evaluated the
condition via ___wait_cond_timeout after schedule_timeout() returned
0, so a wake_up() racing with timer expiry could still report success
and let smc_wr_tx_get_free_slot_index() claim the slot.

In the narrow race where a slot is freed at the same instant the
timer fires, does this now tear the link down even though a slot was
actually available?
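
The difference on the timeout path can be illustrated with a userspace
rendering of the ___wait_cond_timeout logic. This is a sketch assuming the
macro behaves as in current include/linux/wait.h (a true condition turns an
expired timeout of 0 into 1); the toy_* function names are hypothetical and
the slot state is reduced to a single boolean.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace rendering of ___wait_cond_timeout(): if the condition holds
 * while the remaining time is 0, the result is bumped to 1 so the caller
 * sees success rather than a timeout.
 */
static bool toy_wait_cond_timeout(bool cond, long *ret)
{
	if (cond && !*ret)
		*ret = 1;
	return cond || !*ret;
}

/* Old flow: the condition is re-evaluated after schedule_timeout(). */
static bool toy_old_tears_down(long remaining, bool slot_free)
{
	long ret = remaining;

	toy_wait_cond_timeout(slot_free, &ret);
	return ret == 0;	/* old code: if (!rc) smcr_link_down_cond_sched() */
}

/* New flow: !timeout goes straight to teardown. */
static bool toy_new_tears_down(long remaining, bool slot_free)
{
	(void)slot_free;	/* not consulted on the timeout path */
	return remaining == 0;
}
```

With remaining == 0 and slot_free == true, the old flow reports success and
the new flow tears the link down; with no slot free, both tear it down.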

>  		}
> +		finish_wait(&link->wr_tx_wait, &wait);
> +		if (rc)
> +			return rc;
>  		if (idx == link->wr_tx_cnt)
>  			return -EPIPE;
>  	}
--
This is an AI-generated review.