Re: "Dead loop on virtual device" error without softirq-BKL on PREEMPT_RT
From: Bert Karwatzki
Date: Tue Feb 17 2026 - 05:43:43 EST
On Tuesday, 17.02.2026 at 10:57 +0100, Sebastian Andrzej Siewior wrote:
> On 2026-02-17 09:56:48 [+0100], Bert Karwatzki wrote:
> > On Tuesday, 17.02.2026 at 08:19 +0100, Sebastian Andrzej Siewior wrote:
> > > On 2026-02-17 00:48:25 [+0100], Bert Karwatzki wrote:
> > > > The problem seems to be that different preemptible threads try to send skbs.
> > >
> > > This does not matter because the counter is per-thread not per-CPU.
> >
> > The "Dead loop on virtual device" message is not printed because dev_xmit_recursion()
> > returns true, but because READ_ONCE(txq->xmit_lock_owner) == cpu.
>
> Ach, so it is not the recursion, it is the assigned CPU.
> This is assigned via __netif_tx_lock(). Here we somehow lack the
> expected synchronisation. So the queue should be locked but not by the
> caller.
Yes, the queue gets locked by the first thread (via HARD_TX_LOCK). That thread is then
preempted before it has finished processing the skb, and the next thread on the same
CPU calls __dev_queue_xmit() and finds that the lock owner has the same CPU id.
I just wondered if we can completely skip the

	if (READ_ONCE(txq->xmit_lock_owner) != cpu) {
		[...]
	} else {
		/* "Recursion" alert */
	}

check, as the synchronization will be provided by HARD_TX_{LOCK,UNLOCK}.
The comment

	/* Other cpus might concurrently change txq->xmit_lock_owner
	 * to -1 or to their cpu id, but not to our id.
	 */

suggests that the case of a thread being preempted while holding the lock was not
taken into account here. On non-RT kernels that assumption is correct, because
spin_lock() disables preemption there.
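
To make the sequence above concrete, here is a small userspace toy model of it. This
is only a sketch and not kernel code: struct model_txq, model_xmit_begin(),
model_xmit_end() and model_second_sender() are names invented for illustration, and
the only thing borrowed from __dev_queue_xmit() is the shape of the owner check,
i.e. comparing the recorded owner CPU with the caller's CPU before taking the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name in this file is hypothetical. */
struct model_txq {
	bool locked;		/* stands in for the queue's xmit lock */
	int xmit_lock_owner;	/* CPU id stored by the lock holder, -1 if free */
};

enum xmit_result {
	XMIT_LOCKED,		/* caller now owns the queue */
	XMIT_WOULD_BLOCK,	/* lock is held, owner is a different CPU */
	XMIT_DEAD_LOOP,		/* owner CPU == caller CPU, the "recursion" path */
};

/* Same shape as the check discussed above: look at the owner CPU first. */
static enum xmit_result model_xmit_begin(struct model_txq *txq, int cpu)
{
	if (txq->xmit_lock_owner == cpu)
		return XMIT_DEAD_LOOP;
	if (txq->locked)
		return XMIT_WOULD_BLOCK;	/* a real lock would wait here */
	txq->locked = true;
	txq->xmit_lock_owner = cpu;
	return XMIT_LOCKED;
}

static void model_xmit_end(struct model_txq *txq)
{
	assert(txq->locked);
	txq->xmit_lock_owner = -1;
	txq->locked = false;
}

/*
 * Thread A transmits on cpu_a. If a_finished is false, A is "preempted"
 * while still holding the lock. Thread B then transmits on cpu_b.
 * Returns what B observes. CPU ids are assumed to be >= 0.
 */
static enum xmit_result model_second_sender(int cpu_a, int cpu_b, bool a_finished)
{
	struct model_txq txq = { .locked = false, .xmit_lock_owner = -1 };
	enum xmit_result first = model_xmit_begin(&txq, cpu_a);

	assert(first == XMIT_LOCKED);
	if (a_finished)
		model_xmit_end(&txq);

	return model_xmit_begin(&txq, cpu_b);
}
```

In this model, model_second_sender(0, 0, false) ends up in the XMIT_DEAD_LOOP branch
although nothing recursed, while model_second_sender(0, 1, false) only has to wait
for the lock. The first case cannot be reached if the lock holder can never be
preempted, which is what the comment quoted above seems to assume.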
Bert Karwatzki