Re: "Dead loop on virtual device" error without softirq-BKL on PREEMPT_RT
From: Bert Karwatzki
Date: Mon Feb 16 2026 - 18:49:02 EST
On Monday, 16.02.2026 at 16:37 +0100, Sebastian Andrzej Siewior wrote:
>
> I am not sure what the issue is so I can't tell. The dev_xmit_recursion*()
> based counters are per-task so it should be fine. But yet the wifi
> managed to repeatedly enqueue packets. This might be a real recursion, a
> stack trace should tell. And then, somewhere synchronisation is missing.
>
> > Bert Karwatzki
>
> Sebastian
The problem seems to be that different preemptible threads try to send skbs.
I used this debug patch for 6.18.10:
diff --git a/net/core/dev.c b/net/core/dev.c
index 5b536860138d..ecfdd8e3dc99 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4704,6 +4704,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
qdisc_pkt_len_init(skb);
tcx_set_ingress(skb, false);
+ printk(KERN_INFO "%s 0: skb = %px dev = %s\n", __func__, skb, dev->name);
#ifdef CONFIG_NET_EGRESS
if (static_branch_unlikely(&egress_needed_key)) {
if (nf_hook_egress_active()) {
@@ -4739,10 +4740,12 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
trace_net_dev_queue(skb);
if (q->enqueue) {
+ printk(KERN_INFO "%s 1: skb = %px dev = %s\n", __func__, skb, dev->name);
rc = __dev_xmit_skb(skb, q, dev, txq);
goto out;
}
+ printk(KERN_INFO "%s 2: skb = %px dev = %s txq = %px\n", __func__, skb, dev->name, txq);
/* The device has no queue. Common case for software devices:
* loopback, all the sorts of tunnels...
@@ -4761,15 +4764,20 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
/* Other cpus might concurrently change txq->xmit_lock_owner
* to -1 or to their cpu id, but not to our id.
*/
+ printk(KERN_INFO "%s: cpu = %d xmit_lock_owner = %d\n", __func__, cpu, READ_ONCE(txq->xmit_lock_owner));
if (READ_ONCE(txq->xmit_lock_owner) != cpu) {
- if (dev_xmit_recursion())
+ printk(KERN_INFO "%s 3: skb = %px dev = %s txq = %px\n", __func__, skb, dev->name, txq);
+ if (dev_xmit_recursion()) {
+ printk(KERN_INFO "%s: recursion alert for device %s!\n", __func__, dev->name);
goto recursion_alert;
+ }
skb = validate_xmit_skb(skb, dev, &again);
if (!skb)
goto out;
HARD_TX_LOCK(dev, txq, cpu);
+ printk(KERN_INFO "%s 4: skb = %px dev = %s txq = %px\n", __func__, skb, dev->name, txq);
if (!netif_xmit_stopped(txq)) {
dev_xmit_recursion_inc();
@@ -4777,6 +4785,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
dev_xmit_recursion_dec();
if (dev_xmit_complete(rc)) {
HARD_TX_UNLOCK(dev, txq);
+ printk(KERN_INFO "%s 5: skb = %px dev = %s txq = %px\n", __func__, skb, dev->name, txq);
goto out;
}
}
The normal path of an skb is this:
2026-02-17T00:29:24.124757+01:00 [ T1522] __dev_queue_xmit 0: skb = ffff8c11ed06bd00 dev = wlp4s0
2026-02-17T00:29:24.124845+01:00 [ T1522] __dev_queue_xmit 2: skb = ffff8c11ed06bd00 dev = wlp4s0 txq = ffff8c1145320200
2026-02-17T00:29:24.124851+01:00 [ T1522] __dev_queue_xmit: cpu = 7 xmit_lock_owner = -1
2026-02-17T00:29:24.124853+01:00 [ T1522] __dev_queue_xmit 3: skb = ffff8c11ed06bd00 dev = wlp4s0 txq = ffff8c1145320200
2026-02-17T00:29:24.124855+01:00 [ T1522] __dev_queue_xmit 4: skb = ffff8c11ed06bd00 dev = wlp4s0 txq = ffff8c1145320200
2026-02-17T00:29:24.124857+01:00 [ T1522] __dev_queue_xmit 5: skb = 0000000000000000 dev = wlp4s0 txq = ffff8c1145320200
This is the situation which produces the error messages:
T1522 tries to send an skb on CPU 7:
2026-02-17T00:29:24.212215+01:00 [ T1522] __dev_queue_xmit 0: skb = ffff8c11ed06b100 dev = wlp4s0
2026-02-17T00:29:24.212217+01:00 [ T1522] __dev_queue_xmit 2: skb = ffff8c11ed06b100 dev = wlp4s0 txq = ffff8c1145320200
2026-02-17T00:29:24.212219+01:00 [ T1522] __dev_queue_xmit: cpu = 7 xmit_lock_owner = -1
2026-02-17T00:29:24.212221+01:00 [ T1522] __dev_queue_xmit 3: skb = ffff8c11ed06b100 dev = wlp4s0 txq = ffff8c1145320200
2026-02-17T00:29:24.212223+01:00 [ T1522] __dev_queue_xmit 4: skb = ffff8c11ed06b100 dev = wlp4s0 txq = ffff8c1145320200
Here T1522 gets preempted after taking the TX lock (step 4), and T1513 runs on CPU 7 and also tries to send an skb:
2026-02-17T00:29:24.212225+01:00 [ T1513] __dev_queue_xmit 0: skb = ffff8c11ed06a300 dev = wlp4s0
2026-02-17T00:29:24.212228+01:00 [ T1513] __dev_queue_xmit 2: skb = ffff8c11ed06a300 dev = wlp4s0 txq = ffff8c1145320200
2026-02-17T00:29:24.212230+01:00 [ T1513] __dev_queue_xmit: cpu = 7 xmit_lock_owner = 7
2026-02-17T00:29:24.212231+01:00 [ T1513] Dead loop on virtual device wlp4s0, fix it urgently!
2026-02-17T00:29:24.212234+01:00 [ T1513] __dev_queue_xmit 0: skb = ffff8c11ed06a300 dev = wlp4s0
2026-02-17T00:29:24.212236+01:00 [ T1513] __dev_queue_xmit 2: skb = ffff8c11ed06a300 dev = wlp4s0 txq = ffff8c1145320200
2026-02-17T00:29:24.212238+01:00 [ T1513] __dev_queue_xmit: cpu = 7 xmit_lock_owner = 7
2026-02-17T00:29:24.212240+01:00 [ T1513] Dead loop on virtual device wlp4s0, fix it urgently!
2026-02-17T00:29:24.212242+01:00 [ T1513] __dev_queue_xmit 0: skb = ffff8c11ed06a300 dev = wlp4s0
2026-02-17T00:29:24.212244+01:00 [ T1513] __dev_queue_xmit 2: skb = ffff8c11ed06a300 dev = wlp4s0 txq = ffff8c1145320200
2026-02-17T00:29:24.212246+01:00 [ T1513] __dev_queue_xmit: cpu = 7 xmit_lock_owner = 7
2026-02-17T00:29:24.212247+01:00 [ T1513] Dead loop on virtual device wlp4s0, fix it urgently!
T1513 gets preempted and T1522 finishes processing the skb from above:
2026-02-17T00:29:24.212249+01:00 [ T1522] __dev_queue_xmit 5: skb = 0000000000000000 dev = wlp4s0 txq = ffff8c1145320200
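To make the sequence above easier to follow, here is a minimal user-space sketch of what the log seems to show. It is not kernel code: all toy_* names are hypothetical, the lock is reduced to its owner field, and preemption is only simulated by the call order. The one thing taken from net/core/dev.c is the comparison of xmit_lock_owner against the current CPU id.

```c
#include <assert.h>

/* Simplified stand-in for a TX queue: only the lock owner is modelled. */
struct toy_txq {
	int xmit_lock_owner;	/* CPU id of the lock holder, -1 when unlocked */
};

/* Hypothetical task descriptor: thread id and the CPU it is running on. */
struct toy_task {
	int tid;
	int cpu;
};

enum toy_verdict {
	TOY_MAY_LOCK,	/* owner != cpu: the sender would go on to take the lock */
	TOY_DEAD_LOOP,	/* owner == cpu: treated as recursion on this CPU */
};

/* Mirrors the check "READ_ONCE(txq->xmit_lock_owner) != cpu". */
enum toy_verdict toy_owner_check(const struct toy_txq *txq,
				 const struct toy_task *t)
{
	return txq->xmit_lock_owner != t->cpu ? TOY_MAY_LOCK : TOY_DEAD_LOOP;
}

/* Taking the lock records the CPU id, not the task. */
void toy_lock(struct toy_txq *txq, const struct toy_task *t)
{
	assert(txq->xmit_lock_owner == -1);
	txq->xmit_lock_owner = t->cpu;
}

void toy_unlock(struct toy_txq *txq)
{
	txq->xmit_lock_owner = -1;
}

/*
 * Replays the logged sequence: the first task takes the lock (step 4),
 * is "preempted", and the second task runs the owner check before the
 * first task reaches step 5. Returns what the second task concludes.
 */
enum toy_verdict toy_replay(int cpu_first, int cpu_second)
{
	struct toy_txq txq = { .xmit_lock_owner = -1 };
	struct toy_task first = { .tid = 1522, .cpu = cpu_first };
	struct toy_task second = { .tid = 1513, .cpu = cpu_second };
	enum toy_verdict seen;

	assert(toy_owner_check(&txq, &first) == TOY_MAY_LOCK);
	toy_lock(&txq, &first);			/* step 4 in the log */
	seen = toy_owner_check(&txq, &second);	/* second task runs here */
	toy_unlock(&txq);			/* first task resumes, step 5 */
	return seen;
}
```

In this model toy_replay(7, 7) ends in TOY_DEAD_LOOP although the second task never took the lock itself, while toy_replay(7, 3) ends in TOY_MAY_LOCK. That matches the log: T1513 sees xmit_lock_owner = 7 only because T1522 was preempted on the same CPU while holding the lock.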
Bert Karwatzki