Re: [net,PATCH v2] net: ks8851: Reinstate disabling of BHs around IRQ handler

From: Jakub Kicinski

Date: Sun Apr 12 2026 - 13:51:45 EST


On Sun, 12 Apr 2026 18:27:28 +0200 Marek Vasut wrote:
> On 4/12/26 6:01 PM, Jakub Kicinski wrote:
> > On Wed, 8 Apr 2026 18:24:58 +0200 Marek Vasut wrote:
> >> If CONFIG_PREEMPT_RT=y is set AND the driver executes ks8851_irq() AND
> >> KSZ_ISR register bit IRQ_RXI is set AND ks8851_rx_pkts() detects that
> >> there are packets in the RX FIFO, then netdev_alloc_skb_ip_align() is
> >> called to allocate SKBs. If netdev_alloc_skb_ip_align() is called with
> >> BH enabled, local_bh_enable() at the end of netdev_alloc_skb_ip_align()
> >> will call __local_bh_enable_ip(), which will call __do_softirq(), which
> >> may trigger net_tx_action() softirq, which may ultimately call the xmit
> >> callback ks8851_start_xmit_par(). The ks8851_start_xmit_par() will try
> >> to lock struct ks8851_net_par .lock spinlock, which is already locked
> >> by ks8851_irq() from which ks8851_start_xmit_par() was called. This
> >> leads to a deadlock, which is reported by the kernel, including a trace
> >> listed below.
> >
> > lock_par is a spinlock, and AFAIU softirqs run in their on thread on RT.
> > I'm not following.
>
> Please look at the backtrace in the commit message, this part, please
> read from bottom to top to observe the failure in chronological order.
> It does not seem the handle_softirqs() is running in its own thread,
> separate from the IRQ thread ?
>
> rt_spin_lock from ks8851_start_xmit_par+0x68/0x1a0
> ks8851_start_xmit_par from netdev_start_xmit+0x1c/0x40 <---- this
> tries to grab the same PAR spinlock, and deadlocks
> netdev_start_xmit from dev_hard_start_xmit+0xec/0x1b0
> dev_hard_start_xmit from sch_direct_xmit+0xb8/0x25c
> sch_direct_xmit from __qdisc_run+0x20c/0x4fc
> __qdisc_run from qdisc_run+0x1c/0x28
> qdisc_run from net_tx_action+0x1f4/0x244
> net_tx_action from handle_softirqs+0x1c0/0x29c
> handle_softirqs from __local_bh_enable_ip+0xdc/0xf4
> __local_bh_enable_ip from __netdev_alloc_skb+0x140/0x194
> __netdev_alloc_skb from ks8851_irq+0x348/0x4d8 <---- this is called
> from ks8851_rx_pkts() via netdev_alloc_skb_ip_align()
> ks8851_irq from irq_thread_fn+0x24/0x64 <-------- this here runs with
> the PAR spinlock held
>
> > The patch looks way to "advanced" for a driver. Something is going
> > very wrong here. Or the commit message must be updated to explain
> > it better to people like me. Or both.
>
> Does the backtrace make the problem clearer, with the annotation above ?

Sebastian, do you have any recommendation here? tl;dr is that the driver does

spin_lock_irqsave()
__netdev_alloc_skb()
spin_unlock_irqrestore()

And __netdev_alloc_skb() does:

if (in_hardirq() || irqs_disabled()) {
nc = this_cpu_ptr(&netdev_alloc_cache);
data = page_frag_alloc(nc, len, gfp_mask);
pfmemalloc = page_frag_cache_is_pfmemalloc(nc);
} else {
local_bh_disable();
local_lock_nested_bh(&napi_alloc_cache.bh_lock);

nc = this_cpu_ptr(&napi_alloc_cache.page);
data = page_frag_alloc(nc, len, gfp_mask);
pfmemalloc = page_frag_cache_is_pfmemalloc(nc);

local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
local_bh_enable();
}

the local_bh_enable() seems to kick in BH processing inline,
and BH processing takes the same spin lock the driver is already
holding.