Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock

From: Petr Mladek

Date: Fri Jun 19 2026 - 05:55:29 EST


On Wed 2026-06-17 19:13:30, John Ogness wrote:
> On 2026-06-17, Breno Leitao <leitao@xxxxxxxxxx> wrote:
> > On Wed, Jun 17, 2026 at 01:19:58PM +0200, Peter Zijlstra wrote:
> >> But anything using locking is not ->write_atomic() and should be driven
> >> from a kthread, no?
> >
> > Good point. If that's the case, netconsole might not ever be able to drop
> > CON_NBCON_ATOMIC_UNSAFE for any network-based console driver at all.
>
> It depends on what it needs to synchronize against. For example, the
> UART consoles cannot write if the port lock is taken by another
> context. And the port lock is the sole lock for writing to the UART. To
> deal with this, we added wrappers [0] for acquiring/releasing the port
> lock. The wrappers acquire the nbcon hardware after taking the port
> lock.
>
> The write_atomic() implementations for UART consoles do not take the
> port lock. Only the nbcon hardware is acquired (which can be done from
> any context). This automatically provides the synchronization based on
> the port lock.
>
> > As far as I can tell, there isn't a network driver today whose transmit
> > path is completely lockless, so, even if we make netpoll lockless.
> >
> > It's unlikely any NIC will ever achieve this, given that NIC TX
> > fundamentally relies on a shared DMA ring and doorbell register, which
> > inherently cannot be made lockless.
> >
> > So, is it correct to state that CON_NBCON_ATOMIC_UNSAFE will be part of
> > netconsole forever-ish?
>
> Is there some lock that can be taken to synchronize all writing of
> packets to the network? If yes, the netconsole can use a similar
> solution.

We need to be careful here. If more locks depend on the nbcon
ownership then it might become a kind of big kernel lock.

It might suffer from lock contention.

Another complication is that it is supposed to be a tail lock.

Finally, it might create tricky lockdep dependencies. But nbcon
context locking is not tracked by lockdep so it is not easy to be sure.

More details:

I always forget the details. But it seems that sleeping is allowed
in the nbcon context, see cant_migrate() in nbcon_device_try_acquire().
This might break when someone tries to take it in atomic context.

AFAIK, the motivation was to allow using the normal (sleeping)
spin locks for serial console synchronization in RT. The nested nbcon
context locking should not disable preemption when called
in NBCON_PRIO_NORMAL context.

It would still be possible to take the nbcon context in atomic context
when called in NBCON_PRIO_EMERGENCY or _PANIC context because
nbcon_context_try_acquire() is able to take over the ownership
even from a sleeping NBCON_PRIO_NORMAL context.
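
To illustrate the takeover rule, here is a toy userspace model in C11.
It is only a sketch under assumptions, not the kernel code: every
toy_* name is hypothetical, and the real nbcon code tracks more state
than this. It shows only that a non-blocking try-acquire can succeed
against a lower-priority owner, which then sees that it lost ownership.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical priorities, loosely named after the NBCON_PRIO_* levels. */
enum toy_prio { TOY_PRIO_NONE, TOY_PRIO_NORMAL, TOY_PRIO_EMERGENCY, TOY_PRIO_PANIC };

/* Ownership word: owner id in the upper bits, priority in the low byte. */
static _Atomic unsigned int toy_state;

static unsigned int toy_pack(enum toy_prio prio, unsigned int id)
{
	return (id << 8) | (unsigned int)prio;
}

/*
 * Never blocks, so it can be called from any context. Succeeds when the
 * console is free or owned by a strictly lower priority.
 */
static bool toy_try_acquire(enum toy_prio prio, unsigned int id)
{
	unsigned int cur = atomic_load(&toy_state);

	while ((cur & 0xff) < (unsigned int)prio) {
		if (atomic_compare_exchange_weak(&toy_state, &cur, toy_pack(prio, id)))
			return true;
		/* A failed CAS reloaded cur; the loop re-checks the priority. */
	}
	return false;
}

/* An owner must re-check this before each step; it may have been taken over. */
static bool toy_still_owner(enum toy_prio prio, unsigned int id)
{
	return atomic_load(&toy_state) == toy_pack(prio, id);
}

/* Releases only if the caller is still the owner; returns false otherwise. */
static bool toy_release(enum toy_prio prio, unsigned int id)
{
	unsigned int expected = toy_pack(prio, id);

	return atomic_compare_exchange_strong(&toy_state, &expected,
					      toy_pack(TOY_PRIO_NONE, 0));
}

/* Usage: a NORMAL owner loses the console to an EMERGENCY context. */
static void toy_demo(void)
{
	assert(toy_try_acquire(TOY_PRIO_NORMAL, 1));
	assert(!toy_try_acquire(TOY_PRIO_NORMAL, 2));   /* same prio: refused */
	assert(toy_try_acquire(TOY_PRIO_EMERGENCY, 3)); /* takeover */
	assert(!toy_still_owner(TOY_PRIO_NORMAL, 1));
	assert(!toy_release(TOY_PRIO_NORMAL, 1));       /* stale release is a no-op */
	assert(toy_release(TOY_PRIO_EMERGENCY, 3));
}
```

The model deliberately ignores what the outer locks do. A sleeping
NORMAL owner is harmless here only because the higher priority caller
never waits for it.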

But we need to make sure that outer locks behave the same.
In practice, they must be normal spin locks. We could probably
add some lockdep annotation to catch potential problems.

Sigh, I hope that I have got it right. I seem to be a bit lost
this week.

> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/serial_core.h?h=v7.1#n715

Best Regards,
Petr