Re: [PATCH net-next v2] netdevsim: call napi_schedule from a timer context

From: Breno Leitao
Date: Wed Feb 19 2025 - 10:37:38 EST


Hello Jakub,

On Mon, Feb 17, 2025 at 11:50:31AM -0800, Jakub Kicinski wrote:
> On Mon, 17 Feb 2025 09:35:29 -0800 Breno Leitao wrote:
> > The netdevsim driver was experiencing NOHZ tick-stop errors during packet
> > transmission due to pending softirq work when calling napi_schedule().
> > This issue was observed when running the netconsole selftest, which
> > triggered the following error message:
> >
> > NOHZ tick-stop error: local softirq work is pending, handler #08!!!
> >
> > To fix this issue, introduce a timer that schedules napi_schedule()
> > from a timer context instead of calling it directly from the TX path.
> >
> > Create an hrtimer for each queue and kick it from the TX path,
> > which then schedules napi_schedule() from the timer context.
>
> This crashes in the nl_netdev test.

Yeah, a nasty crash. From the trace, it looks like the timer is being
disabled before it has been initialized, so timer->base was never
assigned.

> I think you should move the hrtimer init to nsim_queue_alloc()
> and removal to nsim_queue_free()

That seems to make nl_netdev happier. Let me run more tests, and then ask
NIPA to finish the work.
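
To illustrate the ordering problem, here is a minimal userspace model, not
the actual patch. Every model_* name is made up for illustration and none
of them is a netdevsim or hrtimer API. The "initialized" flag stands in for
timer->base being set, and the -1 return stands in for the crash. The point
is only that tying init to queue alloc and cancel to queue free guarantees
the cancel never sees an uninitialized timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-in for a timer; this is NOT the kernel's struct hrtimer. */
struct model_timer {
	bool initialized;	/* models timer->base having been set */
	bool armed;		/* models the timer being queued */
};

static void model_timer_init(struct model_timer *t)
{
	t->initialized = true;
	t->armed = false;
}

/*
 * Returns 0 on success, or -1 if the timer was never initialized.
 * The -1 is this model's stand-in for the crash seen in the test.
 */
static int model_timer_cancel(struct model_timer *t)
{
	if (!t->initialized)
		return -1;
	t->armed = false;
	return 0;
}

/* Hypothetical queue object that owns its timer for its whole lifetime. */
struct model_queue {
	struct model_timer napi_timer;
};

/* Init the timer at allocation time, so it is valid as long as the queue is. */
static struct model_queue *model_queue_alloc(void)
{
	struct model_queue *q = calloc(1, sizeof(*q));

	if (!q)
		return NULL;
	model_timer_init(&q->napi_timer);
	return q;
}

/* Cancel at free time; the timer is always initialized by this point. */
static int model_queue_free(struct model_queue *q)
{
	int ret = model_timer_cancel(&q->napi_timer);

	free(q);
	return ret;
}
```

In this model, model_queue_free() on a queue from model_queue_alloc()
always returns 0, while calling model_timer_cancel() on a zeroed,
never-initialized timer returns -1, which is the case the crash
corresponds to.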

Thanks!