Re: Linux 6.3-rc2

From: Linus Torvalds
Date: Mon Mar 13 2023 - 14:22:49 EST


On Mon, Mar 13, 2023 at 8:53 AM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>
> Warning backtraces in calls from ct_nmi_enter(),
> seen randomly.

Hmm.

I suspect this one is a bug in the warning, not in the kernel,
although I have no idea why it would have started happening now.

This happens from an irq event, but that check is not *supposed* to
happen at all from interrupts:

     * We dont accurately track softirq state in e.g.
     * hardirq contexts (such as on 4KSTACKS), so only
     * check if not in hardirq contexts:

but I think that the ct_nmi_enter() function was called before the
hardirq count had even been incremented.
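
To make that ordering guess concrete, here is a minimal userspace sketch. It is not kernel code, and every name in it (hardirq_depth, debug_check, entry_hook, simulate_irq) is a hypothetical stand-in. It assumes the tracked state looks inconsistent at irq entry, and only shows that a check guarded by "not in hardirq" can still fire if the entry hook runs before the count is bumped:

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not real kernel symbols. */
static int hardirq_depth;   /* models the hardirq count */
static int warnings;        /* how many times the modelled warning fired */

/* Models a debug check that is meant to be skipped in hardirq context. */
static void debug_check(bool state_looks_wrong)
{
	if (hardirq_depth > 0)
		return;                 /* in hardirq: state not tracked, skip */
	if (state_looks_wrong)
		warnings++;             /* stands in for the WARN backtrace */
}

/* Models an entry hook (the role ct_nmi_enter() plays in the report). */
static void entry_hook(void)
{
	/* Assumption: tracked state is transiently inconsistent at irq entry. */
	debug_check(true);
}

/* Simulate one interrupt; returns the number of warnings it produced. */
static int simulate_irq(bool hook_before_count)
{
	hardirq_depth = 0;
	warnings = 0;

	if (hook_before_count) {
		entry_hook();           /* runs while hardirq_depth is still 0 */
		hardirq_depth++;
	} else {
		hardirq_depth++;
		entry_hook();           /* check sees hardirq context and bails */
	}

	assert(hardirq_depth == 1);
	hardirq_depth--;            /* irq exit */
	return warnings;
}
```

With these stand-ins, simulate_irq(true) counts one warning and simulate_irq(false) counts none. That only illustrates the guess about ordering; it says nothing about what the real entry path does.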

> Sample decoded stack trace:

Hmm. That WARNING backtrace doesn't actually seem to follow the stack
chain, so it only shows the irq stack, not where the irq happened.

> Seen if CONFIG_DEBUG_LOCK_ALLOC=y and CONFIG_CONTEXT_TRACKING_IDLE=y.
> It seems that rcu_read_lock_sched_held() can be true when entering an interrupt.
>
> The problem is not seen in v6.2, but occurs randomly on ToT with various
> arm emulations.

Strange. I must be wrong about this being a race on the warning
itself, because that warning has been there for a long long time.

Adding in some people who might have more of a clue. I'm thinking
Frederic and Paul might know what's up with the context tracking, but
I don't see why this would be arm-related or have started recently.
But I do note that PeterZ did some rcuidle tracing cleanups that do
end up affecting arm too.

So adding PeterZ too.

Original email with full details at

https://lore.kernel.org/lkml/d915df60-d06b-47d4-8b47-8aa1bbc2aac7@xxxxxxxxxxxx/

for added peeps.

Anybody?

Linus