Re: amd iommu: rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-.... } 8 jiffies s: 113 root: 0x1/.

From: Borislav Petkov

Date: Wed Dec 03 2025 - 07:44:41 EST


On Fri, Nov 28, 2025 at 12:28:34PM -0800, Paul E. McKenney wrote:
> Sorry to be slow, USA Turkey Day and all that...

Nothing to be sorry for - email is asynchronous communication. :-P

> This one of course is a stall on CPU 0. But you knew that already.
>
> Also, it looks like you have CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20 or maybe
> booted with rcupdate.rcu_exp_cpu_stall_timeout=20 on a system with HZ=250?
> Or set rcu_exp_cpu_stall_timeout=20 via sysfs?

Not really - this is me simply doing "make olddefconfig" on a .config and then
using it on the test box. I'm simply doing defaults and I can imagine they
have changed over the years.

[boris@zn: ~/kernel/configs/brent> grep CONFIG_RCU_EXP_CPU_STALL_TIMEOUT config-6.18.0-rc7+
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20

Yap, 20 it is.
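
For checking several of these options at once instead of one grep per symbol, here is a minimal sketch. It assumes the usual .config line shapes ("CONFIG_FOO=value" and "# CONFIG_FOO is not set"); parse_config() and SAMPLE are made-up names for illustration, and the sample lines are copied from this mail rather than read from the real file:

```python
import re

# Sample lines copied from this mail; a real run would read the .config file.
SAMPLE = """\
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20
# CONFIG_HZ_1000 is not set
CONFIG_HZ_250=y
CONFIG_HZ=250
"""

# Assumed line shapes of a kernel .config file.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name -> value string, or None for 'is not set' (hypothetical helper)."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            opts[m.group(1)] = None
    return opts


opts = parse_config(SAMPLE)
print(opts["CONFIG_RCU_EXP_CPU_STALL_TIMEOUT"], opts["CONFIG_HZ"])
```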

> This is the beginning of the message.
>
> > [ 6.971581] Key type fscrypt-provisioning registered
> > [ 6.975191] PM: Image not found (code -6)
> > [ 6.975631] } 8 jiffies s: 89 root: 0x0/.
>
> And this is the end. This looks like the stall ended just as the
> stall-warning message started printing.

I suspected that... Judging by your explanation, I don't think we can stop
printing empty stall messages, to avoid confusion I mean - sounds like they're
multiline and the lines come from different places in the code.

> It also looks like you have the expedited stall warning set to 20
> milliseconds, which as far as I know is used only on constrained systems
> such as smartphones.

That "smartphone" can't possibly fit in my pocket! :-P :-P

> If you set this value on a typical large server, you will get very large
> numbers of expedited RCU CPU stall warnings.

Should I reset it to its default 0?

And for that other value I have there:

config RCU_CPU_STALL_TIMEOUT
	int "RCU CPU stall timeout in seconds"
	depends on RCU_STALL_COMMON
	range 3 300
	default 21

which is weird. I guess I need to reset all those to something sensible for
server...

> Oh, and if you are running with HZ=1000 and the expedited RCU CPU stall
> warning set to 20 milliseconds (let alone 8!), then as far as I know,
> you are a pioneer breaking new ground. ;-)

I do things like that from time to time...

But nah, it is 250:

# CONFIG_HZ_PERIODIC is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
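
For reference, a rough sketch of the arithmetic behind those numbers. It assumes a plain round-up conversion from milliseconds to jiffies and ignores any clamping or extra slack the RCU code may apply on top; the two helpers are illustrative stand-ins written for this mail, not the kernel's functions:

```python
def msecs_to_jiffies(ms, hz):
    """Round a millisecond interval up to whole jiffies (assumed rounding)."""
    return (ms * hz + 999) // 1000


def jiffies_to_msecs(jiffies, hz):
    """Convert whole jiffies back to milliseconds."""
    return jiffies * 1000 // hz


# A 20 ms expedited stall timeout expressed in jiffies for two tick rates.
timeout_hz250 = msecs_to_jiffies(20, 250)
timeout_hz1000 = msecs_to_jiffies(20, 1000)

# The "8 jiffies" from the stall message, in milliseconds at HZ=250.
stall_ms = jiffies_to_msecs(8, 250)

print(timeout_hz250, timeout_hz1000, stall_ms)
```

So at HZ=250 one jiffy is 4 ms, the 20 ms timeout is 5 jiffies, and the reported 8 jiffies is 32 ms.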

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette