Re: BUG: Invalid wait context at: mc146818_avoid_UIP tick_freeze

From: Mateusz Jończyk
Date: Thu Dec 26 2024 - 12:55:22 EST


On 1.12.2024 at 18:05, Chris Bainbridge wrote:
> This splat happens on suspend/resume on a HP laptop. It doesn't appear
> to be a recent regression, as a bisect only leads to 560af5dc839e
> ("lockdep: Enable PROVE_RAW_LOCK_NESTING with PROVE_LOCKING.") - so
> most likely the issue has been around for a while, but a recent kernel
> build with lockdep enabled will now show it.

Hello,

Thank you for this bug report.

The cause is that tick_freeze takes a raw spinlock called "tick_freeze_lock". With this lock taken, it calls timekeeping_suspend, which indirectly calls mc146818_avoid_UIP, which in turn takes a normal spinlock called "rtc_lock".

It is not permissible to take a normal spinlock while holding a raw spinlock due to issues on PREEMPT_RT kernels:

https://docs.kernel.org/locking/locktypes.html#raw-spinlock-t-on-rt

From what I can see, this has been so for a very long time. I was able to trigger the bug on Linux 6.1.0 with CONFIG_PROVE_RAW_LOCK_NESTING enabled.

A solution to the problem would be to turn the rtc_lock into a raw spinlock. This requires that the critical section (during which the lock is held) is small. Reading the full time from the RTC takes over 10 CMOS_READ invocations in one critical section; writing the full time takes around 15 CMOS_READ/CMOS_WRITE invocations. AFAIK this cannot really be broken down into smaller sections, so I hope that the critical section would be small enough.
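
As an illustration only, the nesting rule can be modelled in userspace roughly as below. This is a toy sketch, not kernel code and not how lockdep is actually implemented: the names toy_lock, toy_ctx, toy_acquire, toy_release and the WAIT_* constants are made up for this example, and the only assumption carried over from the kernel is that lock classes are ordered by how long they may wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, not kernel code. A larger value means the lock
 * "may wait longer". All names here are hypothetical. */
enum wait_type {
	WAIT_SPIN = 1,   /* models raw_spinlock_t: spins even on PREEMPT_RT */
	WAIT_CONFIG = 2, /* models spinlock_t: a sleeping lock on PREEMPT_RT */
	WAIT_SLEEP = 3,  /* models mutexes and other sleeping locks */
};

struct toy_lock {
	const char *name;
	enum wait_type type;
};

#define MAX_HELD 8

struct toy_ctx {
	const struct toy_lock *held[MAX_HELD];
	size_t depth;
};

/* Returns true and records the lock if the nesting is allowed. Returns
 * false (an "invalid wait context") if the new lock may wait longer than
 * any lock that is already held. */
static bool toy_acquire(struct toy_ctx *ctx, const struct toy_lock *lock)
{
	for (size_t i = 0; i < ctx->depth; i++) {
		if (lock->type > ctx->held[i]->type)
			return false;
	}
	assert(ctx->depth < MAX_HELD);
	ctx->held[ctx->depth++] = lock;
	return true;
}

/* Drops the most recently acquired lock. */
static void toy_release(struct toy_ctx *ctx)
{
	assert(ctx->depth > 0);
	ctx->depth--;
}
```

In this model, acquiring a WAIT_SPIN lock named "tick_freeze_lock" and then a WAIT_CONFIG lock named "rtc_lock" makes toy_acquire return false, which corresponds to the splat. If "rtc_lock" is declared as WAIT_SPIN instead, the same sequence is accepted.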

The rtc_lock is used on 7 architectures (mips, sparc64, powerpc, alpha, x86, arm, m68k/atari), so this will require a bit of work. I'll see what I can do.

Greetings & merry Christmas,

Mateusz