Re: [BUG v6.3-rc4+] WARNING: CPU: 0 PID: 1 at drivers/thermal/thermal_sysfs.c:879 cooling_device_stats_setup+0xac/0xc0

From: Linus Torvalds
Date: Wed Mar 29 2023 - 18:53:17 EST


On Wed, Mar 29, 2023 at 1:58 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> In preparation for adding my patch that checks for some kinds of bugs in
> trace events, I decided to run it on Linus's latest branch, to see if
> there are any other trace events that may cause issues. But instead I hit
> this unrelated bug. Looks to be triggering an added lockdep_assert_held()
> on boot up.

So I think that lockdep assert is likely bogus.

It was added in commit 790930f44289 ("thermal: core: Introduce
thermal_cooling_device_update()") but the reason I say it's bogus is
that I don't think it has ever been tested:

> static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
> {
>         lockdep_assert_held(&cdev->lock); <<<---- line 879

Yeah, so cooling_device_stats_setup() is called from two places:

- thermal_cooling_device_setup_sysfs()

- thermal_cooling_device_stats_reinit()

and that first place is when that cdev is created, before it's
registered anywhere. It's not locked in that case, and yes, the
lockdep_assert_held() will trigger.

As far as I can tell it will always trigger on that path, and this
lockdep_assert_held() has thus never been tested with lockdep enabled.

The "stats_reinit" case seems to also be called from only one place
(thermal_cooling_device_update()), and that path does indeed hold the
cdev->lock.

That lockdep assert could be made happy by having
thermal_cooling_device_setup_sysfs() create that device with the cdev
lock held. I guess that's easy enough, although somewhat annoyingly
there is no "mutex_init_locked()", so you have to actually do
"mutex_init()" followed by a "mutex_lock()". And obviously unlock it
after doing the setup_sysfs().
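
To make the call paths concrete, here is a toy userspace model. It is
not kernel code: every model_* name is made up for illustration, the
"mutex" only tracks a held flag, and the lockdep check is modeled as a
warning counter. It is only a sketch of the reasoning above, assuming
the assert fires exactly when the lock is not held by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a mutex: only tracks whether it is currently held. */
struct model_mutex {
	bool held;
};

/* Toy stand-in for struct thermal_cooling_device (hypothetical). */
struct model_cdev {
	struct model_mutex lock;
	int warnings;	/* number of simulated lockdep warnings */
};

static void model_mutex_init(struct model_mutex *m)
{
	m->held = false;
}

static void model_mutex_lock(struct model_mutex *m)
{
	assert(!m->held);	/* single-threaded model: no recursion */
	m->held = true;
}

static void model_mutex_unlock(struct model_mutex *m)
{
	assert(m->held);
	m->held = false;
}

/* Models lockdep_assert_held(): count a warning instead of printing a splat. */
static void model_assert_held(struct model_cdev *cdev)
{
	if (!cdev->lock.held)
		cdev->warnings++;
}

/* Models cooling_device_stats_setup(), which starts with the assert. */
static void model_stats_setup(struct model_cdev *cdev)
{
	model_assert_held(cdev);
}

/* Creation path as described above: the new cdev's lock is never taken. */
static int model_setup_sysfs_unlocked(void)
{
	struct model_cdev cdev = { .warnings = 0 };

	model_mutex_init(&cdev.lock);
	model_stats_setup(&cdev);
	return cdev.warnings;
}

/* Creation path with the suggested change: init, lock, set up, unlock. */
static int model_setup_sysfs_locked(void)
{
	struct model_cdev cdev = { .warnings = 0 };

	model_mutex_init(&cdev.lock);	/* no "init locked" variant, so... */
	model_mutex_lock(&cdev.lock);	/* ...lock right after init */
	model_stats_setup(&cdev);
	model_mutex_unlock(&cdev.lock);
	return cdev.warnings;
}

/* Update path: the stats reinit runs with the lock held by the caller. */
static int model_update(void)
{
	struct model_cdev cdev = { .warnings = 0 };

	model_mutex_init(&cdev.lock);
	model_mutex_lock(&cdev.lock);
	model_stats_setup(&cdev);
	model_mutex_unlock(&cdev.lock);
	return cdev.warnings;
}
```

In this model the unlocked creation path returns one warning, while
the locked creation path and the update path return none.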

But I question whether the lockdep test should be done at all. I find
it distasteful that it was added with absolutely zero testing.

Linus