Re: ARM board lockups/hangs triggered by locks and mutexes

From: Rafał Miłecki
Date: Wed Aug 02 2023 - 03:41:08 EST


On 2.08.2023 09:00, Rafał Miłecki wrote:
With your comment I decided to try CONFIG_PROVE_LOCKING anyway / again
and this time, on one of my BCM53573 devices, I got something very
interesting on the first boot.

FWIW, the following error:
Broadcom B53 (2) bcma_mdio-0-0:1e: failed to register switch: -517
is caused by an invalid DT that I sent fixes for just recently.

Please scroll through the first booting lines for the WARNING:

(...)
[    1.167234] bgmac_bcma bcma0:5: Found PHY addr: 30 (NOREGS)
[    1.173655] ------------[ cut here ]------------
[    1.178374] WARNING: CPU: 0 PID: 1 at kernel/locking/mutex.c:950 __mutex_lock+0x6b4/0x8a0
[    1.186721] DEBUG_LOCKS_WARN_ON(lock->magic != lock)

Ah, that mutex WARNING comes from my Tenda AC9 device, which happens to
use a hacky OpenWrt downstream b53 driver. That driver uses the wrong API
(it behaves as a PHY driver instead of an MDIO driver). This results in
probing against a PHY device which isn't properly initialized.
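
For readers unfamiliar with the check behind that WARNING, here is a rough
userspace sketch of the idea, not kernel code. With mutex debugging enabled,
initializing a mutex stores the lock's own address in its magic field, and
the lock path warns when the two no longer match, which is what a mutex that
was never initialized looks like. All sketch_* names are hypothetical and
exist only for illustration.

```c
/* Userspace illustration only; every sketch_* name is hypothetical. */
#include <stdbool.h>
#include <stddef.h>

struct sketch_mutex {
	int locked;
	void *magic;	/* debug builds point this at the lock itself */
};

/* Mirrors the idea of the debug init: record the lock's own address. */
static void sketch_mutex_init(struct sketch_mutex *lock)
{
	lock->locked = 0;
	lock->magic = lock;
}

/*
 * Returns true when the lock looks initialized, i.e. when a check like
 * DEBUG_LOCKS_WARN_ON(lock->magic != lock) would stay silent.
 */
static bool sketch_magic_ok(const struct sketch_mutex *lock)
{
	return lock->magic == (const void *)lock;
}

/* Stand-in for a device structure that embeds a lock. */
struct sketch_dev {
	struct sketch_mutex lock;
};
```

A zero-filled struct sketch_dev whose lock never went through
sketch_mutex_init() fails sketch_magic_ok(), while a properly initialized
one passes. That is roughly the situation of a driver taking a mutex inside
a device structure that nobody set up.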

Long story short: the above WARNING is just noise. Please ignore it. Sorry
for that.

A kernel compiled with CONFIG_PROVE_LOCKING still works fine on other
devices, and on the Tenda AC9 after fixing the PHY<->MDIO thing. That kernel
option hides the actual bug, whatever it is.