Re: ARM BCM53573 SoC hangs/lockups caused by locks/clock/random changes

From: Florian Fainelli
Date: Tue Sep 05 2023 - 16:07:51 EST

On 9/4/2023 8:40 AM, Russell King (Oracle) wrote:
On Mon, Sep 04, 2023 at 11:25:57AM -0400, Waiman Long wrote:

On 9/4/23 04:33, Rafał Miłecki wrote:
Because those hangs/lockups are triggered by so many different changes,
they are really hard to debug.

This bug seems to be specific to the slow arch clock, and it affects
stability only when the kernel locking code and symbol layout trigger
some very specific timing.

Enabling CONFIG_PROVE_LOCKING seems to make the issue go away, but it
affects so much code that it's hard to tell why it actually matters.

Same for disabling CONFIG_SMP. I noticed Broadcom's SDK keeps it
disabled. I tried it and it does improve stability (I had 3 devices
with 6 days of uptime and counting). Again, it affects a lot of kernel
code, so it's hard to tell why it helps.

Unless someone comes up with some magic solution, I'll probably try
building BCM53573 images without CONFIG_SMP for my personal needs.

All the locking operations rely on the fact that the instruction to acquire
or release a lock is atomic. Is it possible that this may not be the case
under certain circumstances for this ARM BCM53573 SoC? Or maybe some Kconfig
options are not set correctly, such as a missing erratum workaround that is
needed.

I don't know enough about the 32-bit arm architecture to say whether this is
the case or not, but that is my best guess.

So, BCM53573 is Cortex-A7, which is ARMv7, which has the exclusive
load/store instructions. Whether the SoC has the necessary exclusive
monitors to support these instructions is another matter, and I
suspect someone with documentation would need to check that.

Finding documentation about this SoC has been very difficult, unfortunately...

Would any of the lock or mutex debugging self-tests catch hardware
designed without proper support for exclusive monitors in the DRAM
controller? Keep in mind that this is a uniprocessor system, however.
Does that mean we may have issues in our SMP_ON_UP alternative patching?
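
For reference, here is a minimal user-space sketch of the kind of
acquire/release path that depends on the exclusive load/store
instructions mentioned above. It is not the kernel's arch/arm spinlock
code: it uses portable C11 atomics, and the demo_* names are
hypothetical, made up for this illustration. On ARMv7 a compiler
typically lowers the compare-and-swap to an LDREX/STREX retry loop,
which is the part that relies on the exclusive monitor.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical illustration type; NOT the kernel's arch_spinlock_t. */
struct demo_lock {
	atomic_int locked;	/* 0 = free, 1 = held */
};

/*
 * Try to take the lock once. The compare-and-swap only succeeds if
 * the lock word is still 0 at the moment of the store, which is the
 * atomicity guarantee the locking code depends on.
 */
static inline bool demo_trylock(struct demo_lock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->locked,
			&expected, 1,
			memory_order_acquire, memory_order_relaxed);
}

/* Spin until the lock is taken. */
static inline void demo_lock_acquire(struct demo_lock *l)
{
	while (!demo_trylock(l))
		;
}

/* Release the lock with release ordering. */
static inline void demo_unlock(struct demo_lock *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/*
 * Single-threaded sanity check of the sequential behaviour only.
 * Returns the final lock word, which is 0 when the lock ended up free.
 */
int demo_selfcheck(void)
{
	struct demo_lock l;
	bool second, third;

	atomic_init(&l.locked, 0);

	demo_lock_acquire(&l);		/* free lock: taken immediately */
	second = demo_trylock(&l);	/* held lock: must fail */
	assert(!second);
	demo_unlock(&l);
	third = demo_trylock(&l);	/* released lock: must succeed */
	assert(third);
	demo_unlock(&l);

	return atomic_load(&l.locked);
}
```

Note that a check like demo_selfcheck() only exercises the uncontended
path, so passing it would not by itself show that the monitor behaves
correctly under the timing-dependent conditions described in this thread.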
--
Florian