Re: [REPORT] Softlockups on PowerNV with upstream

From: Gavin Shan
Date: Thu Apr 10 2025 - 05:49:23 EST


On 4/10/25 6:23 PM, Oscar Salvador wrote:
On Thu, Apr 10, 2025 at 03:35:19PM +1000, Gavin Shan wrote:
Thanks, Oscar. You're correct that the overhead is introduced by for_each_present_section_nr().
I already have a fix; it works on an IBM Power9 machine, where the issue can be
reproduced. Please see the attached patch.

I'm running most of the tests for the fix on an ARM64 machine.

Looks good to me.
But we need a comment explaining why block_id is set to ULONG_MAX
at the beginning as this might not be obvious.

Also, do we need
if (block_id != ULONG_MAX && memory_block_id(nr) == block_id) ?

Can't it just be

if (memory_block_id(nr) == block_id) ?

AFAICS, the first time through the loop, 'memory_block_id(nr) == ULONG_MAX'
will evaluate to false and we will set block_id afterwards.

Either way looks fine to me.
Another way I guess would be:


Yeah, we need to record the last handled block ID in @block_id. The first
time the loop registers a memory block device, @block_id needs to be invalid
(ULONG_MAX) so that the 'memory_block_id(nr) == block_id' check is bypassed.
I will post the fix for review after Aditya confirms it works for him, with an
extra comment explaining why @block_id is initialized to ULONG_MAX.
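
To illustrate the idea, here is a minimal userspace sketch of the loop, not the
actual patch. SECTIONS_PER_BLOCK, the present[] array and
count_registered_blocks() are made up for the illustration, and
memory_block_id() is a simplified stand-in for the kernel helper. It only
counts how many block registrations would happen when consecutive sections
that fall into the block handled last are skipped.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical value for illustration; the real one is platform dependent. */
#define SECTIONS_PER_BLOCK 4UL

/* Simplified stand-in for the kernel helper: section number -> block ID. */
static unsigned long memory_block_id(unsigned long nr)
{
	return nr / SECTIONS_PER_BLOCK;
}

/*
 * Walk the present sections in ascending order and count how many memory
 * blocks would be registered. A section is skipped when it belongs to the
 * block that was handled last.
 */
static unsigned long count_registered_blocks(const bool *present,
					     unsigned long nr_sections)
{
	/* ULONG_MAX means "no block has been handled yet". */
	unsigned long block_id = ULONG_MAX;
	unsigned long registered = 0;
	unsigned long nr;

	for (nr = 0; nr < nr_sections; nr++) {
		if (!present[nr])
			continue;

		/*
		 * The explicit ULONG_MAX test documents the intent. It could
		 * be dropped as long as no valid block ID equals ULONG_MAX.
		 */
		if (block_id != ULONG_MAX && memory_block_id(nr) == block_id)
			continue;

		block_id = memory_block_id(nr);
		registered++;	/* the block would be registered here */
	}

	return registered;
}
```

With 12 sections where only sections 0, 1, 3, 9 and 10 are present, the sketch
registers two blocks (IDs 0 and 2) instead of attempting a registration for
each of the five present sections.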

Aditya, please give it a try when you get a chance, thanks! I verified it on the
Power9 machine where the issue exists and on one of my ARM64 machines.

Thanks,
Gavin