Re: Possible reproduction of CSD locking issue

From: Juergen Gross
Date: Wed Jan 26 2022 - 10:34:20 EST


On 26.01.22 16:31, Corey Minyard wrote:
On Wed, Jan 26, 2022 at 03:51:36PM +0100, Juergen Gross wrote:
On 26.01.22 14:56, Corey Minyard wrote:
On Wed, Jan 26, 2022 at 07:08:22AM +0100, Juergen Gross wrote:

snip..


csd: cnt(63d8e1f): 0003->0037 queue
csd: cnt(63d8e20): 0003->0037 ipi
csd: cnt(63d8e21): 0003->0037 ping

In __smp_call_single_queue_debug CPU 3 sends another message to
CPU 55 and sends an IPI. But there should be a pinged entry
after this.

csd: cnt(63d8e22): 0003->0037 queue
csd: cnt(63d8e23): 0003->0037 noipi

This is interesting. Those are 5 consecutive entries with none
missing in between (see the counter values). Could it be that after
the ping there was an interrupt and the code was re-entered for
sending another IPI? This would clearly result in a hang as seen.
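
The "no missing entries" observation can be checked mechanically. What
follows is only a minimal sketch, not part of the debug patch: it assumes
the counter and both CPU fields are printed in hexadecimal (which fits
0037 being CPU 55 above), and the names parse() and gaps() are made up
for illustration.

```python
import re

# The five trace lines quoted above, copied verbatim.
LOG = """\
csd: cnt(63d8e1f): 0003->0037 queue
csd: cnt(63d8e20): 0003->0037 ipi
csd: cnt(63d8e21): 0003->0037 ping
csd: cnt(63d8e22): 0003->0037 queue
csd: cnt(63d8e23): 0003->0037 noipi
"""

# Assumed layout: csd: cnt(<hex counter>): <hex src cpu>-><hex dst cpu> <event>
LINE_RE = re.compile(r"csd: cnt\(([0-9a-f]+)\): ([0-9a-f]+)->([0-9a-f]+) (\w+)")


def parse(log):
    """Return a list of (counter, src_cpu, dst_cpu, event) tuples."""
    entries = []
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m:
            cnt, src, dst, event = m.groups()
            entries.append((int(cnt, 16), int(src, 16), int(dst, 16), event))
    return entries


def gaps(entries):
    """Return (previous, next) counter pairs that are not consecutive."""
    return [(a[0], b[0]) for a, b in zip(entries, entries[1:])
            if b[0] != a[0] + 1]


entries = parse(LOG)
for cnt, src, dst, event in entries:
    print(f"{cnt:x}: CPU {src} -> CPU {dst} {event}")
print("gaps:", gaps(entries))
```

An empty gap list means no other trace entry was recorded between these
five.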

Since preempt is enabled, wouldn't it eventually come back to the first
thread and send the IPI? Unless CPU 3 is stuck in an interrupt or
interrupt storm.

With preempt disabled (which is probably what you meant), only an IPI
sent from interrupt context would be possible. And it would be stuck,
of course, as it would need to wait for the CSD lock.


Juergen
