On Wed, Jan 26, 2022 at 07:08:22AM +0100, Juergen Gross wrote:
On 25.01.22 19:27, Corey Minyard wrote:
We have a customer that had been seeing CSD lock issues on a CentOS 7
kernel (unfortunately). I couldn't find any kernel changes that might
fix it, so I suspected it was the CSD locking issue you have been
chasing for a while.
Is this on bare metal or in a virtualized environment?
This is bare metal.
I do think I know what happened. Here's my analysis...
csd: Detected non-responsive CSD lock (#1) on CPU#3, waiting 5000000042 ns for CPU#55 flush_tlb_func+0x0/0xb0(0xffff8e0b3e2afbe8).
csd: CSD lock (#1) unresponsive.
csd: cnt(0000000): 0000->0000 queue
csd: cnt(0000001): ffff->0037 idle
The above means that these events weren't seen, I think. We can
ignore them in any case.
csd: cnt(63d8dd8): 0003->0037 ipi
csd: cnt(63d8dd9): 0003->0037 ping
csd: cnt(63d8dda): 0003->ffff pinged
This is a little confusing. The first two lines have to be from
__smp_call_single_queue_debug. The last line has to be from
smp_call_function_many. But the "pinged" event from
__smp_call_single_queue_debug never shows up.
csd: cnt(63d8dea): 0035->0037 pinged
The tail of CPU 53 sending an IPI to CPU 55 in
__smp_call_single_queue_debug.
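
For anyone following along, the src->dst fields in these lines are CPU numbers in hex, so 0035->0037 is CPU 53 -> CPU 55. Below is a rough Python sketch for decoding one of these lines. The names (parse_csd_line, CsdEvent, NO_CPU) are made up for illustration, and treating ffff as "no CPU recorded" is my assumption from how the output reads, not something confirmed here.

```python
import re
from typing import NamedTuple, Optional

# Assumption: a field of ffff means no CPU was recorded for that side.
NO_CPU = 0xFFFF

# Matches lines such as "csd: cnt(63d8dea): 0035->0037 pinged".
LINE_RE = re.compile(
    r"csd: cnt\((?P<cnt>[0-9a-f]+)\): "
    r"(?P<src>[0-9a-f]{4})->(?P<dst>[0-9a-f]{4}) (?P<event>\w+)"
)


class CsdEvent(NamedTuple):
    cnt: int             # event counter, parsed from hex
    src: Optional[int]   # source CPU, or None if the field was ffff
    dst: Optional[int]   # destination CPU, or None if the field was ffff
    event: str           # event name, e.g. "ping", "pinged", "gotipi"


def _cpu(field: str) -> Optional[int]:
    """Convert a 4-digit hex CPU field to an int, mapping ffff to None."""
    value = int(field, 16)
    return None if value == NO_CPU else value


def parse_csd_line(line: str) -> Optional[CsdEvent]:
    """Parse one 'csd: cnt(...)' line; return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return CsdEvent(int(m["cnt"], 16), _cpu(m["src"]), _cpu(m["dst"]), m["event"])


ev = parse_csd_line("csd: cnt(63d8dea): 0035->0037 pinged")
print(ev.src, "->", ev.dst, ev.event)  # 53 -> 55 pinged
```
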
csd: cnt(63d8deb): ffff->0037 gotipi
csd: cnt(63d8dec): ffff->0037 handle
csd: cnt(63d8ded): ffff->0037 dequeue (src CPU 0 == empty)
csd: cnt(63d8dee): ffff->0037 hdlend (src CPU 0 == early)
CPU 55 is handling the IPI(s) it was sent earlier.
csd: cnt(63d8e1f): 0003->0037 queue
csd: cnt(63d8e20): 0003->0037 ipi
csd: cnt(63d8e21): 0003->0037 ping
In __smp_call_single_queue_debug CPU 3 sends another message to
CPU 55 and sends an IPI. But there should be a pinged entry
after this.
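
The "ping without pinged" pattern can be checked mechanically. Here is a small Python sketch that walks the lines in log order and reports every ping that has no later pinged for the same src->dst pair. The function name unmatched_pings is hypothetical, and the pairing rule (same source and destination CPU, first come first served) is my own simplification for illustration, not how the kernel itself tracks these events.

```python
import re

# Captures counter, source CPU, destination CPU and event name.
LINE_RE = re.compile(r"cnt\(([0-9a-f]+)\): ([0-9a-f]{4})->([0-9a-f]{4}) (\w+)")


def unmatched_pings(lines):
    """Return (cnt, src, dst) for each 'ping' with no later matching 'pinged'.

    Lines are processed in the order given (assumed to be log order).
    """
    pending = {}  # (src, dst) -> counters of pings still waiting for a pinged
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue
        cnt, src, dst = (int(x, 16) for x in m.groups()[:3])
        name = m.group(4)
        if name == "ping":
            pending.setdefault((src, dst), []).append(cnt)
        elif name == "pinged" and pending.get((src, dst)):
            # Pair this pinged with the oldest outstanding ping.
            pending[(src, dst)].pop(0)
    return sorted(
        (cnt, src, dst) for (src, dst), cnts in pending.items() for cnt in cnts
    )


LOG = [
    "csd: cnt(63d8dd8): 0003->0037 ipi",
    "csd: cnt(63d8dd9): 0003->0037 ping",
    "csd: cnt(63d8dda): 0003->ffff pinged",
    "csd: cnt(63d8dea): 0035->0037 pinged",
    "csd: cnt(63d8e1f): 0003->0037 queue",
    "csd: cnt(63d8e20): 0003->0037 ipi",
    "csd: cnt(63d8e21): 0003->0037 ping",
]

for cnt, src, dst in unmatched_pings(LOG):
    print(f"cnt({cnt:07x}): ping from CPU {src} to CPU {dst} has no pinged")
```

With the lines quoted so far, this flags both pings from CPU 3 to CPU 55, since the only pinged entries are 0003->ffff and 0035->0037.
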
csd: cnt(63d8e22): 0003->0037 queue
csd: cnt(63d8e23): 0003->0037 noipi