[PATCH v2 0/3] kernel/smp.c: add more CSD lock debugging

From: Juergen Gross
Date: Mon Mar 01 2021 - 05:15:06 EST


This patch series was created to help catch a rather long-standing
problem with smp_call_function_any() and friends.

Very rarely, a remote cpu seems not to execute a queued function, and
the cpu that queued the function request waits forever for the remote
cpu to release the CSD lock.

This problem has been observed primarily when running as a guest on
top of KVM or Xen, but there are reports of the same pattern on bare
metal, too. It seems to have existed for about 2 years now, and not
much data is available.

What is known so far is that resending an IPI to the remote cpu
helps.
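To make the hang pattern and the effect of an IPI resend concrete, here
is a minimal user-space model. It is NOT the kernel/smp.c code: struct
csd_model, csd_queue_model(), csd_lock_wait_model(), lossy_send_ipi()
and the spin-count based "too long" check are hypothetical names and
assumptions chosen for illustration. The sender marks the CSD as locked
and sends an IPI; the simulated remote cpu releases the lock only when
it actually sees an IPI, and it can be told to miss IPIs to mimic the
lost-IPI pattern. The waiter spins and, once spin_limit polls have
passed, treats the wait as too long and resends the IPI.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a CSD entry, not the kernel's struct. */
struct csd_model {
    atomic_bool locked;  /* set by the sender, cleared once the function ran */
    int ipis_to_drop;    /* number of IPIs the simulated remote cpu misses */
};

typedef void (*send_ipi_fn)(struct csd_model *csd);

/* Simulated remote cpu: a missed IPI leaves the CSD locked. */
static void lossy_send_ipi(struct csd_model *csd)
{
    if (csd->ipis_to_drop > 0) {
        csd->ipis_to_drop--;
        return;
    }
    /* The queued function runs, then the CSD lock is released. */
    atomic_store_explicit(&csd->locked, false, memory_order_release);
}

/* Sender side: take the CSD lock and send the initial IPI. */
static void csd_queue_model(struct csd_model *csd, send_ipi_fn send_ipi)
{
    atomic_store_explicit(&csd->locked, true, memory_order_relaxed);
    send_ipi(csd);
}

/*
 * Wait for the remote cpu to release the CSD lock. After spin_limit polls
 * the wait counts as "too long" and the IPI is resent (assumed policy).
 * Returns the number of resends needed, or -1 after max_resends failures.
 */
static int csd_lock_wait_model(struct csd_model *csd, send_ipi_fn send_ipi,
                               unsigned long spin_limit, int max_resends)
{
    unsigned long spins = 0;
    int resends = 0;

    assert(spin_limit > 0);
    while (atomic_load_explicit(&csd->locked, memory_order_acquire)) {
        if (++spins < spin_limit)
            continue;
        if (resends == max_resends)
            return -1;  /* model gives up; debug output would go here */
        spins = 0;
        resends++;
        send_ipi(csd);
    }
    return resends;
}
```

With one missed IPI, csd_lock_wait_model() returns 1, i.e. a single
resend was enough to get the function executed. If every IPI is missed,
the model stops after max_resends and returns -1; a real kernel would
keep waiting, which is the hang described above.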

The patches add more debug data to be printed in a hang situation when
running a kernel configured with CONFIG_CSD_LOCK_WAIT_DEBUG.
Additionally, the debug code can be controlled via a new boot
parameter, making it easier to use such a kernel in a production
environment without too much negative performance impact. By default
the debugging additions are switched off; they can be activated via
the new boot parameter:

csdlock_debug=1 switches on the basic debugging and the IPI resend.
csdlock_debug=ext additionally prints more data in a hang situation,
but this option has a larger impact on performance.
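The parameter semantics listed above can be sketched as a small parser
that maps the csdlock_debug= string to a debug level. This is user-space
C for illustration only: parse_csdlock_debug() and enum
csdlock_debug_level are hypothetical names, and treating unparsable
values as "off" and any non-zero number as the basic level are
assumptions, not necessarily what the patches do. In the kernel such a
handler would be registered as a boot parameter (e.g. via early_param()).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical debug levels corresponding to csdlock_debug= values. */
enum csdlock_debug_level {
    CSDLOCK_DEBUG_OFF = 0,  /* default: debugging additions switched off */
    CSDLOCK_DEBUG_BASIC,    /* csdlock_debug=1: basic debugging + IPI resend */
    CSDLOCK_DEBUG_EXT,      /* csdlock_debug=ext: extra data, higher overhead */
};

static enum csdlock_debug_level parse_csdlock_debug(const char *arg)
{
    char *end;
    long val;

    if (arg == NULL)
        return CSDLOCK_DEBUG_OFF;
    if (strcmp(arg, "ext") == 0)
        return CSDLOCK_DEBUG_EXT;   /* extended mode builds on basic mode */

    val = strtol(arg, &end, 10);
    if (end == arg || *end != '\0')
        return CSDLOCK_DEBUG_OFF;   /* assumption: bad values keep it off */
    return val ? CSDLOCK_DEBUG_BASIC : CSDLOCK_DEBUG_OFF;
}
```

Modelling "ext" as a level above "1" rather than as an independent flag
follows the description above, where the extended mode adds data on top
of the basic debugging.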

I hope that the "ext" setting will help to find the root cause of the
problem.

Juergen Gross (3):
kernel/smp: add boot parameter for controlling CSD lock debugging
kernel/smp: prepare more CSD lock debugging
kernel/smp: add more data to CSD lock debugging

.../admin-guide/kernel-parameters.txt | 10 +
kernel/smp.c | 284 +++++++++++++++++-
2 files changed, 282 insertions(+), 12 deletions(-)

--
2.26.2