pthread_sigmask call is hanging in the kernel

From: Mathieu Tarral
Date: Wed Aug 01 2018 - 08:49:48 EST


Greetings,

While developing a patch to perform introspection on KVM[1], I am facing a bug
where a call to pthread_sigmask(), coming from the libvirt client library,
hangs for no obvious reason[2].

I am asking for help and guidance on this mailing list because I don't know how
to proceed further. In particular:
- the bug can only be triggered on a bare metal server in a public cloud; I was
unable to reproduce it anywhere else, including in a VM.
- given that the server is in a public cloud, debugging with KGDB is impossible,
since it requires 2 computers connected by a serial cable (as I understand it)

After a few discussions on the Libvirt mailing list, they suggested the bug was
due to memory corruption. It's a possibility, but very unlikely, as I would
then have seen unexpected crashes everywhere else.

The kernel I'm using is a modified Debian Stretch kernel.

The next step for me is to understand where the call is blocked in the kernel,
so I would like to ask what my options are for getting a kernel backtrace of
my process in this situation.
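
To make the request concrete, here is a minimal sketch of the kind of
information I am after. It assumes the kernel exposes
/proc/<pid>/task/<tid>/stack (which needs CONFIG_STACKTRACE and root) and
/proc/<pid>/task/<tid>/wchan; I have not confirmed that either is usable on
the affected server. The helper name kernel_stacks and its proc_root parameter
are my own, for illustration only.

```python
import os


def kernel_stacks(pid, proc_root="/proc"):
    """Collect the per-thread 'wchan' and 'stack' files of a process.

    Returns {tid: {"wchan": text, "stack": text}}. A file that cannot be
    read (missing, or not permitted) is reported as "<unavailable: ...>".
    proc_root is a hypothetical knob so the helper can be pointed at a
    copy of /proc; it is not part of any kernel interface.
    """
    result = {}
    task_dir = os.path.join(proc_root, str(pid), "task")
    # Each thread of the process has its own numeric directory under task/.
    for tid in sorted(os.listdir(task_dir), key=int):
        info = {}
        for name in ("wchan", "stack"):
            path = os.path.join(task_dir, tid, name)
            try:
                with open(path) as f:
                    info[name] = f.read().strip()
            except OSError as exc:
                info[name] = "<unavailable: %s>" % exc
        result[int(tid)] = info
    return result
```

Run as root against the PID of the hung client, for example
kernel_stacks(1234) where 1234 is a placeholder, it should return whatever
the kernel reports for each thread of the process.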

[1]: https://github.com/KVM-VMI/kvm-vmi
[2]: https://gist.github.com/Wenzel/6dc9e32558a7ffd51d1c0d89177532fc

Thank you for your time,
Best regards
--
Mathieu Tarral