On 14/07/21 10:01, Daniel Bristot de Oliveira wrote:
Hey
I use a kvm-vm for regular development. With kernel-rt v5.13-rt1 (the latest)
on the host and a regular kernel on the guest, this happens after a while:
[ 1723.404979] ------------[ cut here ]------------
[ 1723.404981] WARNING: CPU: 12 PID: 2554 at fs/eventfd.c:74 eventfd_signal+0x7e/0x90
[ 1723.405055] RIP: 0010:eventfd_signal+0x7e/0x90
[ 1723.405059] Code: 01 00 00 00 be 03 00 00 00 4c 89 ef e8 5b ec d9 ff 65 ff 0d e4 34 c9 5a 4c 89 ef e8 ec a8 86 00 4c 89 e0 5b 5d 41 5c 41 5d c3 <0f> 0b 45 31 e4 5b 5d 4c 89 e0 41 5c 41 5d c3 0f 1f 00 0f 1f 44 00
[ 1723.405078] vhost_tx_batch.constprop.0+0x7d/0xc0 [vhost_net]
[ 1723.405083] handle_tx_copy+0x15b/0x5c0 [vhost_net]
[ 1723.405088] ? __vhost_add_used_n+0x200/0x200 [vhost]
[ 1723.405092] handle_tx+0xa5/0xe0 [vhost_net]
[ 1723.405095] vhost_worker+0x93/0xd0 [vhost]
[ 1723.405099] kthread+0x186/0x1a0
[ 1723.405103] ? __kthread_parkme+0xa0/0xa0
[ 1723.405105] ret_from_fork+0x22/0x30
[ 1723.405110] ---[ end trace 0000000000000002 ]---
The WARN has this comment above:
	/*
	 * Deadlock or stack overflow issues can happen if we recurse here
	 * through waitqueue wakeup handlers. If the caller users potentially
	 * nested waitqueues with custom wakeup handlers, then it should
	 * check eventfd_signal_count() before calling this function. If
	 * it returns true, the eventfd_signal() call should be deferred to a
	 * safe context.
	 */
The WARN was added in 2020, so it is unlikely to be the direct cause of the
change in behavior. What is a known-good version for the host?
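For anyone following along, here is a rough userspace sketch of the pattern
the comment asks callers to follow: check for an in-flight signal first, and
defer if there is one. This is only a model, not the kernel code. The names
model_signal(), model_signal_count(), careful_wakeup() and flush_deferred()
are made up for illustration, and a thread-local counter stands in for
whatever per-CPU state the real implementation tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; none of this is the kernel's eventfd code. */

/* Nesting depth of in-flight signals on this thread (assumed stand-in). */
static _Thread_local int signal_depth;

static int delivered_signals;	/* signals actually delivered */
static int deferred_signals;	/* signals postponed to a safe context */

/* Model of eventfd_signal_count(): true while a signal is in flight. */
static bool model_signal_count(void)
{
	return signal_depth > 0;
}

typedef void (*wake_fn)(void);

/* Model of eventfd_signal(): warns and bails out instead of recursing. */
static int model_signal(wake_fn wakeup)
{
	if (model_signal_count()) {
		fprintf(stderr, "WARN: recursive signal\n");
		return 0;
	}

	signal_depth++;
	delivered_signals++;
	if (wakeup)
		wakeup();	/* a custom wakeup handler may try to signal again */
	assert(signal_depth > 0);
	signal_depth--;
	return 1;
}

/* A careful caller: check first, defer if already inside a signal. */
static void careful_wakeup(void)
{
	if (model_signal_count()) {
		deferred_signals++;	/* real code would queue work instead */
		return;
	}
	model_signal(NULL);
}

/* Deliver deferred signals from a safe, non-nested context. */
static void flush_deferred(void)
{
	while (deferred_signals > 0) {
		deferred_signals--;
		model_signal(NULL);
	}
}
```

Calling model_signal(careful_wakeup) delivers one signal and defers the
nested one without hitting the warning path; flush_deferred() then delivers
it once nothing is in flight. A handler that called model_signal() directly
from inside the wakeup would take the WARN branch instead, which is the
situation the trace above suggests.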
Since it is not KVM stuff, I'm CCing Michael and Jason.
Paolo