Re: [PATCH] KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test

From: Peter Xu
Date: Tue Feb 06 2024 - 03:55:10 EST


On Fri, Feb 02, 2024 at 03:18:31PM -0800, Sean Christopherson wrote:
> When finishing the final iteration of dirty_log_test testcase, set
> host_quit _before_ the final "continue" so that the vCPU worker doesn't
> run an extra iteration, and delete the hack-a-fix of an extra "continue"
> from the dirty ring testcase. This fixes a bug where the extra post to
> sem_vcpu_cont may not be consumed, which results in failures in subsequent
> runs of the testcases. The bug likely was missed during development as
> x86 supports only a single "guest mode", i.e. there aren't any subsequent
> testcases after the dirty ring test, because for_each_guest_mode() only
> runs a single iteration.
>
> For the regular dirty log testcases, letting the vCPU run one extra
> iteration is a non-issue as the vCPU worker waits on sem_vcpu_cont if and
> only if the worker is explicitly told to stop (vcpu_sync_stop_requested).
> But for the dirty ring test, which needs to periodically stop the vCPU to
> reap the dirty ring, letting the vCPU resume the guest _after_ the last
> iteration means the vCPU will get stuck without an extra "continue".
>
> However, blindly firing off an post to sem_vcpu_cont isn't guaranteed to
> be consumed, e.g. if the vCPU worker sees host_quit==true before resuming
> the guest. This results in a dangling sem_vcpu_cont, which leads to
> subsequent iterations getting out of sync, as the vCPU worker will
> continue on before the main task is ready for it to resume the guest,
> leading to a variety of asserts, e.g.
>
> ==== Test Assertion Failure ====
> dirty_log_test.c:384: dirty_ring_vcpu_ring_full
> pid=14854 tid=14854 errno=22 - Invalid argument
> 1 0x00000000004033eb: dirty_ring_collect_dirty_pages at dirty_log_test.c:384
> 2 0x0000000000402d27: log_mode_collect_dirty_pages at dirty_log_test.c:505
> 3 (inlined by) run_test at dirty_log_test.c:802
> 4 0x0000000000403dc7: for_each_guest_mode at guest_modes.c:100
> 5 0x0000000000401dff: main at dirty_log_test.c:941 (discriminator 3)
> 6 0x0000ffff9be173c7: ?? ??:0
> 7 0x0000ffff9be1749f: ?? ??:0
> 8 0x000000000040206f: _start at ??:?
> Didn't continue vcpu even without ring full
>
> Alternatively, the test could simply reset the semaphores before each
> testcase, but papering over hacks with more hacks usually ends in tears.

IMHO this is fine; we don't have a hard requirement anyway on sem in this
test, and we're pretty sure where the extra sem count came from (rather
than an unknown cause).

However since the patch can drop before_vcpu_join() by a proper ordering of
setup host_quit v.s. sem_post(), I think it's at least equally nice indeed.

>
> Reported-by: Shaoqin Huang <shahuang@xxxxxxxxxx>
> Fixes: 84292e565951 ("KVM: selftests: Add dirty ring buffer test")
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>

Reviewed-by: Peter Xu <peterx@xxxxxxxxxx>

Thanks,

--
Peter Xu