Re: [RFC 2/2] kvm: retry nx_huge_page_recovery_thread creation

From: Sean Christopherson
Date: Wed Feb 26 2025 - 15:14:05 EST


On Tue, Feb 25, 2025, Keith Busch wrote:
> From: Keith Busch <kbusch@xxxxxxxxxx>
>
> A VMM may send a signal to its threads while they've entered KVM_RUN. If
> that thread happens to be trying to make the huge page recovery vhost
> task, then it fails with -ERESTARTNOINTR. We need to retry if that
> happens, so we can't use call_once anymore. Replace it with a simple
> mutex and return the appropriate error instead of defaulting to ENOMEM.
>
> One downside is that everyone will retry if it fails, which can cause
> some additional pressure on low memory situations. Another downside is
> we're taking a mutex on every KVM run, even if we were previously
> successful in starting the vhost task, but that's really not such a
> common operation that needs to be optimized to avoid this lock.

Yes, it is. NAK to taking a VM-wide mutex on KVM_RUN.

I much prefer my (misguided in the original context[*]) approach of marking the
call_once() COMPLETED if and only if it succeeds.

diff --git a/include/linux/call_once.h b/include/linux/call_once.h
index 6261aa0b3fb0..9d47ed50139b 100644
--- a/include/linux/call_once.h
+++ b/include/linux/call_once.h
@@ -26,20 +26,28 @@ do { \
__once_init((once), #once, &__key); \
} while (0)

-static inline void call_once(struct once *once, void (*cb)(struct once *))
+static inline int call_once(struct once *once, int (*cb)(struct once *))
{
+ int r;
+
/* Pairs with atomic_set_release() below. */
if (atomic_read_acquire(&once->state) == ONCE_COMPLETED)
- return;
+ return 0;

guard(mutex)(&once->lock);
WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
if (atomic_read(&once->state) != ONCE_NOT_STARTED)
- return;
+ return -EINVAL;

atomic_set(&once->state, ONCE_RUNNING);
- cb(once);
+ r = cb(once);
+ if (r) {
+ atomic_set(&once->state, ONCE_NOT_STARTED);
+ return r;
+ }
+
atomic_set_release(&once->state, ONCE_COMPLETED);
+ return 0;
}

#endif /* _LINUX_CALL_ONCE_H */

[*] https://lore.kernel.org/all/Z5e4w7IlEEk2cpH-@xxxxxxxxxx