Re: [PATCH] KVM: x86: switch hugepage recovery thread to vhost_task

From: Paolo Bonzini
Date: Tue Jan 14 2025 - 13:14:43 EST


On 1/13/25 16:35, Keith Busch wrote:
>> Ok, I found the code and it doesn't exec (e.g.
>> https://github.com/google/crosvm/blob/b339d3d7/src/crosvm/sys/linux/jail_warden.rs#L122),
>> so that's not an option. Well, if I understand correctly from a
>> cursory look at the code, crosvm is creating a jailed child process
>> early, and then spawns further jails through it; so it's just this
>> first process that has to cheat.
>>
>> One possibility on the KVM side is to delay creating the vhost_task
>> until the first KVM_RUN. I don't like it but...
>
> This option is actually kind of appealing in that we don't need to
> change any application side to filter out kernel tasks, as well as not
> having a new kernel dependency to even report these types of tasks as
> kernel threads.
>
> I gave it a quick try. I'm not very familiar with the code here, so not
> sure if this is thread safe or not, but it did successfully get crosvm
> booting again.

That looks good to me too. Would you like to send it with a commit message and SoB?
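
On the thread-safety question above: several vCPUs can be in KVM_RUN at
the same time, and each of them would now call kvm_mmu_post_init_vm(), so
in principle two threads could both see a NULL
nx_huge_page_recovery_thread and each create a vhost_task. If that needs
plugging, one possible shape is to serialize the first call. The
following is only an untested sketch, and the nx_lazy_init_lock mutex is
a made-up field that would have to be added to struct kvm_arch and
initialized in kvm_arch_init_vm():

	/*
	 * Sketch only: serialize lazy creation so that concurrent first
	 * KVM_RUNs cannot both observe a NULL recovery thread and create
	 * two vhost_tasks.  Relies on the new early return in
	 * kvm_mmu_post_init_vm() when the thread already exists.
	 */
	int kvm_mmu_post_init_vm_once(struct kvm *kvm)
	{
		int r;

		mutex_lock(&kvm->arch.nx_lazy_init_lock);	/* hypothetical field */
		r = kvm_mmu_post_init_vm(kvm);
		mutex_unlock(&kvm->arch.nx_lazy_init_lock);

		return r;
	}

kvm_arch_vcpu_ioctl_run() would then call kvm_mmu_post_init_vm_once()
instead of kvm_mmu_post_init_vm() directly. Taking a mutex on every
KVM_RUN entry is not free; a lockless fast path on the thread pointer
could be layered on top, but then the ordering against the rest of the
initialization needs care.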

Thanks,

Paolo

---
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2401606db2604..422b6b06de4fe 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7415,6 +7415,8 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
 {
 	if (nx_hugepage_mitigation_hard_disabled)
 		return 0;
+	if (kvm->arch.nx_huge_page_recovery_thread)
+		return 0;
 
 	kvm->arch.nx_huge_page_last = get_jiffies_64();
 	kvm->arch.nx_huge_page_recovery_thread = vhost_task_create(
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c79a8cc57ba42..263363c46626b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11463,6 +11463,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 	struct kvm_run *kvm_run = vcpu->run;
 	int r;
 
+	r = kvm_mmu_post_init_vm(vcpu->kvm);
+	if (r)
+		return r;
+
 	vcpu_load(vcpu);
 	kvm_sigset_activate(vcpu);
 	kvm_run->flags = 0;
@@ -12740,11 +12744,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	return ret;
 }
 
-int kvm_arch_post_init_vm(struct kvm *kvm)
-{
-	return kvm_mmu_post_init_vm(kvm);
-}
-
 static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
 {
 	vcpu_load(vcpu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 401439bb21e3e..a219bd2d8aec8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1596,7 +1596,6 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
bool kvm_arch_vcpu_preempted_in_kernel(struct kvm_vcpu *vcpu);
-int kvm_arch_post_init_vm(struct kvm *kvm);
void kvm_arch_pre_destroy_vm(struct kvm *kvm);
void kvm_arch_create_vm_debugfs(struct kvm *kvm);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index de2c11dae2316..adacc6eaa7d9d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1065,15 +1065,6 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, const char *fdname)
 	return ret;
 }
 
-/*
- * Called after the VM is otherwise initialized, but just before adding it to
- * the vm_list.
- */
-int __weak kvm_arch_post_init_vm(struct kvm *kvm)
-{
-	return 0;
-}
-
 /*
  * Called just after removing the VM from the vm_list, but before doing any
  * other destruction.
@@ -1194,10 +1185,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	if (r)
 		goto out_err_no_debugfs;
 
-	r = kvm_arch_post_init_vm(kvm);
-	if (r)
-		goto out_err;
-
 	mutex_lock(&kvm_lock);
 	list_add(&kvm->vm_list, &vm_list);
 	mutex_unlock(&kvm_lock);
@@ -1207,8 +1194,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 
 	return kvm;
 
-out_err:
-	kvm_destroy_vm_debugfs(kvm);
 out_err_no_debugfs:
 	kvm_coalesced_mmio_free(kvm);
 out_no_coalesced_mmio:
--