Re: [RFC][PATCH 1/5] [PATCH 1/5] kvm: register in task_struct

From: Fengguang Wu
Date: Tue Sep 04 2018 - 04:31:27 EST


On Tue, Sep 04, 2018 at 09:43:50AM +0200, Christian Borntraeger wrote:


On 09/04/2018 09:15 AM, Fengguang Wu wrote:
On Tue, Sep 04, 2018 at 08:37:03AM +0200, Nikita Leshenko wrote:
On 4 Sep 2018, at 2:46, Fengguang Wu <fengguang.wu@xxxxxxxxx> wrote:

Here it goes:

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 99ce070e7dcb..27c5446f3deb 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -27,6 +27,7 @@ typedef int vm_fault_t;
struct address_space;
struct mem_cgroup;
struct hmm;
+struct kvm;
/*
* Each physical page in the system has a struct page associated with
@@ -489,10 +490,19 @@ struct mm_struct {
	/* HMM needs to track a few things per mm */
	struct hmm *hmm;
#endif
+#if IS_ENABLED(CONFIG_KVM)
+	struct kvm *kvm;
+#endif
} __randomize_layout;
extern struct mm_struct init_mm;
+#if IS_ENABLED(CONFIG_KVM)
+static inline struct kvm *mm_kvm(struct mm_struct *mm) { return mm->kvm; }
+#else
+static inline struct kvm *mm_kvm(struct mm_struct *mm) { return NULL; }
+#endif
+
static inline void mm_init_cpumask(struct mm_struct *mm)
{
#ifdef CONFIG_CPUMASK_OFFSTACK
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0c483720de8d..dca6156a7b35 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3892,7 +3892,7 @@ static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
	if (type == KVM_EVENT_CREATE_VM) {
		add_uevent_var(env, "EVENT=create");
		kvm->userspace_pid = task_pid_nr(current);
-		current->kvm = kvm;
+		current->mm->kvm = kvm;

I think you also need to reset kvm to NULL once the VM is
destroyed, otherwise it would point to dangling memory.

Good point! Here is the incremental patch:

--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3894,6 +3894,7 @@ static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
		kvm->userspace_pid = task_pid_nr(current);
		current->mm->kvm = kvm;
	} else if (type == KVM_EVENT_DESTROY_VM) {
+		current->mm->kvm = NULL;
		add_uevent_var(env, "EVENT=destroy");
	}
	add_uevent_var(env, "PID=%d", kvm->userspace_pid);

I think you should put both code snippets somewhere else. This probably has nothing to do
with the uevent. Instead, this should go into kvm_destroy_vm and kvm_create_vm. Make sure
to take care of the error handling.

OK. Will set the pointer late and reset it early like this. Since
there are several error conditions after kvm_create_vm(), it may be
more convenient to set it in kvm_dev_ioctl_create_vm(), when there are
no more errors to handle:

--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -724,6 +724,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
	struct mm_struct *mm = kvm->mm;

	kvm_uevent_notify_change(KVM_EVENT_DESTROY_VM, kvm);
+	current->mm->kvm = NULL;
	kvm_destroy_vm_debugfs(kvm);
	kvm_arch_sync_events(kvm);
	spin_lock(&kvm_lock);
@@ -3206,6 +3207,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
		fput(file);
		return -ENOMEM;
	}
+	current->mm->kvm = kvm;
	kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, kvm);

	fd_install(r, file);
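
To make the "set late, reset early" ordering concrete, here is a minimal
userspace sketch. It is not kernel code: struct kvm and struct mm_struct are
reduced to hypothetical stand-ins, and vm_create(), vm_destroy() and the
setup_ok flag are names made up for illustration. Only mm_kvm() mirrors the
helper from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct kvm { int id; };
struct mm_struct { struct kvm *kvm; };

/* Accessor mirroring the mm_kvm() helper added by the patch. */
static inline struct kvm *mm_kvm(struct mm_struct *mm) { return mm->kvm; }

/*
 * Publish the VM pointer only after every fallible setup step has
 * succeeded, so no error path has to undo it. setup_ok is a
 * hypothetical flag standing in for those steps.
 */
static int vm_create(struct mm_struct *mm, struct kvm *kvm, int setup_ok)
{
	if (!setup_ok)
		return -1;	/* error path: mm->kvm was never touched */
	mm->kvm = kvm;		/* set late */
	return 0;
}

/* Clear the pointer before teardown starts, so readers never see a dying VM. */
static void vm_destroy(struct mm_struct *mm, struct kvm *kvm)
{
	assert(mm->kvm == kvm);	/* sketch-only sanity check */
	mm->kvm = NULL;		/* reset early */
	kvm->id = -1;		/* stands in for the actual teardown */
}
```

With this ordering, a failed create leaves mm_kvm() returning NULL, and after
vm_destroy() no reader can pick up a pointer to freed memory.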

Can you point us to the original discussion about the why and what you are
trying to achieve?

It's the initial RFC post. [PATCH 0] describes some background info.

Basically we're implementing /proc/PID/idle_bitmap for user space to
walk page tables and get "accessed" bits. Since a VM's "accessed" bits
are reflected in the EPT (or AMD NPT), we'll need to walk the EPT when
we detect that the process is the QEMU main process.
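
As a rough illustration of the idea (a userspace sketch, not the code from
this series): walk an array of page-table-like entries and record which pages
were not touched since the last scan. The PTE_ACCESSED bit position, the flat
array layout, the collect_idle() name, and clearing the bit after reading it
are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_ACCESSED (1u << 5)	/* hypothetical bit position */

/*
 * Walk n entries. Bit i of idle[] is set when entry i was not accessed
 * since the previous walk. The accessed bit is cleared after reading so
 * the next walk observes only new accesses (an assumption of this sketch).
 * Returns the number of idle entries found.
 */
static size_t collect_idle(uint64_t *ptes, size_t n, uint8_t *idle)
{
	size_t nr_idle = 0;

	assert(ptes && idle);
	for (size_t i = 0; i < n; i++) {
		if (ptes[i] & PTE_ACCESSED) {
			ptes[i] &= ~(uint64_t)PTE_ACCESSED;
			idle[i / 8] &= (uint8_t)~(1u << (i % 8));
		} else {
			idle[i / 8] |= (uint8_t)(1u << (i % 8));
			nr_idle++;
		}
	}
	return nr_idle;
}
```

For a VM the same loop would have to run over the EPT/NPT entries rather than
the host page tables, which is why the mm needs a way to find its struct kvm.
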
Thanks,
Fengguang