Re: [PATCH] vmci_host: fix a race condition in vmci_host_poll() causing GPF

From: Dae R. Jeong
Date: Thu Mar 30 2023 - 07:19:43 EST


On Wed, Mar 29, 2023 at 12:15:21PM +0200, Greg KH wrote:
> On Mon, Mar 27, 2023 at 09:01:53PM +0900, Dae R. Jeong wrote:
> > During fuzzing, a general protection fault is observed in
> > vmci_host_poll().
> >
> > general protection fault, probably for non-canonical address 0xdffffc0000000019: 0000 [#1] PREEMPT SMP KASAN
> > KASAN: null-ptr-deref in range [0x00000000000000c8-0x00000000000000cf]
> > RIP: 0010:__lock_acquire+0xf3/0x5e00 kernel/locking/lockdep.c:4926
> > <- omitting registers ->
> > Call Trace:
> > <TASK>
> > lock_acquire+0x1a4/0x4a0 kernel/locking/lockdep.c:5672
> > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> > _raw_spin_lock_irqsave+0xb3/0x100 kernel/locking/spinlock.c:162
> > add_wait_queue+0x3d/0x260 kernel/sched/wait.c:22
> > poll_wait include/linux/poll.h:49 [inline]
> > vmci_host_poll+0xf8/0x2b0 drivers/misc/vmw_vmci/vmci_host.c:174
> > vfs_poll include/linux/poll.h:88 [inline]
> > do_pollfd fs/select.c:873 [inline]
> > do_poll fs/select.c:921 [inline]
> > do_sys_poll+0xc7c/0x1aa0 fs/select.c:1015
> > __do_sys_ppoll fs/select.c:1121 [inline]
> > __se_sys_ppoll+0x2cc/0x330 fs/select.c:1101
> > do_syscall_x64 arch/x86/entry/common.c:51 [inline]
> > do_syscall_64+0x4e/0xa0 arch/x86/entry/common.c:82
> > entry_SYSCALL_64_after_hwframe+0x46/0xb0
> >
> > Example thread interleaving that causes the general protection fault
> > is as follows:
> >
> > CPU1 (vmci_host_poll) CPU2 (vmci_host_do_init_context)
> > ----- -----
> > // Read uninitialized context
> > context = vmci_host_dev->context;
> > // Initialize context
> > vmci_host_dev->context = vmci_ctx_create();
> > vmci_host_dev->ct_type = VMCIOBJ_CONTEXT;
> >
> > if (vmci_host_dev->ct_type == VMCIOBJ_CONTEXT) {
> > // Dereferencing the wrong pointer
> > poll_wait(..., &context->host_context);
> > }
> >
> > In this scenario, vmci_host_poll() reads vmci_host_dev->context first,
> > and then reads vmci_host_dev->ct_type to check that
> > vmci_host_dev->context is initialized. However, since these two reads
> > are not atomically executed, there is a chance of a race condition as
> > described above.
> >
> > To fix this race condition, read vmci_host_dev->context after checking
> > the value of vmci_host_dev->ct_type so that vmci_host_poll() always
> > reads an initialized context.
> >
> > Reported-by: Dae R. Jeong <threeearcat@xxxxxxxxx>
> > Fixes: 8bf503991f87 ("VMCI: host side driver implementation.")
> > Signed-off-by: Dae R. Jeong <threeearcat@xxxxxxxxx>
>
> If you author and sign-off on the patch, no need for a Reported-by: as
> that is obvious :)

For a student like me, two of my names are cooler than one... (shy smile :)...)

> And how did you test this change?

I attach a simple C program used in testing at the end of this email.
I used the same C program to trigger the crash before applying the
patch, and to test the patch after applying it.

Before applying this patch, I inserted msleep() to trigger the crash
easily as follows:

diff --git a/drivers/misc/vmw_vmci/vmci_host.c b/drivers/misc/vmw_vmci/vmci_host.c
index 857b9851402a..e925f7effac4 100644
--- a/drivers/misc/vmw_vmci/vmci_host.c
+++ b/drivers/misc/vmw_vmci/vmci_host.c
@@ -24,6 +24,7 @@
#include <linux/smp.h>
#include <linux/fs.h>
#include <linux/io.h>
+#include <linux/delay.h>

#include "vmci_handle_array.h"
#include "vmci_queue_pair.h"
@@ -168,6 +169,11 @@ static __poll_t vmci_host_poll(struct file *filp, poll_table *wait)
struct vmci_ctx *context = vmci_host_dev->context;
__poll_t mask = 0;

+ msleep(1000);
+
+ printk("%s, context=%px, ct_type=%d\n", __func__, context,
+ vmci_host_dev->ct_type);
+
if (vmci_host_dev->ct_type == VMCIOBJ_CONTEXT) {
/* Check for VMCI calls to this VM context. */
if (wait)
@@ -341,6 +347,10 @@ static int vmci_host_do_init_context(struct vmci_host_dev *vmci_host_dev,
}

vmci_host_dev->ct_type = VMCIOBJ_CONTEXT;
+
+ printk("%s, context=%px, ct_type=%d\n", __func__,
+ vmci_host_dev->context, vmci_host_dev->ct_type);
+
atomic_inc(&vmci_host_active_users);

vmci_call_vsock_callback(true);

With this msleep(), the C program below can easily trigger the GPF.

Then I applied the patch that I propose, and ran the same C
program. After applying the patch, I couldn't observe the GPF.

> thanks,
>
> greg k-h

Best regards,
Dae R. Jeong

------------->8-----------------
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

struct init_block {
int cid;
int flags;
};

int fd;

void *th1(void *a) {
struct init_block s = {
.cid = 2,
.flags = 0,
};
ioctl(fd, 0x7a0, &s);
return NULL;
}

void *th2(void *a) {
struct pollfd pfd = {.fd = fd};
poll(&pfd, 1, 20000);
return NULL;
}

int test() {
#define filename "/dev/vmci"

fd = openat(AT_FDCWD, filename, 0x2, 0x0);
int var = 0x10000;
ioctl(fd, 0x7a7, &var);
pthread_t pth1, pth2;
pthread_create(&pth1, NULL, th1, NULL);
pthread_create(&pth2, NULL, th2, NULL);
pthread_join(pth1, NULL);
pthread_join(pth2, NULL);
return 0;
}

int main() {
for (;;) {
pid_t pid = fork();
if (pid == 0) {
test();
} else {
usleep(100 * 1000);
kill(pid, SIGKILL);
}
}
}