Re: [PATCH v8] kernel: add kcov code coverage

From: Dmitry Vyukov
Date: Wed Feb 24 2016 - 06:26:53 EST


On Thu, Feb 4, 2016 at 11:30 PM, Andrew Morton
<akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, 4 Feb 2016 16:40:41 +0100 Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>
>> kcov provides code coverage collection for coverage-guided fuzzing
>> (randomized testing). Coverage-guided fuzzing is a testing technique
>> that uses coverage feedback to determine new interesting inputs to a
>> system. A notable user-space example is AFL
>> (http://lcamtuf.coredump.cx/afl/). However, this technique is not
>> widely used for kernel testing due to missing compiler and kernel
>> support.
>>
>> kcov does not aim to collect as much coverage as possible. It aims
>> to collect more or less stable coverage that is function of syscall
>> inputs. To achieve this goal it does not collect coverage in
>> soft/hard interrupts and instrumentation of some inherently
>> non-deterministic or non-interesting parts of kernel is disbled
>> (e.g. scheduler, locking).
>>
>> Currently there is a single coverage collection mode (tracing),
>> but the API anticipates additional collection modes.
>> Initially I also implemented a second mode which exposes
>> coverage in a fixed-size hash table of counters (what Quentin
>> used in his original patch). I've dropped the second mode for
>> simplicity.
>>
>> This patch adds the necessary support on kernel side.
>> The complimentary compiler support was added in gcc revision 231296.
>>
>> We've used this support to build syzkaller system call fuzzer,
>> which has found 90 kernel bugs in just 2 months:
>> https://github.com/google/syzkaller/wiki/Found-Bugs
>> We've also found 30+ bugs in our internal systems with syzkaller.
>> Another (yet unexplored) direction where kcov coverage would greatly
>> help is more traditional "blob mutation". For example, mounting
>> a random blob as a filesystem, or receiving a random blob over wire.
>>
>> Why not gcov. Typical fuzzing loop looks as follows: (1) reset
>> coverage, (2) execute a bit of code, (3) collect coverage, repeat.
>> A typical coverage can be just a dozen of basic blocks (e.g. an
>> invalid input). In such context gcov becomes prohibitively expensive
>> as reset/collect coverage steps depend on total number of basic
>> blocks/edges in program (in case of kernel it is about 2M). Cost of
>> kcov depends only on number of executed basic blocks/edges. On top of
>> that, kernel requires per-thread coverage because there are
>> always background threads and unrelated processes that also produce
>> coverage. With inlined gcov instrumentation per-thread coverage is not
>> possible.
>>
>> kcov exposes kernel PCs and control flow to user-space which
>> is insecure. But debugfs should not be mapped as user accessible.
>
> So the proposed user interface is ioctls against /sys/kernel/debug/kcov.
>
> I guess that's OK. We could just add sys_kcov() - syscalls are cheap
> enough. But we store state within the fd, don't we? So sys_kcov()
> would take an fd argument.
>
>> Based on a patch by Quentin Casasnovas.
>> Signed-off-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
>> Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>
>> ---
>> Anticipating reasonable questions regarding usage of this feature.
>> Quentin Casasnovas and Vegard Nossum also plan to use kcov for
>> coverage-guided fuzzing. Currently they use a custom kernel patch
>> for their fuzzer and found several dozens of bugs.
>> There is also interest from Intel 0-DAY kernel test infrastructure.
>>
>> ...
>>
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1811,6 +1811,16 @@ struct task_struct {
>> /* bitmask and counter of trace recursion */
>> unsigned long trace_recursion;
>> #endif /* CONFIG_TRACING */
>> +#ifdef CONFIG_KCOV
>> + /* Coverage collection mode enabled for this task (0 if disabled). */
>> + int kcov_mode;
>
> Should really be enum kcov_mode. We could move it into <linux/kcov.h>:
>
> --- a/include/linux/kcov.h~kernel-add-kcov-code-coverage-fix
> +++ a/include/linux/kcov.h
> @@ -10,6 +10,14 @@ struct task_struct;
> void kcov_task_init(struct task_struct *t);
> void kcov_task_exit(struct task_struct *t);
>
> +enum kcov_mode {
> + /*
> + * Tracing coverage collection mode.
> + * Covered PCs are collected in a per-task buffer.
> + */
> + KCOV_MODE_TRACE = 1,
> +};
> +
> #else
>
> static inline void kcov_task_init(struct task_struct *t) {}
> --- a/include/linux/sched.h~kernel-add-kcov-code-coverage-fix
> +++ a/include/linux/sched.h
> @@ -51,6 +51,7 @@ struct sched_param {
> #include <linux/resource.h>
> #include <linux/timer.h>
> #include <linux/hrtimer.h>
> +#include <linux/kcov.h>
> #include <linux/task_io_accounting.h>
> #include <linux/latencytop.h>
> #include <linux/cred.h>
> @@ -1814,7 +1815,7 @@ struct task_struct {
> #endif /* CONFIG_TRACING */
> #ifdef CONFIG_KCOV
> /* Coverage collection mode enabled for this task (0 if disabled). */
> - int kcov_mode;
> + enum kcov_mode kcov_mode;
> /* Size of the kcov_area. */
> unsigned kcov_size;
> /* Buffer for coverage collection. */
> --- a/kernel/kcov.c~kernel-add-kcov-code-coverage-fix
> +++ a/kernel/kcov.c
> @@ -13,14 +13,6 @@
> #include <linux/uaccess.h>
> #include <linux/kcov.h>
>
> -enum kcov_mode {
> - /*
> - * Tracing coverage collection mode.
> - * Covered PCs are collected in a per-task buffer.
> - */
> - KCOV_MODE_TRACE = 1,
> -};
> -
> /*
> * kcov descriptor (one per opened debugfs file).
> * State transitions of the descriptor:
>
>>
>> ...
>>
>> +/*
>> + * kcov descriptor (one per opened debugfs file).
>> + * State transitions of the descriptor:
>> + * - initial state after open()
>> + * - then there must be a single ioctl(KCOV_INIT_TRACE) call
>> + * - then, mmap() call (several calls are allowed but not useful)
>> + * - then, repeated enable/disable for a task (only one task a time allowed)
>> + */
>> +struct kcov {
>> + /*
>> + * Reference counter. We keep one for:
>> + * - opened file descriptor
>> + * - task with enabled coverage (we can't unwire it from another task)
>> + */
>> + atomic_t rc;
>
> "rc" usually means "return code". Calling this "refcount" would cause
> less surprise.
>
>> + /* The lock protects mode, size, area and t. */
>> + spinlock_t lock;
>> + enum kcov_mode mode;
>> + unsigned size;
>> + void *area;
>> + struct task_struct *t;
>
> Can we get some documentation in place for the above? I find that
> documenting the data structures is more valuable than documenting the
> code.
>
>> +};
>> +
>> +/*
>> + * Entry point from instrumented code.
>> + * This is called once per basic-block/edge.
>> + */
>> +void __sanitizer_cov_trace_pc(void)
>> +{
>> + struct task_struct *t;
>> + enum kcov_mode mode;
>> +
>> + t = current;
>> + /*
>> + * We are interested in code coverage as a function of a syscall inputs,
>> + * so we ignore code executed in interrupts.
>> + */
>> + if (!t || in_interrupt())
>> + return;
>> + mode = READ_ONCE(t->kcov_mode);
>> + if (mode == KCOV_MODE_TRACE) {
>> + unsigned long *area;
>> + unsigned long pos;
>> +
>> + /*
>> + * There is some code that runs in interrupts but for which
>> + * in_interrupt() returns false (e.g. preempt_schedule_irq()).
>> + * READ_ONCE()/barrier() effectively provides load-acquire wrt
>> + * interrupts, there are paired barrier()/WRITE_ONCE() in
>> + * kcov_ioctl_locked().
>> + */
>> + barrier();
>> + area = t->kcov_area;
>> + /* The first word is number of subsequent PCs. */
>> + pos = READ_ONCE(area[0]) + 1;
>> + if (likely(pos < t->kcov_size)) {
>> + area[pos] = _RET_IP_;
>> + WRITE_ONCE(area[0], pos);
>> + }
>> + }
>> +}
>> +EXPORT_SYMBOL(__sanitizer_cov_trace_pc);
>
> Why is this exported to modules? gcc emits the call?

Yes, gcc emits the call.

>> ...
>>
>> +static int kcov_mmap(struct file *filep, struct vm_area_struct *vma)
>> +{
>> + int res = 0;
>> + void *area;
>> + struct kcov *kcov = vma->vm_file->private_data;
>> + unsigned long size, off;
>> + struct page *page;
>> +
>> + area = vmalloc_user(vma->vm_end - vma->vm_start);
>> + if (!area)
>> + return -ENOMEM;
>> +
>> + spin_lock(&kcov->lock);
>> + size = kcov->size * sizeof(unsigned long);
>> + if (kcov->mode == 0 || vma->vm_pgoff != 0 ||
>
> kcov->mode has type `enum kcov_mode'. We should hard-wire a zero in
> here: neater to create KCOV_MODE_DISABLED or whatever? In multiple
> places.
>
>> + vma->vm_end - vma->vm_start != size) {
>> + res = -EINVAL;
>> + goto exit;
>> + }
>> + if (!kcov->area) {
>> + kcov->area = area;
>> + vma->vm_flags |= VM_DONTEXPAND;
>> + spin_unlock(&kcov->lock);
>> + for (off = 0; off < size; off += PAGE_SIZE) {
>> + page = vmalloc_to_page(kcov->area + off);
>> + if (vm_insert_page(vma, vma->vm_start + off, page))
>> + WARN_ONCE(1, "vm_insert_page() failed");
>> + }
>> + return 0;
>> + }
>> +exit:
>> + spin_unlock(&kcov->lock);
>> + vfree(area);
>> + return res;
>> +}
>> +
>>
>> ...
>>
>> +static int kcov_ioctl_locked(struct kcov *kcov, unsigned int cmd,
>> + unsigned long arg)
>> +{
>> + struct task_struct *t;
>> +
>> + switch (cmd) {
>> + case KCOV_INIT_TRACE:
>> + /*
>> + * Enable kcov in trace mode and setup buffer size.
>> + * Must happen before anything else.
>> + * Size must be at least 2 to hold current position and one PC.
>> + */
>> + if (arg < 2 || arg > INT_MAX)
>> + return -EINVAL;
>> + if (kcov->mode != 0)
>> + return -EBUSY;
>> + kcov->mode = KCOV_MODE_TRACE;
>> + kcov->size = arg;
>
> kcov->size comes in from userspace and later get multiplied by
> sizeof(unsigned long). This means that userspace can trigger math
> overflows (on 32-bit if that happens) and havoc might ensue. So can we
> please get some better checks in place for `size'?
>
> Also, move these checks (of `arg') so they immediately precede this
> *usage* of `arg' so the reader can better see what's happening.
>
>> + return 0;
>> + case KCOV_ENABLE:
>> + /*
>> + * Enable coverage for the current task.
>> + * At this point user must have been enabled trace mode,
>> + * and mmapped the file. Coverage collection is disabled only
>> + * at task exit or voluntary by KCOV_DISABLE. After that it can
>> + * be enabled for another task.
>> + */
>> + if (arg != 0 || kcov->mode == 0 || kcov->area == NULL)
>> + return -EINVAL;
>
> This reader doesn't know what is in "arg". Documentation, please. Or
> copy `arg' into a suitably named local.
>
>> + if (kcov->t != NULL)
>> + return -EBUSY;
>> + t = current;
>> + /* Cache in task struct for performance. */
>> + t->kcov_size = kcov->size;
>> + t->kcov_area = kcov->area;
>> + /* See comment in __sanitizer_cov_trace_pc(). */
>> + barrier();
>> + WRITE_ONCE(t->kcov_mode, kcov->mode);
>> + t->kcov = kcov;
>> + kcov->t = t;
>> + /* This is put either in kcov_task_exit() or in KCOV_DISABLE. */
>> + kcov_get(kcov);
>
> We didn't get a ref on the task_struct. I guess that's OK because
> task_struct.kcov gets torn down in kcov_task_exit().

Yes. We can't have coverage collection enabled for a dead task.

> But what happens if userspace simply closes the fd without running
> KCOV_DISABLE? ?

That's why we take the reference here. Struct kcov descriptor lives
while (1) the fd is opened _or_ (2) coverage collection is enabled for
a task. It would be weird for a program to close the fd (and unmap the
mapping) while coverage collection is still enabled, but it should not
affect collection. Struct kcov will be freed when the task exits in
this case.



>> + return 0;
>> + case KCOV_DISABLE:
>> + /* Disable coverage for the current task. */
>> + if (arg != 0 || current->kcov != kcov)
>> + return -EINVAL;
>
> This reader doesn't know what `arg' means.
>
>> + t = current;
>> + if (WARN_ON(kcov->t != t))
>> + return -EINVAL;
>> + kcov_task_init(t);
>> + kcov->t = NULL;
>> + kcov_put(kcov);
>> + return 0;
>> + default:
>> + return -EINVAL;
>
> For reasons I don't recall, we conventionally return -ENOTTY for
> unimplemented ioctl modes. Alan Cox will remember ;)
>
> http://www.makelinux.net/ldd3/chp-6-sect-1 says
>
> : This error code is interpreted by the C library as "inappropriate ioctl
> : for device," which is usually exactly what the programmer needs to
> : hear. It's still pretty common, though, to return -EINVAL in response
> : to an invalid ioctl command.
>
>> + }
>> +}
>> +
>> +static long kcov_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>> +{
>> + struct kcov *kcov;
>> + int res;
>> +
>> + kcov = filep->private_data;
>> + spin_lock(&kcov->lock);
>> + res = kcov_ioctl_locked(kcov, cmd, arg);
>> + spin_unlock(&kcov->lock);
>> + return res;
>> +}
>
> Wait. `unsigned long arg' is going to be a 32-bit quantity when a
> 32-bit executable is running on a 64-bit kernel. Doesn't this ioctl
> need compat handling?

As far as I see in compat_ioctl.c, in my case arg will be just
zero-extended. Since we use the arg only to pass buffer size,
zero-extension is fine.

Mailed "kcov: clean up code" patch to address the rest of comments.