Re: Kernel Concurrency Sanitizer (KCSAN)

From: Marco Elver
Date: Thu Oct 03 2019 - 15:27:23 EST


On Thu, 3 Oct 2019 at 18:12, Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> On Fri, Sep 20, 2019 at 07:51:04PM +0200, Marco Elver wrote:
> > On Fri, 20 Sep 2019 at 18:47, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Sep 20, 2019 at 6:31 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > > >
> > > > On Fri, Sep 20, 2019 at 04:18:57PM +0200, Marco Elver wrote:
> > > > > We would like to share a new data-race detector for the Linux kernel:
> > > > > Kernel Concurrency Sanitizer (KCSAN) --
> > > > > https://github.com/google/ktsan/wiki/KCSAN (Details:
> > > > > https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst)
> > > >
> > > > Nice!
> > > >
> > > > BTW kcsan_atomic_next() is missing a stub definition in <linux/kcsan.h>
> > > > when !CONFIG_KCSAN:
> > > >
> > > > https://github.com/google/ktsan/commit/a22a093a0f0d0b582c82cdbac4f133a3f61d207c#diff-19d7c475b4b92aab8ba440415ab786ec
> > > >
> > > > ... and I think the kcsan_{begin,end}_atomic() stubs need to be static
> > > > inline too.
> >
> > Thanks for catching, fixed and pushed. Feel free to rebase your arm64 branch.
>
> Great; I've just done so!
>
> What's the plan for posting a PATCH or RFC series?

I'm planning to send some patches, but with the amount of data-races
being found I need to prioritize what we send first. Currently the
plan is to let syzbot find data-races, and we'll start by sending a
few critical reports that syzbot found. Syzbot should be set up fully
and start finding data-races within next few days.

> The rest of this email is rabbit-holing on the issue KCSAN spotted;
> sorry about that!

Thanks for looking into this! I think you're right, and please do feel
free to send a proper patch out.

Thanks,
-- Marco

> [...]
>
> > > > We have some interesting splats at boot time in stop_machine, which
> > > > don't seem to have been hit/fixed on x86 yet in the kcsan-with-fixes
> > > > branch, e.g.
> > > >
> > > > [ 0.237939] ==================================================================
> > > > [ 0.239431] BUG: KCSAN: data-race in multi_cpu_stop+0xa8/0x198 and set_state+0x80/0xb0
> > > > [ 0.241189]
> > > > [ 0.241606] write to 0xffff00001003bd00 of 4 bytes by task 24 on cpu 3:
> > > > [ 0.243435] set_state+0x80/0xb0
> > > > [ 0.244328] multi_cpu_stop+0x16c/0x198
> > > > [ 0.245406] cpu_stopper_thread+0x170/0x298
> > > > [ 0.246565] smpboot_thread_fn+0x40c/0x560
> > > > [ 0.247696] kthread+0x1a8/0x1b0
> > > > [ 0.248586] ret_from_fork+0x10/0x18
> > > > [ 0.249589]
> > > > [ 0.250006] read to 0xffff00001003bd00 of 4 bytes by task 14 on cpu 1:
> > > > [ 0.251804] multi_cpu_stop+0xa8/0x198
> > > > [ 0.252851] cpu_stopper_thread+0x170/0x298
> > > > [ 0.254008] smpboot_thread_fn+0x40c/0x560
> > > > [ 0.255135] kthread+0x1a8/0x1b0
> > > > [ 0.256027] ret_from_fork+0x10/0x18
> > > > [ 0.257036]
> > > > [ 0.257449] Reported by Kernel Concurrency Sanitizer on:
> > > > [ 0.258918] CPU: 1 PID: 14 Comm: migration/1 Not tainted 5.3.0-00007-g67ab35a199f4-dirty #3
> > > > [ 0.261241] Hardware name: linux,dummy-virt (DT)
> > > > [ 0.262517] ==================================================================>
> >
> > Thanks, the fixes in -with-fixes were ones I only encountered with
> > Syzkaller, where I disable KCSAN during boot. I've just added a fix
> > for this race and pushed to kcsan-with-fixes.
>
> I think that's:
>
> https://github.com/google/ktsan/commit/c1bc8ab013a66919d8347c2392f320feabb14f92
>
> ... but that doesn't look quite right to me, as it leaves us with the shape:
>
> do {
> if (READ_ONCE(msdata->state) != curstate) {
> curstate = msdata->state;
> switch (curstate) {
> ...
> }
> ack_state(msdata);
> }
> } while (curstate != MULTI_STOP_EXIT);
>
> I don't believe that we have a guarantee of read-after-read ordering
> between the READ_ONCE(msdata->state) and the subsequent plain access of
> msdata->state, as we've been caught out on that in the past, e.g.
>
> https://lore.kernel.org/lkml/1506527369-19535-1-git-send-email-will.deacon@xxxxxxx/
>
> ... which I think means we could switch on a stale value of
> msdata->state. That would mean we might handle the same state twice,
> calling ack_state() more times than expected and corrupting the count.
>
> The compiler could also replace uses of curstate with a reload of
> msdata->state. If it did so for the while condition, we could skip the
> expected ack_state() for MULTI_STOP_EXIT, though it looks like that
> might not matter.
>
> I think we need to make sure that we use a consistent snapshot,
> something like the below. Assuming I'm not barking up the wrong tree, I
> can spin this as a proper patch.
>
> Thanks,
> Mark.
>
> ---->8----
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index b4f83f7bdf86..67a0b454b5b5 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -167,7 +167,7 @@ static void set_state(struct multi_stop_data *msdata,
> /* Reset ack counter. */
> atomic_set(&msdata->thread_ack, msdata->num_threads);
> smp_wmb();
> - msdata->state = newstate;
> + WRITE_ONCE(msdata->state, newstate);
> }
>
> /* Last one to ack a state moves to the next state. */
> @@ -186,7 +186,7 @@ void __weak stop_machine_yield(const struct cpumask *cpumask)
> static int multi_cpu_stop(void *data)
> {
> struct multi_stop_data *msdata = data;
> - enum multi_stop_state curstate = MULTI_STOP_NONE;
> + enum multi_stop_state newstate, curstate = MULTI_STOP_NONE;
> int cpu = smp_processor_id(), err = 0;
> const struct cpumask *cpumask;
> unsigned long flags;
> @@ -210,8 +210,9 @@ static int multi_cpu_stop(void *data)
> do {
> /* Chill out and ensure we re-read multi_stop_state. */
> stop_machine_yield(cpumask);
> - if (msdata->state != curstate) {
> - curstate = msdata->state;
> + newstate = READ_ONCE(msdata->state);
> + if (newstate != curstate) {
> + curstate = newstate;
> switch (curstate) {
> case MULTI_STOP_DISABLE_IRQ:
> local_irq_disable();
>