Re: [PATCH] KVM: selftests: Double check on the current CPU in rseq_test

From: Sean Christopherson
Date: Thu Jul 14 2022 - 11:35:29 EST


On Thu, Jul 14, 2022, Paolo Bonzini wrote:
> On 7/14/22 10:06, Gavin Shan wrote:
> > rseq_test creates two threads, 'main' and 'migration_thread'. The test
> > assumes that when the 'migration-worker' thread is not in the middle of
> > a migration, the 'main' thread is not being migrated either.
> > Unfortunately, that assumption doesn't hold: the 'main' thread can be
> > migrated from one CPU to another between the calls to sched_getcpu()
> > and READ_ONCE(__rseq.cpu_id). The following assert eventually fires
> > because of the mismatched CPU numbers.
> >
> > The issue can occasionally be reproduced on arm64 systems.
>
> Hmm, this does not seem to be a correct patch - the threads are already
> synchronizing using seq_cnt, like this:
>
> migration                     main
> ----------------------        --------------------------------
> seq_cnt = 1
> smp_wmb()
>                               snapshot = 0
>                               smp_rmb()
>                               cpu = sched_getcpu() reads 23
> sched_setaffinity()
>                               rseq_cpu = __rseq.cpuid reads 35
>                               smp_rmb()
>                               snapshot != seq_cnt -> retry
> smp_wmb()
> seq_cnt = 2
>
> sched_setaffinity() is guaranteed to block until the task is enqueued on an
> allowed CPU.

Yes, and retrying could suppress detection of kernel bugs that this test is intended
to catch.
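
For reference, the seq_cnt handshake described above can be sketched in plain
C11. This is only a sketch under assumptions, not the selftest's code: the
helper names (migration_begin, migration_end, read_begin, read_retry) are made
up for illustration, C11 fences stand in for smp_wmb()/smp_rmb() (they are
somewhat stronger than the kernel barriers), and masking off the low bit of the
snapshot is inferred from "snapshot = 0" being taken while seq_cnt is 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the counter shared by the two threads. */
static atomic_uint seq_cnt;

/* Migration side: make the counter odd before touching the affinity. */
void migration_begin(void)
{
	atomic_fetch_add_explicit(&seq_cnt, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* role of smp_wmb() */
}

/* Migration side: make the counter even again once the migration is done. */
void migration_end(void)
{
	/* A migration must be in flight, i.e. the counter must be odd. */
	assert(atomic_load_explicit(&seq_cnt, memory_order_relaxed) & 1);

	atomic_thread_fence(memory_order_release);	/* role of smp_wmb() */
	atomic_fetch_add_explicit(&seq_cnt, 1, memory_order_relaxed);
}

/* Main side: snapshot the counter with the "in progress" bit cleared. */
unsigned int read_begin(void)
{
	unsigned int snapshot;

	snapshot = atomic_load_explicit(&seq_cnt, memory_order_relaxed) & ~1u;
	atomic_thread_fence(memory_order_acquire);	/* role of smp_rmb() */
	return snapshot;
}

/* Main side: true if a migration started or finished since read_begin(). */
bool read_retry(unsigned int snapshot)
{
	atomic_thread_fence(memory_order_acquire);	/* role of smp_rmb() */
	return snapshot != atomic_load_explicit(&seq_cnt, memory_order_relaxed);
}
```

The main thread would read sched_getcpu() and __rseq.cpu_id between
read_begin() and read_retry(), and compare the two values only when
read_retry() returns false. Because the snapshot is always even, any read that
overlaps an in-flight migration (odd seq_cnt) is thrown away.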

> Can you check that smp_rmb() and smp_wmb() generate correct instructions on
> arm64?

That seems like the most likely scenario (or a kernel bug); I distinctly remember
the barriers provided by tools/ being rather bizarre.
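
One way to check would be to build a tiny probe for arm64 and compare its
disassembly with what rseq_test ends up with. A rough sketch follows. The
probe_* names are made up, and the arm64 branch hard-codes dmb ishld and
dmb ishst, which is what the kernel proper uses for smp_rmb() and smp_wmb().
I have not verified that against the tools/ headers here, so treat it as the
expectation to compare against, not as a statement of what tools/ does.

```c
#include <assert.h>

/*
 * Hypothetical local macros, named probe_* so they are not mistaken for the
 * definitions under tools/. On arm64 they expand to the barrier instructions
 * the kernel proper uses; elsewhere they fall back to the GCC/Clang __atomic
 * fences so the file still builds.
 */
#if defined(__aarch64__)
#define probe_smp_rmb()	__asm__ __volatile__("dmb ishld" ::: "memory")
#define probe_smp_wmb()	__asm__ __volatile__("dmb ishst" ::: "memory")
#else
#define probe_smp_rmb()	__atomic_thread_fence(__ATOMIC_ACQUIRE)
#define probe_smp_wmb()	__atomic_thread_fence(__ATOMIC_RELEASE)
#endif

static int payload;
static int ready;

/* Store the payload, then the flag, with a write barrier in between. */
void probe_publish(int value)
{
	/* Negative values are reserved for "not ready" in probe_consume(). */
	assert(value >= 0);

	payload = value;
	probe_smp_wmb();
	__atomic_store_n(&ready, 1, __ATOMIC_RELAXED);
}

/* Load the flag, then the payload, with a read barrier in between. */
int probe_consume(void)
{
	if (!__atomic_load_n(&ready, __ATOMIC_RELAXED))
		return -1;

	probe_smp_rmb();
	return payload;
}
```

Compiling this with "cc -O2 -S" on an arm64 box should show a dmb between the
two stores in probe_publish() and between the two loads in probe_consume().
If "objdump -d" on the rseq_test binary shows nothing comparable around the
seq_cnt accesses, the tools/ barriers are the prime suspect.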