Re: [PATCH 2/2 v2] rseq/selftests: test MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU

From: Peter Oskolkov
Date: Fri Aug 07 2020 - 14:47:55 EST


On Fri, Aug 7, 2020 at 11:25 AM Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> ----- On Aug 7, 2020, at 1:55 PM, Peter Oskolkov posk@xxxxxxx wrote:
>
> > On Thu, Aug 6, 2020 at 5:27 PM Boqun Feng <boqun.feng@xxxxxxxxx> wrote:
> [...]
> >> What if the manager thread update ->percpu_list_ptr and call
> >> membarrier() here? I.e.
> >>
> >> CPU0 CPU1
> >> list_ptr = atomic_load(&args->percpu_list_ptr); // read list_b
> >>
> >> atomic_store(&args->percpu_list_ptr, list_a);
> >> sys_membarrier(MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU, 1); // send ipi to
> >> restart rseq.cs on CPU1
> >>
> >> <got IPI, but not in a rseq.cs, so nothing to do>
> >> cpu = rseq_cpu_start(); // start a rseq.cs and accessing list_b!
> >>
> >> The thing is, atomic_load() is an reference to ->percpu_list_ptr, which
> >> is outside the rseq.cs, simply restarting rseq doesn't kill this
> >> reference.
> >>
> >> Am I missing something subtle?
> >
> > rseq_cmpeqv_cmpeqv_store is used below to make sure the reference is
> > the one that should be used; if it is no longer "active", the
> > iteration is restarted.
>
> I suspect it "works" because the manager thread does not free and
> repurpose the memory where list_a is allocated, nor does it store to
> its list head (which would corrupt the pointer dereferenced by CPU 1
> in the scenario above). This shares similarities with type-safe memory
> allocation (see SLAB_TYPESAFE_BY_RCU).
>
> Even though it is not documented as such (or otherwise) in the example code,
> I feel this example looks like it guarantees that the manager thread "owns"
> list_a after the rseq-fence, when in fact it can still be read by the rseq
> critical sections.
>
> AFAIU moving the atomic_load(&args->percpu_list_ptr) into the critical section
> should entirely solve this and guarantee exclusive access to the old list
> after the manager's rseq-fence. I wonder why this simpler approach is not
> favored ?

I think the test code mimics our actual production code, where the concerns
you outlined are not particularly relevant. I'll see if the test can
be simplified
in v3 along the lines you suggested.

Thanks,
Peter

>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com