Re: [PATCH 3/3] membarrier: Propagate SYNC_CORE and RSEQ actions more carefully

From: Mathieu Desnoyers
Date: Tue Dec 01 2020 - 09:29:36 EST

Next message: Sven Schnelle: "[PATCH v2 2/5] entry: rename exit_from_user_mode()"
Previous message: Dan Carpenter: "Re: [PATCH] media: rockchip: rkisp1: remove some dead code"
In reply to: Peter Zijlstra: "Re: [PATCH 3/3] membarrier: Propagate SYNC_CORE and RSEQ actions more carefully"
Next in thread: Andy Lutomirski: "Re: [PATCH 3/3] membarrier: Propagate SYNC_CORE and RSEQ actions more carefully"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

----- On Dec 1, 2020, at 5:16 AM, Peter Zijlstra peterz@xxxxxxxxxxxxx wrote:

> On Mon, Nov 30, 2020 at 09:50:35AM -0800, Andy Lutomirski wrote:
>> membarrier() carefully propagates SYNC_CORE and RSEQ actions to all
>> other CPUs, but there are two issues.
>>
>> - membarrier() does not sync_core() or rseq_preempt() the calling
>> CPU. Aside from the logic being mind-bending, this also means
>> that it may not be safe to modify user code through an alias,
>> call membarrier(), and then jump to a different executable alias
>> of the same code.
>
> I always understood this to be on purpose. The calling CPU can fix up
> itself just fine. The pain point is fixing up the other CPUs, and that's
> where membarrier() helps.

Indeed, as documented in the man page:

MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE (since Linux 4.16)
In addition to providing the memory ordering guarantees de‐
scribed in MEMBARRIER_CMD_PRIVATE_EXPEDITED, upon return from
system call the calling thread has a guarantee that all its run‐
ning thread siblings have executed a core serializing instruc‐
tion. This guarantee is provided only for threads in the same
process as the calling thread.

membarrier sync core guarantees a core serializing instruction on the siblings,
not on the caller thread. This has been done on purpose given that the caller
thread can always issue its core serializing instruction from user-space on
its own.

>
> That said, I don't mind including self, these aren't fast calls by any
> means.

I don't mind including self either, but this would require documentation
updates, including man pages, to state that starting from kernel Y this
is the guaranteed behavior. It's then tricky for user-space to query what
the behavior is unless we introduce a new membarrier command for it. So this
could introduce issues if software written for the newer kernels runs on older
kernels.

>
>> - membarrier() does not explicitly sync_core() remote CPUs either;
>> instead, it relies on the assumption that an IPI will result in a
>> core sync. On x86, I think this may be true in practice, but
>> it's not architecturally reliable. In particular, the SDM and
>> APM do not appear to guarantee that interrupt delivery is
>> serializing.
>
> Right, I don't think we rely on that, we do rely on interrupt delivery
> providing order though -- as per the previous email.
>
>> On a preemptible kernel, IPI return can schedule,
>> thereby switching to another task in the same mm that was
>> sleeping in a syscall. The new task could then SYSRET back to
>> usermode without ever executing IRET.
>
> This; I think we all overlooked this scenario.

Indeed, this is an issue which needs to be fixed.

>
>> This patch simplifies the code to treat the calling CPU just like
>> all other CPUs, and explicitly sync_core() on all target CPUs. This
>> eliminates the need for the smp_mb() at the end of the function
>> except in the special case of a targeted remote membarrier(). This
>> patch updates that code and the comments accordingly.

I am not confident that removing the smp_mb at the end of membarrier is
an appropriate change, nor that it simplifies the model.

This changes things from a model where we have a barrier at the beginning
and end of the membarrier system call, which nicely orders things happening
before/after the system call with respect to anything that is observed within
the system call (including the scheduler activity updating the runqueue's
current task), to a model where the memory barrier for the current thread
will be conditionally executed after we have sent the IPIs, and unconditionally
when issuing smp_call_function* on self.

About the documentation of the membarrier scenario, I think it is redundant
with a documentation patch I already have sitting in -tip (scenario A):

https://git.kernel.org/tip/25595eb6aaa9fbb31330f1e0b400642694bc6574

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Next message: Sven Schnelle: "[PATCH v2 2/5] entry: rename exit_from_user_mode()"
Previous message: Dan Carpenter: "Re: [PATCH] media: rockchip: rkisp1: remove some dead code"
In reply to: Peter Zijlstra: "Re: [PATCH 3/3] membarrier: Propagate SYNC_CORE and RSEQ actions more carefully"
Next in thread: Andy Lutomirski: "Re: [PATCH 3/3] membarrier: Propagate SYNC_CORE and RSEQ actions more carefully"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]