Re: [PATCH 7/8] membarrier: Remove arm (32) support for SYNC_CORE

From: Mark Rutland
Date: Thu Jun 17 2021 - 06:41:02 EST


On Tue, Jun 15, 2021 at 08:21:12PM -0700, Andy Lutomirski wrote:
> On arm32, the only way to safely flush icache from usermode is to call
> cacheflush(2). This also handles any required pipeline flushes, so
> membarrier's SYNC_CORE feature is useless on arm. Remove it.

Unfortunately, it's a bit more complicated than that, and these days
SYNC_CORE is as necessary on arm as it is on arm64. This is something
that changed in the architecture over time, but since ARMv7 we generally
need both the cache maintenance *and* a context synchronization event
(the latter must occur on the CPU which will execute the instructions).
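
As an aside, userspace can ask the kernel at runtime whether SYNC_CORE is
on offer. A minimal sketch, assuming UAPI headers recent enough to define
the SYNC_CORE commands; sys_membarrier(), mask_has_sync_core() and
kernel_has_sync_core() are names made up for this example, since glibc
provides no membarrier wrapper:

```c
#include <linux/membarrier.h>   /* MEMBARRIER_CMD_* constants */
#include <sys/syscall.h>        /* SYS_membarrier */
#include <unistd.h>             /* syscall() */

/* Hypothetical thin wrapper: glibc has no membarrier(), so use syscall(2). */
static int sys_membarrier(int cmd, unsigned int flags)
{
	return (int)syscall(SYS_membarrier, cmd, flags, 0);
}

/* Pure helper: does a MEMBARRIER_CMD_QUERY bitmask advertise SYNC_CORE? */
static int mask_has_sync_core(int mask)
{
	const int need = MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE |
			 MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE;

	/* A negative mask means the query itself failed. */
	return mask >= 0 && (mask & need) == need;
}

/* Returns 1 if the running kernel offers SYNC_CORE, 0 otherwise. */
static int kernel_has_sync_core(void)
{
	return mask_has_sync_core(sys_membarrier(MEMBARRIER_CMD_QUERY, 0));
}
```

On a kernel built without ARCH_HAS_MEMBARRIER_SYNC_CORE the query simply
would not report those two bits, so kernel_has_sync_core() would return 0.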

If you look at the latest ARMv7-AR manual (ARM DDI 0406C.d), section
A3.5.4 "Concurrent modification and execution of instructions" covers
this. That manual can be found at:

https://developer.arm.com/documentation/ddi0406/latest/

Likewise for ARMv8-A; the latest manual (ARM DDI 0487G.a) covers this in
sections B2.2.5 and E2.3.5. That manual can be found at:

https://developer.arm.com/documentation/ddi0487/ga

I am not sure exactly what's required on 11MPCore, since that's
something of a special case: it is the only SMP design that predates
ARMv7-A mandating broadcast maintenance.

For intuition's sake, one reason for this is that once a CPU has fetched
an instruction from an instruction cache into its pipeline and that
instruction is "in-flight", changes to that instruction cache are not
guaranteed to affect the "in-flight" copy (which e.g. could be
decomposed into micro-ops and so on). While these parts of a CPU aren't
necessarily designed as caches, they effectively transiently cache a
stale copy of the instruction while it is being executed.

This is more pronounced on newer designs with more complex execution
pipelines (e.g. with bigger windows for out-of-order execution and
speculation), and generally it's unlikely for this to be noticed on
smaller/simpler designs.
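
To make that intuition concrete, here is a toy model in C. It is purely
illustrative: the struct, its fields and every toy_*() function are
invented for this sketch and do not correspond to any real
microarchitecture or kernel interface. It only shows that invalidating
the "icache" does not touch a copy that is already "in-flight".

```c
/* Toy model only: one icache line and one pipeline slot per CPU. */
struct toy_cpu {
	int icache;          /* copy of memory held by the instruction cache */
	int icache_valid;
	int inflight;        /* copy already fetched into the pipeline */
	int inflight_valid;
};

/* Fetch: refill the icache from memory if needed, then latch the
 * instruction into the pipeline. */
static void toy_fetch(struct toy_cpu *cpu, const int *mem)
{
	if (!cpu->icache_valid) {
		cpu->icache = *mem;
		cpu->icache_valid = 1;
	}
	cpu->inflight = cpu->icache;
	cpu->inflight_valid = 1;
}

/* Cache maintenance: affects the icache only, not the in-flight copy. */
static void toy_icache_invalidate(struct toy_cpu *cpu)
{
	cpu->icache_valid = 0;
}

/* Context synchronization event: discard in-flight work so that it
 * has to be fetched again. */
static void toy_context_sync(struct toy_cpu *cpu)
{
	cpu->inflight_valid = 0;
}

/* Execute: returns the instruction this CPU actually runs. */
static int toy_execute(struct toy_cpu *cpu, const int *mem)
{
	if (!cpu->inflight_valid)
		toy_fetch(cpu, mem);
	return cpu->inflight;
}
```

In this model, if a CPU has fetched instruction 1, memory is then changed
to 2 and the icache invalidated, toy_execute() still returns 1. Only
after toy_context_sync() does it return 2.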

As above, modifying instructions requires two things:

1) Making sure that *subsequent* instruction fetches will see the new
   instructions. This is what cacheflush(2) does, and this is similar to
   what SW does on arm64 with DC CVAU + IC IVAU instructions and
   associated memory barriers.

2) Making sure that a CPU fetches the instructions *after* the cache
   maintenance is complete. There are a few ways to do this:

   * A context synchronization event (e.g. an ISB or exception return)
     on the CPU that will execute the instructions. This is what
     membarrier(SYNC_CORE) does.

   * In ARMv8-A there are some restrictions on the order in which
     modified instructions are guaranteed to be observed (e.g. if you
     publish a function, then subsequently install a branch to that new
     function), where an ISB may not be necessary. In the latest ARMv8-A
     manual as linked above, those are described in sections:

     - B2.3.8 "Ordering of instruction fetches" (for 64-bit)
     - E2.3.8 "Ordering of instruction fetches" (for 32-bit)

   * Where we can guarantee that a CPU cannot possibly have an
     instruction in-flight (e.g. due to a lack of a mapping to fetch
     instructions from), nothing is necessary. This is what we rely on
     when faulting in code pages. In these cases, the CPU is liable to
     take a fault on the missing translation anyway.
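
Putting 1) and 2) together from userspace, a JIT-style sequence might
look roughly like the sketch below. Assumptions: the GCC/Clang builtin
__builtin___clear_cache() is used for step 1 (as far as I'm aware, on
arm Linux this ends up in cacheflush(2)), the kernel offers the
SYNC_CORE commands, and sys_membarrier(), jit_register() and
publish_code() are hypothetical names invented here.

```c
#include <linux/membarrier.h>   /* MEMBARRIER_CMD_* constants */
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>        /* SYS_membarrier */
#include <unistd.h>             /* syscall() */

/* Hypothetical thin wrapper: glibc has no membarrier(), so use syscall(2). */
static int sys_membarrier(int cmd, unsigned int flags)
{
	return (int)syscall(SYS_membarrier, cmd, flags, 0);
}

/* Call once per process, before the first publish_code(). */
static int jit_register(void)
{
	return sys_membarrier(
		MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0);
}

/*
 * Copy new instructions into dst and make them safe to execute on any
 * thread of this process. Returns 0 on success, -1 (with errno set) if
 * the membarrier call fails.
 */
static int publish_code(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);

	/* Step 1: cache maintenance, so subsequent fetches see the new bytes. */
	__builtin___clear_cache((char *)dst, (char *)dst + len);

	/* Step 2: context synchronization event on the CPUs running our
	 * threads, so nothing fetched before step 1 is still in flight. */
	return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0);
}
```

Without the prior jit_register() call, the SYNC_CORE command is
documented to fail with EPERM, so a caller should check both return
values before jumping to the new code.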

Thanks,
Mark.

>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> Cc: Nicholas Piggin <npiggin@xxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Russell King <linux@xxxxxxxxxxxxxxx>
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> ---
> arch/arm/Kconfig | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 24804f11302d..89a885fba724 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -10,7 +10,6 @@ config ARM
> select ARCH_HAS_FORTIFY_SOURCE
> select ARCH_HAS_KEEPINITRD
> select ARCH_HAS_KCOV
> - select ARCH_HAS_MEMBARRIER_SYNC_CORE
> select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> select ARCH_HAS_PTE_SPECIAL if ARM_LPAE
> select ARCH_HAS_PHYS_TO_DMA
> --
> 2.31.1
>