Re: [PATCH 0/4] arm64: Support the TSO memory model

From: Hector Martin
Date: Thu Apr 11 2024 - 14:45:28 EST




On 2024/04/11 23:19, Hector Martin wrote:
>>
>> An alternative option is to go down the SPARC RMO route and just enable
>> TSO statically (although presumably in the firmware) for Apple silicon.
>> I'm assuming that has a performance impact for native code?
>
> Correct. We already have this as a bootloader option, but it is not
> desirable. Plus, userspace code still needs a way to *discover* that TSO
> is enabled for correctness, so it can automatically decide whether to
> use stronger or weaker instructions.

To add some numbers to this (I was just made aware of this paper):

https://www.sra.uni-hannover.de/Publications/2023/tosting-arcs23/wrenger_23_arcs.pdf

Using TSO globally has, on average, a 9% performance hit, so that is
clearly off the table as a general solution.

Meanwhile, more detailed microbenchmarks often show TSO as having better
performance than outright using acquire/release instructions without
TSO. Therefore, just giving up on TSO and using acq/rel semantics for
emulators is also not an acceptable solution.

Additionally, the general load/store instructions on ARM have more
flexible addressing modes than the synchronizing ones, and since general
x86 emulation requires *all* loads and stores to be like this in a
non-TSO model (without much more complex/expensive program analysis to
determine where this can be elided), the perf impact is definitely worse
for emulation (e.g. stack accesses are affected) than for a
microbenchmark where only the "target" test instructions are being modified.

- Hector