Re: [PATCH 0/4] arm64: Support the TSO memory model
From: Will Deacon
Date: Thu Apr 11 2024 - 09:29:21 EST
Hi Hector,
On Thu, Apr 11, 2024 at 09:51:19AM +0900, Hector Martin wrote:
> x86 CPUs implement a stricter memory modern than ARM64 (TSO). For this
> reason, x86 emulation on baseline ARM64 systems requires very expensive
> memory model emulation. Having hardware that supports this natively is
> therefore very attractive. Such hardware, in fact, exists. This series
> adds support for userspace to identify when TSO is available and
> toggle it on, if supported.
I'm probably going to make myself hugely unpopular here, but I have a
strong objection to this patch series as it stands. I firmly believe
that providing a prctl() to query and toggle the memory model to/from
TSO is going to lead to subtle fragmentation of arm64 Linux userspace.
It's not difficult to envisage this TSO switch being abused for native
arm64 applications:
* A program no longer crashes when TSO is enabled, so the developer
just toggles TSO to meet a deadline.
* Some legacy x86 sources are being ported to arm64 but concurrency
is hard so the developer just enables TSO to (mostly) avoid thinking
about it.
* Some binaries in a distribution exhibit instability which goes away
in TSO mode, so a taskset-like program is used to run them with TSO
enabled.
In all these cases, we end up with native arm64 applications that will
either fail to load or will crash in subtle ways on CPUs without the TSO
feature. Assuming that the application cannot be fixed, a better
approach would be to recompile using stronger instructions (e.g.
LDAR/STLR) so that at least the resulting binary is portable. Now, it's
true that some existing CPUs are TSO by design (this is a perfectly
valid implementation of the arm64 memory model), but I think there's a
big difference between quietly providing more ordering guarantees than
software may be relying on and providing a mechanism to discover,
request and ultimately rely upon the stronger behaviour.
An alternative option is to go down the SPARC RMO route and just enable
TSO statically (although presumably in the firmware) for Apple silicon.
I'm assuming that has a performance impact for native code?
Will
P.S. I briefly pondered the idea of the kernel toggling the bit in the
ELF loader when e.g. it sees an x86 machine type but I suspect that
doesn't really help with existing emulators and you'd still need a way
to tell the emulator whether or not it was enabled.