Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()

From: Nicholas Piggin
Date: Tue Dec 29 2020 - 21:41:19 EST


Excerpts from Russell King - ARM Linux admin's message of December 29, 2020 8:44 pm:
> On Tue, Dec 29, 2020 at 01:09:12PM +1000, Nicholas Piggin wrote:
>> I think it should certainly be documented in terms of what guarantees
>> it provides to applications, _not_ the kinds of instructions it may or
>> may not induce the core to execute. And if existing API can't be
>> re-documented sanely, then deprecated and new ones added that DTRT.
>> Possibly under a new system call, if arches like ARM want a range
>> flush and we don't want to expand the multiplexing behaviour of
>> membarrier even more (sigh).
>
> The 32-bit ARM sys_cacheflush() is there only to support self-modifying
> code, and takes whatever actions are necessary to support that.
> Exactly what actions it takes are cache implementation specific, and
> should be of no concern to the caller, but the underlying thing is...
> it's to support self-modifying code.

Caveat
cacheflush() should not be used in programs intended to be portable.
On Linux, this call first appeared on the MIPS architecture, but nowadays,
Linux provides a cacheflush() system call on some other architectures,
but with different arguments.

What a disaster. Another badly designed interface. It didn't originate
in Linux, but it sounds like we weren't to be outdone, so we messed it
up even worse.

Flushing caches is neither necessary nor sufficient for code modification
on many processors. Maybe some old MIPS-specific private thing was fine,
but certainly before it grew to other architectures, somebody should
have thought for more than 2 minutes about it. Sigh.

>
> Sadly, because it's existed for 20+ years, and it has historically been
> sufficient for other purposes too, it has seen quite a bit of abuse
> despite its design purpose not changing - it's been used by graphics
> drivers for example. They quickly learnt the error of their ways with
> ARMv6+, since it does not do sufficient for their purposes given the
> cache architectures found there.
>
> Let's not go around redesigning this after twenty odd years, requiring
> a hell of a lot of pain to users. This interface is called by code
> generated by GCC, so to change it you're looking at patching GCC as
> well as the kernel, and you basically will make new programs
> incompatible with older kernels - very bad news for users.

For something to be redesigned it had to have been designed in the first
place, so there is no danger of that, don't worry... But no, I never
suggested making incompatible changes to any existing system call; I
said "re-documented". And yes, I said deprecated, but in Linux that
really means kept indefinitely.

If ARM, MIPS, 68k etc. programs and toolchains keep using what they are
using, it'll keep working, no problem.

The point is we're growing new interfaces, and making the same mistakes.
The membarrier SYNC_CORE command is not portable
(ARCH_HAS_MEMBARRIER_SYNC_CORE), it is specified in terms of low level
processor operations rather than higher level intent, and it is also not
sufficient for self-modifying code (without an additional cache flush on
some processors).
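
To make that concrete, here is a rough sketch of what userspace ends up
stitching together today. The helper names code_sync_init() and
code_sync_range() are made up for illustration and are not an existing
API; the membarrier commands and __builtin___clear_cache() are the real
pieces. Whether these two steps add up to the right guarantee on every
architecture is exactly the open question, so treat it as an assumption
rather than a recipe.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Raw syscall wrapper; a libc wrapper for membarrier() may not exist. */
static int membarrier_call(int cmd, unsigned int flags)
{
	return (int)syscall(__NR_membarrier, cmd, flags, 0);
}

/*
 * Hypothetical helper: call once at startup.
 * Returns 0 if the process is registered for SYNC_CORE, -1 otherwise
 * (no membarrier syscall, or an arch without
 * ARCH_HAS_MEMBARRIER_SYNC_CORE).
 */
int code_sync_init(void)
{
	int cmds = membarrier_call(MEMBARRIER_CMD_QUERY, 0);

	if (cmds < 0 || !(cmds & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE))
		return -1;
	return membarrier_call(
		MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0);
}

/*
 * Hypothetical helper: call after writing new instructions into
 * [start, start + len). Returns 0 on success, -1 on failure.
 */
int code_sync_range(void *start, size_t len)
{
	/* Step 1: arch-specific i/d side maintenance (may be a no-op). */
	__builtin___clear_cache((char *)start, (char *)start + len);

	/* Step 2: core serialisation for all threads of this process. */
	return membarrier_call(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0);
}
```

Note the caller has to know about both steps, and has to cope with the
second one simply not existing on its architecture.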

The application wants a call that says something like "memory modified
before the call will be visible as instructions (including illegal
instructions) by all threads in the program after the system call
returns, and no threads will be subject to any effects of executing the
previous contents of that memory".

So I think the basics are simple (although we should confirm with some
JIT and debugger etc. developers, and not just Android mind you). There
are some complications in the details: address ranges, virtual/physical,
thread local vs process vs different process or system-wide, memory
ordering and propagation of i and d sides, etc. But that can be worked
through, erring on the side of sanity rather than pointless
micro-optimisations.

Thanks,
Nick