Re: [v5 PATCH] arm64: mm: force write fault for atomic RMW instructions

From: Christoph Lameter (Ampere)
Date: Fri Jul 05 2024 - 14:51:43 EST


On Fri, 5 Jul 2024, Catalin Marinas wrote:

> There's nothing about arm64 in there and it looks like the code prefers
> MADV_POPULATE_WRITE if THPs are enabled (which is the case in all
> enterprise distros). I can't tell whether the change was made to work
> around the arm64 behaviour, there's no commit log (it was contributed by
> Ampere).

It took us a long time and numerous developers and QA teams to get to this insight. You don't want to replicate this for other applications.

> There's a separate thread with the mm folk on the THP behaviour for
> pmd_none() vs pmd mapping the zero huge page but it is more portable for
> OpenJDK to use madvise() than guess the kernel behaviour and touch small
> pages or a single large page. Even if one claims that atomic_add(0) is
> portable across operating systems, the OpenJDK code was already treating
> Linux as a special case in the presence of THP.

Other apps do not have such a vibrant developer community and have no Ampere employees contributing. They will never know and just say ARM has bad performance.
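
For readers following along, here is a minimal sketch in C of the two pre-touch strategies mentioned above: asking the kernel to populate the range with madvise(MADV_POPULATE_WRITE), and falling back to touching each page with an atomic add of zero. The helper names pretouch() and pretouch_atomic_add0() are made up for illustration and this is not OpenJDK's code; the fallback define assumes the Linux uapi value of MADV_POPULATE_WRITE (available from 5.14).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23 /* assumed uapi value for older libc headers */
#endif

/*
 * Hypothetical helper: touch one word in every page with an atomic
 * add of zero. The contents are unchanged, but the access is a
 * read-modify-write, so the page has to end up mapped writable.
 */
static void pretouch_atomic_add0(void *addr, size_t len, size_t page_size)
{
    assert(page_size > 0);
    for (size_t off = 0; off < len; off += page_size) {
        _Atomic int *p = (_Atomic int *)((char *)addr + off);
        atomic_fetch_add_explicit(p, 0, memory_order_relaxed);
    }
}

/*
 * Hypothetical helper: prefer madvise(), which lets the kernel
 * populate the range writable in one call. Returns 0 if madvise()
 * succeeded and 1 if the per-page fallback was used (for example on
 * kernels that do not know MADV_POPULATE_WRITE).
 */
static int pretouch(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_POPULATE_WRITE) == 0)
        return 0;
    pretouch_atomic_add0(addr, len, (size_t)sysconf(_SC_PAGESIZE));
    return 1;
}
```

Whether the add of zero becomes a single atomic RMW instruction depends on the compiler and target. On arm64 with LSE atomics it may, and that is the case this patch is concerned with.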


>> It would be much simpler to just merge the patch and be done with it.
>> Otherwise this issue will continue to cause uncountably many hours of
>> anguish for sysadmins and developers all over the Linux ecosystem trying to
>> figure out what in the world is going on with ARM.

> People will be happy until one enables execute-only ELF text sections in
> a distro and all that opcode parsing will add considerable overhead for
> many read faults (those with a writeable vma).

The opcode is in the L1 cache since we just faulted on it. There is no "considerable" overhead.
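
To give a sense of what such an opcode check amounts to, here is a rough sketch in C. It is not the patch's actual code. It assumes the A64 "atomic memory operations" encoding and only recognises the LD<op> forms (LDADD, LDCLR, LDEOR, LDSET, LDSMAX, LDSMIN, LDUMAX, LDUMIN and their ST<op> aliases); SWP and CAS are deliberately left out. The function name and the mask constants are mine and should be checked against the Arm ARM before being relied on.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout of the A64 atomic memory operations group:
 *
 *   size[31:30] 111[29:27] V[26] 00[25:24] A[23] R[22] 1[21]
 *   Rs[20:16] o3[15] opc[14:12] 00[11:10] Rn[9:5] Rt[4:0]
 *
 * The mask selects the fixed bits plus V and o3, and the value
 * requires V == 0 and o3 == 0, which leaves the eight LD<op>
 * operations selected by opc.
 */
#define ATOMIC_LDOP_MASK  0x3f208c00u
#define ATOMIC_LDOP_VALUE 0x38200000u

/* Hypothetical helper: one mask and one compare on a 32-bit opcode. */
static bool insn_is_atomic_ldop(uint32_t insn)
{
    return (insn & ATOMIC_LDOP_MASK) == ATOMIC_LDOP_VALUE;
}
```

With these constants, 0xf8e10040 (ldaddal x1, x0, [x2]) matches, while 0xf9400041 (ldr x1, [x2]) does not. The classification itself is a couple of ALU operations on a word that has already been fetched.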

> I'd also like to understand (probably have to re-read the older threads)
> whether the overhead is caused mostly by the double fault or the actual
> breaking of a THP. For the latter, the mm folk are willing to change the
> behaviour so that pmd_none() and pmd to the zero huge page are treated
> similarly (i.e. allocate a huge page on write fault). If that's good
> enough, I'd rather not merge this patch (or some form of it) and wait
> for a proper fix in hardware in the future.

THP is a secondary effect here. Note that similar approaches have been implemented for other architectures. This is not a new approach; it is widely used on other platforms.

If people working on other Linux platforms encountered this strange discussion, they would come to the same conclusion that I have.

> Just to be clear, there are still potential issues to address (or
> understand the impact of) in this patch with exec-only mappings and
> the performance gain _after_ the THP behaviour changed in the mm code.
> We can make a call once we have more data but, TBH, my inclination is
> towards 'no' given that OpenJDK already supports madvise() and it's not
> arm64 specific.

It is arm64 specific. Other Linux architectures either have optimizations for similar issues in their arch code, as mentioned in the patch, or their processors do not double fault.

Is there a particular reason for ARM as a processor manufacturer to oppose this patch? We have mostly hand-waving and speculation coming from you here.

What the patch does is clearly beneficial and it is an established way of implementing read->write fault handling.