Re: [PATCH] arm64: mm: force write fault for atomic RMW instructions

From: David Hildenbrand
Date: Tue May 14 2024 - 11:58:16 EST


On 14.05.24 12:39, Catalin Marinas wrote:
> On Fri, May 10, 2024 at 10:13:02AM -0700, Yang Shi wrote:
>> On 5/10/24 5:11 AM, Catalin Marinas wrote:
>>> On Tue, May 07, 2024 at 03:35:58PM -0700, Yang Shi wrote:
>>>> Atomic RMW instructions, for example ldadd, actually do a load + add +
>>>> store in one instruction, so they may trigger two page faults: the
>>>> first is a read fault and the second is a write fault.
>>>>
>>>> Some applications use atomic RMW instructions to populate memory; for
>>>> example, openjdk uses atomic-add-0 to do pretouch (populate heap memory
>>>> at launch time) between v18 and v22.
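For illustration, a minimal user-space sketch of the atomic-add-0 pretouch pattern described above, assuming Linux, C11 <stdatomic.h>, and an anonymous mmap() region standing in for a freshly reserved heap. The helper names (page_size, reserve_heap, pretouch_atomic_add0) are hypothetical and this is not openjdk's code. It does one atomic operation per page rather than per long, and the value in memory never changes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* System page size, falling back to 4 KiB if sysconf() fails. */
static size_t page_size(void)
{
    long p = sysconf(_SC_PAGESIZE);
    return p > 0 ? (size_t)p : 4096;
}

/* Hypothetical helper: reserve an anonymous read/write region standing in
 * for a Java heap. Nothing is faulted in yet. Returns NULL on failure. */
static void *reserve_heap(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Pretouch with one atomic add of 0 per page. The stored value is left
 * unchanged, but the access is a read-modify-write. The pointer is
 * volatile-qualified so the compiler keeps the read-modify-write instead
 * of folding the idempotent add into a plain load.
 * Returns the number of pages touched. */
static size_t pretouch_atomic_add0(void *base, size_t len)
{
    size_t page = page_size();
    size_t touched = 0;

    assert(base != NULL);
    for (size_t off = 0; off < len; off += page) {
        volatile _Atomic uint64_t *word =
            (volatile _Atomic uint64_t *)((char *)base + off);
        atomic_fetch_add_explicit(word, 0, memory_order_relaxed);
        touched++;
    }
    return touched;
}
```

Usage: reserve 16 pages with reserve_heap(16 * page_size()) and pass the result to pretouch_atomic_add0(); it reports 16 pages touched and the memory still reads as zero.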
>>> I'd also argue that this should be optimised in openjdk. Is an LDADD
>>> more efficient on your hardware than a plain STR? I hope it only does
>>> one operation per page rather than per long. There's also MAP_POPULATE
>>> that openjdk can use to pre-fault the pages with no additional fault.
>>> This would be even more efficient than any store or atomic operation.
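A minimal sketch of the MAP_POPULATE alternative, assuming Linux and glibc: the kernel pre-faults the whole anonymous mapping inside mmap() itself, so no user-space touching loop is needed. mincore() is used here only to check afterwards that the pages are resident. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* System page size, falling back to 4 KiB if sysconf() fails. */
static size_t page_size(void)
{
    long p = sysconf(_SC_PAGESIZE);
    return p > 0 ? (size_t)p : 4096;
}

/* Map an anonymous writable region and let the kernel pre-fault it during
 * the mmap() call (MAP_POPULATE). Returns NULL on failure. */
static void *map_populated(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Count resident pages in [addr, addr + len) with mincore(); the low bit
 * of each vector byte is set when the page is resident.
 * addr must be page-aligned. Returns -1 on error. */
static long resident_pages(void *addr, size_t len)
{
    size_t page = page_size();
    size_t npages = (len + page - 1) / page;
    unsigned char *vec;
    long count = 0;

    assert(addr != NULL);
    vec = malloc(npages);
    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;
    free(vec);
    return count;
}
```

The design point is that population happens in a single system call, instead of one fault (or two, as in the patch description) per page taken from user space.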

>> It is not about whether atomic is more efficient than plain store on our
>> hardware or not. It is an arch-independent solution used by openjdk.

> It may be arch independent but it's not a great choice. If you run this
> on pre-LSE atomics hardware (ARMv8.0), this operation would involve
> LDXR+STXR and there's no way for the kernel to "upgrade" it to a write
> operation on the first LDXR fault.
>
> It would be good to understand why openjdk is doing this instead of a
> plain write. Is it because it may be racing with some other threads
> already using the heap? That would be a valid pattern.
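If the reason is racing with threads already using the heap, the difference between an atomic add of 0 and a plain write can be shown with a small pthreads sketch. The names and sizes (heap_words, SLOTS, ROUNDS, mutator, pretoucher) are hypothetical, and it assumes C11 <stdatomic.h> plus POSIX threads. One thread keeps incrementing shared words while another "pretouches" the same words with atomic adds of 0; no increment is lost. A plain store of 0, or a non-atomic load followed by a store of the loaded value, could overwrite an increment made in between.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define SLOTS 64      /* hypothetical number of heap words in use */
#define ROUNDS 10000  /* hypothetical number of passes per thread */

static _Atomic uint64_t heap_words[SLOTS];

/* Mutator: a thread already using the heap, bumping every word by 1. */
static void *mutator(void *arg)
{
    (void)arg;
    for (int r = 0; r < ROUNDS; r++)
        for (int i = 0; i < SLOTS; i++)
            atomic_fetch_add_explicit(&heap_words[i], 1, memory_order_relaxed);
    return NULL;
}

/* Pretoucher: touches the same words with an atomic add of 0. Being a
 * single atomic read-modify-write, it cannot overwrite a concurrent
 * increment the way a plain store would. */
static void *pretoucher(void *arg)
{
    (void)arg;
    for (int r = 0; r < ROUNDS; r++)
        for (int i = 0; i < SLOTS; i++)
            atomic_fetch_add_explicit(&heap_words[i], 0, memory_order_relaxed);
    return NULL;
}

/* Run both threads concurrently; return 1 if no increment was lost,
 * 0 if an increment was lost or a thread could not be created. */
static int pretouch_preserves_updates(void)
{
    pthread_t m, p;

    for (int i = 0; i < SLOTS; i++)
        atomic_store(&heap_words[i], 0);

    if (pthread_create(&m, NULL, mutator, NULL) != 0)
        return 0;
    if (pthread_create(&p, NULL, pretoucher, NULL) != 0) {
        pthread_join(m, NULL);
        return 0;
    }
    pthread_join(m, NULL);
    pthread_join(p, NULL);

    for (int i = 0; i < SLOTS; i++)
        if (atomic_load(&heap_words[i]) != ROUNDS)
            return 0;
    return 1;
}
```

This only demonstrates the correctness argument for the atomic variant; it says nothing about how many page faults each variant takes.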

Maybe openjdk should be switching to MADV_POPULATE_WRITE. QEMU did that for the preallocate/populate use case.
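A minimal sketch of the MADV_POPULATE_WRITE approach, assuming Linux: madvise() pre-faults an existing, page-aligned range for writing (the advice exists since Linux 5.14). The fallback to an atomic-add-0 loop when an older kernel rejects the advice with EINVAL, and the helper name populate_write, are illustrative assumptions; this is not what QEMU or openjdk actually ship.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23 /* asm-generic UAPI value; older headers lack it */
#endif

/* System page size, falling back to 4 KiB if sysconf() fails. */
static size_t page_size(void)
{
    long p = sysconf(_SC_PAGESIZE);
    return p > 0 ? (size_t)p : 4096;
}

/* Pre-fault [addr, addr + len) for writing. addr must be page-aligned
 * (e.g. returned by mmap). If the kernel rejects the advice with EINVAL,
 * as kernels before 5.14 do for an unknown advice value, fall back to one
 * atomic add of 0 per page. Returns 0 on success, -1 on any other error. */
static int populate_write(void *addr, size_t len)
{
    size_t page = page_size();

    assert(addr != NULL);
    if (madvise(addr, len, MADV_POPULATE_WRITE) == 0)
        return 0;
    if (errno != EINVAL)
        return -1;
    for (size_t off = 0; off < len; off += page) {
        volatile _Atomic uint64_t *word =
            (volatile _Atomic uint64_t *)((char *)addr + off);
        atomic_fetch_add_explicit(word, 0, memory_order_relaxed);
    }
    return 0;
}
```

Compared with MAP_POPULATE, the advice can be applied to a region that was mapped earlier, which suits a heap that is reserved first and populated later.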

--
Cheers,

David / dhildenb