Re: [PATCH] arm64: mm: force write fault for atomic RMW instructions

From: Christoph Lameter (Ampere)
Date: Wed May 08 2024 - 13:15:49 EST

In reply to: Anshuman Khandual: "Re: [PATCH] arm64: mm: force write fault for atomic RMW instructions"
Next in thread: Anshuman Khandual: "Re: [PATCH] arm64: mm: force write fault for atomic RMW instructions"

On Wed, 8 May 2024, Anshuman Khandual wrote:

The atomic RMW instructions, for example ldadd, actually do load +
add + store in one instruction. They may trigger two page faults:
the first is a read fault, the second is a write fault.

May it, or will it definitely, create two consecutive page faults? What
if the second (write) fault never comes about? In that case a writable
page table entry would be created unnecessarily (or even wrongfully),
thus breaking CoW.

An atomic RMW will always perform a write, right? If there is a read fault, then a write fault will follow.

Some applications use atomic RMW instructions to populate memory, for
example, openjdk uses atomic-add-0 to do pretouch (populate heap memory

But why is a normal store operation not sufficient for pre-touching
the heap memory? Why is a read-modify-write (RMW) required instead?

Sure, a regular write operation is sufficient, but you would have to modify existing applications to get that done. x86 does not do a read fault on atomics, so we have an issue there.

If the memory address has some valid data, it must have already reached there
via a previous write access, which would have caused the initial CoW transition?
If the memory address has no valid data to begin with, why even use RMW?

Because the application can reasonably assume that all uninitialized data is zero and therefore it is not necessary to have a prior write access.
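
To make the pre-touch pattern concrete, here is a minimal userspace sketch, assuming a Linux system with anonymous mmap. The helper names pretouch_atomic_add0() and pretouch_preserves() are made up for illustration and are not taken from openjdk or from the patch. It touches each page with an atomic add of 0 and then checks that the contents are unchanged, both for never-written (zero) memory and for memory that already holds data. Whether the compiler actually emits ldadd for the atomic add depends on the target and compiler flags.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: touch one word per page with an atomic add of 0. */
static void pretouch_atomic_add0(void *base, size_t len, size_t page_size)
{
	assert(page_size >= sizeof(atomic_int));

	for (size_t off = 0; off < len; off += page_size) {
		atomic_int *p = (atomic_int *)((char *)base + off);

		/* RMW that leaves the stored value unchanged. */
		atomic_fetch_add_explicit(p, 0, memory_order_relaxed);
	}
}

/*
 * Hypothetical check: map npages of anonymous memory, optionally fill it
 * with 'fill', pre-touch it, and verify every byte still equals 'fill'.
 * With fill == 0 the memory is never written before the pre-touch.
 * Returns 1 if the contents are preserved, 0 if not, -1 on setup failure.
 */
static int pretouch_preserves(size_t npages, unsigned char fill)
{
	long ps = sysconf(_SC_PAGESIZE);
	if (ps <= 0)
		return -1;

	size_t len = npages * (size_t)ps;
	unsigned char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	if (fill != 0)
		memset(mem, fill, len);

	pretouch_atomic_add0(mem, len, (size_t)ps);

	int ok = 1;
	for (size_t i = 0; i < len; i++) {
		if (mem[i] != fill) {
			ok = 0;
			break;
		}
	}

	munmap(mem, len);
	return ok;
}
```

Calling pretouch_preserves(4, 0) exercises the case discussed above: the application relies on fresh anonymous memory reading as zero and never issues a plain store before the atomic add.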

Some other architectures also have code inspection in page fault path,
for example, SPARC and x86.

Okay, I was about to ask, but doesn't calling get_user() for all data
read page faults increase the cost of a hot code path in general, in
exchange for some potential savings in a very specific use case? Not
sure if that is worth the trade-off.

The instruction is cache hot since it must be present in the CPU cache for the fault. So the overhead is minimal.
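
For illustration, the check on the fetched instruction word can be as cheap as one mask and one compare. The sketch below is a userspace approximation, not the code from the patch. The names are hypothetical, and the mask/value pair is my reading of the LDADD encoding (it also matches the STADD alias and the acquire/release variants). Other atomics such as SWP or CAS would need their own patterns. In the kernel the word would come from the faulting PC via get_user(), as discussed above; here it is simply passed in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed encoding of the LDADD family: bits [29:24] = 111000, bit 21 = 1,
 * bits [15:10] = 000000. Size, A, R, Rs, Rn and Rt are left as "don't care".
 */
#define LDADD_MASK  0x3F20FC00u
#define LDADD_VALUE 0x38200000u

static_assert((LDADD_VALUE & ~LDADD_MASK) == 0,
	      "value bits must lie within the mask");

/* Hypothetical decoder: true if the 32-bit word encodes an LDADD variant. */
static bool insn_is_ldadd(uint32_t insn)
{
	return (insn & LDADD_MASK) == LDADD_VALUE;
}

/*
 * Hypothetical policy hook: decide whether a read fault taken on this
 * instruction should be handled as a write fault. Only LDADD is covered
 * in this sketch.
 */
static bool treat_read_fault_as_write(uint32_t insn)
{
	return insn_is_ldadd(insn);
}
```

Under these assumptions, 0xB8200041 (ldadd w0, w1, [x2]) is treated as a write, while 0xB9400041 (ldr w1, [x2]) is left as an ordinary read fault.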
