Atomic RMW instructions, for example ldadd, perform load + add + store
in a single instruction, so they may trigger two page faults: the
first is a read fault and the second is a write fault.
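The two-fault sequence can be made concrete with a toy model of a single page's PTE state while one atomic RMW executes. This is plain C, not kernel code: the enum, the function count_faults_for_rmw() and its upgrade flag are hypothetical, and the flag stands for "handle the read fault raised by an RMW instruction as if it were a write fault".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one anonymous page's PTE state (hypothetical, not kernel code). */
enum pte_state { PTE_NONE, PTE_READONLY, PTE_WRITABLE };

/* Count the faults taken by one atomic RMW on an unmapped page.
 * upgrade_rmw_read_fault (hypothetical policy): if true, the read fault
 * raised by the load half of the RMW installs a writable entry directly. */
static int count_faults_for_rmw(bool upgrade_rmw_read_fault)
{
    enum pte_state pte = PTE_NONE;
    int faults = 0;

    /* Load half of the RMW: faults because nothing is mapped yet. */
    if (pte == PTE_NONE) {
        faults++;
        pte = upgrade_rmw_read_fault ? PTE_WRITABLE : PTE_READONLY;
    }

    /* Store half of the RMW: faults again unless the entry is already writable. */
    if (pte != PTE_WRITABLE) {
        faults++;
        pte = PTE_WRITABLE;
    }

    /* Either way, the page is writable once the RMW completes. */
    assert(pte == PTE_WRITABLE);
    return faults;
}
```

With this model, count_faults_for_rmw(false) returns 2 and count_faults_for_rmw(true) returns 1. It deliberately ignores the zero page, shared (CoW) pages and concurrent access, which are exactly the cases the question below is concerned with.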
Does it only possibly trigger two consecutive page faults, or will it
definitely do so? What if the second (write) fault never happens? In
that case a writable page table entry would be created unnecessarily
(or even wrongly), thus breaking CoW.
Some applications use atomic RMW instructions to populate memory; for
example, openjdk uses atomic-add-0 to pretouch (populate) heap memory.
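The atomic-add-0 pretouch pattern can be sketched in user space as below. This is a minimal sketch, not openjdk's code: pretouch_atomic_add0() and alloc_pretouched() are hypothetical names, and the GCC/Clang __atomic_fetch_add builtin stands in for whatever atomic primitive the application uses. Whether it compiles to a single LDADD depends on the target (on arm64 that needs LSE; otherwise the compiler may emit an LL/SC loop or call an out-of-line helper), so check the generated assembly if the exact instruction matters.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Touch one int per page in [base, base + len) with an atomic add of 0.
 * The stored value is unchanged, but each access is a read-modify-write,
 * so every page ends up mapped writable.
 * Assumes base is page-aligned (as mmap() returns it). */
static void pretouch_atomic_add0(void *base, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert((uintptr_t)base % page == 0);

    for (size_t off = 0; off < len; off += page) {
        int *p = (int *)((char *)base + off);
        __atomic_fetch_add(p, 0, __ATOMIC_RELAXED);
    }
}

/* Hypothetical helper: reserve anonymous memory and pretouch it. */
static void *alloc_pretouched(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    pretouch_atomic_add0(p, len);
    return p;
}
```

For example, alloc_pretouched(4 * page_size) returns zero-filled memory whose pages are already populated, and calling pretouch_atomic_add0() again on memory that holds data leaves that data intact.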
But why isn't a normal store sufficient for pretouching the heap
memory? Why is read-modify-write (RMW) required instead? If the memory
address already holds valid data, it must have got there via a previous
write access, which would already have caused the initial CoW
transition. If the memory address holds no valid data to begin with,
why use RMW at all?
Some other architectures also inspect the faulting instruction in the
page fault path, for example SPARC and x86.
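On arm64, the equivalent inspection would fetch the faulting instruction word (the reply below mentions get_user()) and test whether it is an atomic RMW. The sketch covers only that decode step, under stated assumptions: is_lse_atomic_rmw() and the two constants are hypothetical names, not kernel helpers, and the mask/value follow the A64 encoding of the LSE "atomic memory operations" class. CAS/CASP and LDXR/STXR exclusives use different encodings and are not recognized here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LSE atomic memory operations class (hypothetical constant names):
 *   size:2 | 111 | V=0 | 00 | A | R | 1 | Rs:5 | o3 | opc:3 | 00 | Rn:5 | Rt:5 */
#define LSE_ATOMIC_MASK  0x3F200C00u
#define LSE_ATOMIC_VALUE 0x38200000u

/* Every fixed bit in the value must lie inside the mask. */
static_assert((LSE_ATOMIC_VALUE & ~LSE_ATOMIC_MASK) == 0, "value outside mask");

/* Return true if insn is an LSE atomic read-modify-write. */
static bool is_lse_atomic_rmw(uint32_t insn)
{
    if ((insn & LSE_ATOMIC_MASK) != LSE_ATOMIC_VALUE)
        return false;

    unsigned o3 = (insn >> 15) & 1u;
    unsigned opc = (insn >> 12) & 7u;

    /* o3 == 0: LDADD, LDCLR, LDEOR, LDSET, LDSMAX, LDSMIN, LDUMAX, LDUMIN
     * (and their ST* aliases). o3 == 1: only opc == 000 (SWP) is an RMW;
     * e.g. o3 == 1, opc == 100 is LDAPR, a plain load-acquire. */
    return o3 == 0 || opc == 0;
}
```

For instance, 0xF8210040 (ldadd x1, x0, [x2]) is recognized, while 0xF9400020 (ldr x0, [x1]) and 0xF8BFC020 (ldapr x0, [x1]) are not. Note that this check only runs after the instruction has been read from user memory, which is the cost the reply below questions.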
Okay, I was about to ask about that. But doesn't calling get_user() on
every data read page fault increase the cost of a hot code path in
general, for potential savings in a very specific use case? Not sure
that is worth the trade-off.