Re: [PATCH] arm64: mm: force write fault for atomic RMW instructions

From: Yang Shi
Date: Fri May 17 2024 - 13:35:41 EST




On 5/17/24 10:25 AM, Catalin Marinas wrote:
On Fri, May 17, 2024 at 09:30:23AM -0700, Yang Shi wrote:
On 5/14/24 3:39 AM, Catalin Marinas wrote:
It would be good to understand why openjdk is doing this instead of a
plain write. Is it because it may be racing with some other threads
already using the heap? That would be a valid pattern.
Yes, you are right. I think I quoted the JVM justification in earlier email,
anyway they said "permit use of memory concurrently with pretouch".
Ah, sorry, I missed that. This seems like a valid reason.

I should have articulated this in the commit log. Will add this in v2.


A point Will raised was on potential ABI changes introduced by this
patch. The ESR_EL1 reported to user remains the same as per the hardware
spec (read-only), so from a SIGSEGV we may have some slight behaviour
changes:

1. PTE invalid:

a) vma is VM_READ && !VM_WRITE permission - SIGSEGV reported with
ESR_EL1.WnR == 0 in sigcontext with your patch. Without this
patch, the PTE is mapped as PTE_RDONLY first and a subsequent
fault will report SIGSEGV with ESR_EL1.WnR == 1.
I think I can do something like the below conceptually:

if is_el0_atomic_instr && !is_write_abort
    force_write = true

if VM_READ && !VM_WRITE && force_write == true
Nit: write implies read, so you only need to check !write.

    vm_flags = VM_READ
    mm_flags ~= FAULT_FLAG_WRITE

Then we just fallback to read fault. The following write fault will trigger
SIGSEGV with consistent ABI.
I think this should work. So instead of reporting the write fault
directly in case of a read-only vma, we let the core code handle the
read fault and first and we retry the atomic instruction.

Yes, just undo the force write when vma flags don't allow it.


b) vma is !VM_READ && !VM_WRITE permission - SIGSEGV reported with
ESR_EL1.WnR == 0, so no change from current behaviour, unless we
fix the patch for (1.a) to fake the WnR bit which would change the
current expectations.

2. PTE valid with PTE_RDONLY - we get a normal writeable fault in
hardware, no need to fix ESR_EL1 up.

The patch would have to address (1) above but faking the ESR_EL1.WnR bit
based on the vma flags looks a bit fragile.
I think we don't need to fake the ESR_EL1.WnR bit with the fallback.
I agree, with your approach above we don't need to fake WnR.

Similarly, we have userfaultfd that reports the fault to user. I think
in scenario (1) the kernel will report UFFD_PAGEFAULT_FLAG_WRITE with
your patch but no UFFD_PAGEFAULT_FLAG_WP. Without this patch, there are
indeed two faults, with the second having both UFFD_PAGEFAULT_FLAG_WP
and UFFD_PAGEFAULT_FLAG_WRITE set.
I don't quite get what the problem is. IIUC, uffd just needs a signal from
kernel to tell this area will be written. It seems not break the semantic.
Added Peter Xu in this loop, who is the uffd developer. He may shed some
light.
Not really familiar with uffd but just looking at the code, if a handler
is registered for both MODE_MISSING and MODE_WP, currently the atomic
instruction signals a user fault without UFFD_PAGEFAULT_FLAG_WRITE (the
do_anonymous_page() path). If the page is mapped by the uffd handler as
the zero page, a restart of the instruction would signal
UFFD_PAGEFAULT_FLAG_WRITE and UFFD_PAGEFAULT_FLAG_WP (the do_wp_page()
path).

With your patch, we get the equivalent of UFFD_PAGEFAULT_FLAG_WRITE on
the first attempt, just like having a STR instruction instead of
separate LDR + STR (as the atomics behave from a fault perspective).

However, I don't think that's a problem, the uffd handler should cope
with an STR anyway, so it's not some unexpected combination of flags.

Yes, this is what I thought.