Re: [PATCH] mm/userfaultfd: Fix release hang over concurrent GUP

From: Andrew Morton
Date: Wed Mar 12 2025 - 18:04:18 EST

Next message: Bjorn Helgaas: "Re: [PATCH v10 08/10] PCI: dwc: Print warning message when cpu_addr_fixup() exists"
Previous message: Andrew Morton: "Re: [PATCH v10 19/21] mm: Add vmalloc_huge_node()"
In reply to: Peter Xu: "[PATCH] mm/userfaultfd: Fix release hang over concurrent GUP"
Next in thread: Peter Xu: "Re: [PATCH] mm/userfaultfd: Fix release hang over concurrent GUP"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, 12 Mar 2025 10:51:31 -0400 Peter Xu <peterx@xxxxxxxxxx> wrote:

> This patch should fix a possible userfaultfd release() hang during
> concurrent GUP.
>
> This problem was initially reported by Dimitris Siakavaras in July 2023 [1]
> in a firecracker use case. Firecracker has a separate process handling
> page faults remotely, and when the process releases the userfaultfd it can
> race with a concurrent GUP from KVM trying to fault in a guest page during
> the secondary MMU page fault process.
>
> A similar problem was reported recently again by Jinjiang Tu in March 2025
> [2], even though the race happened this time with a mlockall() operation,
> which does GUP in a similar fashion.
>
> In 2017, commit 656710a60e36 ("userfaultfd: non-cooperative: closing the
> uffd without triggering SIGBUS") tried to fix this issue. AFAIU, that
> fixes the fault paths well but may not yet work for GUP. In GUP, the issue
> is that NOPAGE is treated almost the same as "page fault resolved" in
> faultin_page(); GUP then follows the page again, sees it is still missing,
> and keeps looping, ending in the live lock reported.
>
> This change makes core mm return RETRY instead of NOPAGE for both the GUP
> and fault paths, proactively releasing the mmap read lock. This should
> guarantee that the other release thread makes progress on taking the write
> lock, avoiding the live lock even for GUP.
>
> While at it, rearrange the comments to make sure they are up to date.

It would be good to have a Fixes: target but this bug seems to be so
old that a bare cc:stable should be OK?
