Re: [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock
From: Dave Hansen
Date: Fri Jun 12 2026 - 12:07:00 EST
On 6/12/26 08:41, Vlastimil Babka (SUSE) wrote:
>>> If the vma lookup fails because the mmap write lock is held, but the vma
>>> actually exists (has not been unmapped), then this code might "successfully"
>>> remove the page without invoking zap_vma_range(). This means that the
>>> page does not actually get freed and will just hang around forever until
>>> the process owning the vma exits or Binder needs this page and maps a
>>> new page on top of the page.
>> Yeah, I think if lock_vma_under_rcu() returns NULL you just need to
>> jump to err_mmap_read_lock_failed, like we currently do if
>> mmap_read_trylock() fails.
> I don't think that will be enough as well, as the current code AFAICS does
> something meaningful when mmap_read_trylock() succeeds but vma_lookup returns
> NULL because there's no vma at that address. Now we would just assume the
> trylock failed even if the reason was that vma lookup found nothing for the
> address. The problem is that lock_vma_under_rcu() can't distinguish those
> two outcomes, so we would need something that does?
I spent way too much time staring at this yesterday.
I think the key to distinguishing between:
vma==NULL because there's no VMA
and
vma==NULL because of a trylock failure
is binder_alloc_is_mapped(). It won't return false until vm_ops->close()
finishes. vm_ops->close() shouldn't be able to happen while the VMA read
lock taken by lock_vma_under_rcu() is held. So if you've got a non-NULL
VMA, you've also got a stable binder_alloc_is_mapped().
So, if you've got a vma!=NULL *and* binder_alloc_is_mapped()==true, I
think you can be pretty sure you've got the right VMA.
If you have vma==NULL and binder_alloc_is_mapped()==true, you can be
pretty sure that you hit some kind of transient lock_vma_under_rcu()
failure.
I came up with the attached patch. More eyeballs would be welcome.
There's a _lot_ going on here.
From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
tl;dr: lock_vma_under_rcu() is already a trylock. No need to do both
it and mmap_read_trylock().
Long Version:
== Background ==
Historically, binder used an mmap_read_trylock() in its shrinker code.
This ensures that reclaim is not blocked on an mmap_lock. Commit
95bc2d4a9020 ("binder: use per-vma lock in page reclaiming") added
support for the per-VMA lock, but left mmap_read_trylock() as a
fallback.
This fallback was required when per-VMA locking could fail
persistently, but that is no longer the case.
== Problem ==
The fallback is not worth the complexity here. lock_vma_under_rcu() is
essentially already a non-blocking trylock. The main reason it fails
is also the reason mmap_read_trylock() fails: something is holding
mmap_write_lock().
The only remedy for a collision with mmap_write_lock() is to wait,
which this code cannot do. So the "fallback" after
lock_vma_under_rcu() failure is not really a fallback: it is very
likely to just be retrying in vain. That retry in and of itself isn't
horrible. But it adds complexity.
== Solution ==
Now that per-VMA locks are universally available, lock_vma_under_rcu()
will not persistently fail. Rely on it alone and simplify the code.
The one wrinkle is that lock_vma_under_rcu() can return NULL even if
there is a VMA at 'page_addr'. But the later page zapping code
*must* run if the page might be mapped into a VMA. Stop relying on
vma_lookup() for this. Just rely on binder_alloc_is_mapped().
== Discussion ==
I think there end up being four possible cases to handle. The first
two are straightforward. Note that "mapped" is shorthand for
binder_alloc_is_mapped().
!vma && !mapped: reclaim, no zap
vma && mapped: reclaim, with zap
The next one is arguably wrong in the code today:
vma && !mapped: Wrong VMA. Skip and retry.
It induces LRU_SKIP behavior from another VMA getting mapped. That
seems wrong. It is possible to continue this behavior, but it also
seems a bit silly to go to any lengths to keep doing it if it is
a bug.
The last case comes from normal, transient lock_vma_under_rcu()
failures like overflows. It can _safely_ be handled by LRU_SKIP.
This case is new.
!vma && mapped: VMA lock race. Skip and retry.
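The four cases can be sketched as a small userspace decision function.
This is only an illustration of the table above, not kernel code:
'enum reclaim_action', classify() and check_cases() are hypothetical
names. The 'vma && !mapped' outcome follows my reading of the diff
below (no skip, no zap) rather than what the code does today.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; none of these exist in the kernel. */
enum reclaim_action {
	RECLAIM_NO_ZAP,   /* free the page, there is nothing to zap */
	RECLAIM_WITH_ZAP, /* zap the user mapping, then free the page */
	SKIP_AND_RETRY,   /* leave the page on the LRU for a later pass */
};

/*
 * have_vma: lock_vma_under_rcu() returned a non-NULL VMA
 * mapped:   binder_alloc_is_mapped() returned true
 */
enum reclaim_action classify(bool have_vma, bool mapped)
{
	/* A mapping exists but the VMA lock attempt failed: transient. */
	if (!have_vma && mapped)
		return SKIP_AND_RETRY;

	/* From here on, mapped==true implies a locked, valid VMA. */
	if (mapped)
		return RECLAIM_WITH_ZAP;

	/*
	 * Not mapped. This covers both "no VMA" and "some other VMA".
	 * In the second case any VMA read lock that was taken would
	 * still need to be released by the caller.
	 */
	return RECLAIM_NO_ZAP;
}

/* Walk all four combinations from the table. */
void check_cases(void)
{
	assert(classify(false, false) == RECLAIM_NO_ZAP);
	assert(classify(true, true) == RECLAIM_WITH_ZAP);
	assert(classify(true, false) == RECLAIM_NO_ZAP);
	assert(classify(false, true) == SKIP_AND_RETRY);
}
```

Only one of the four combinations ends in a skip, which is the point of
keying the zap decision off 'mapped' rather than off 'vma'.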
Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
Cc: Lorenzo Stoakes <ljs@xxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxxxxx>
Cc: Shakeel Butt <shakeel.butt@xxxxxxxxx>
Cc: linux-mm@xxxxxxxxx
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: Arve Hjønnevåg <arve@xxxxxxxxxxx>
Cc: Todd Kjos <tkjos@xxxxxxxxxxx>
Cc: Christian Brauner <christian@xxxxxxxxxx>
Cc: Carlos Llamas <cmllamas@xxxxxxxxxx>
Cc: Alice Ryhl <aliceryhl@xxxxxxxxxx>
Cc: "David S. Miller" <davem@xxxxxxxxxxxxx>
Cc: David Ahern <dsahern@xxxxxxxxxx>
Cc: netdev@xxxxxxxxxxxxxxx
---
Changes from v1:
* Move forward even if 'vma' is NULL in binder_alloc_free_page().
This can happen if the VMA is unmapped (Sashiko).
* Rename goto label to be more accurate for new lock scheme
Changes from v2:
* Remove review tags. There's too much churn in here to keep them.
* Rely on binder_alloc_is_mapped() instead of VMA lookups alone to
determine if the range must be zapped.
---
b/drivers/android/binder_alloc.c | 42 ++++++++++++++-------------------------
1 file changed, 16 insertions(+), 26 deletions(-)
diff -puN drivers/android/binder_alloc.c~binder-try-vma-lock drivers/android/binder_alloc.c
--- a/drivers/android/binder_alloc.c~binder-try-vma-lock 2026-06-10 15:57:55.274412018 -0700
+++ b/drivers/android/binder_alloc.c 2026-06-11 15:17:25.240473010 -0700
@@ -1142,7 +1142,7 @@ enum lru_status binder_alloc_free_page(s
struct vm_area_struct *vma;
struct page *page_to_free;
unsigned long page_addr;
- int mm_locked = 0;
+ bool mapped;
size_t index;
if (!mmget_not_zero(mm))
@@ -1151,26 +1151,21 @@ enum lru_status binder_alloc_free_page(s
index = mdata->page_index;
page_addr = alloc->vm_start + index * PAGE_SIZE;
- /* attempt per-vma lock first */
vma = lock_vma_under_rcu(mm, page_addr);
- if (!vma) {
- /* fall back to mmap_lock */
- if (!mmap_read_trylock(mm))
- goto err_mmap_read_lock_failed;
- mm_locked = 1;
- vma = vma_lookup(mm, page_addr);
- }
if (!mutex_trylock(&alloc->mutex))
- goto err_get_alloc_mutex_failed;
+ goto err_vma_end_read;
/*
- * Since a binder_alloc can only be mapped once, we ensure
- * the vma corresponds to this mapping by checking whether
- * the binder_alloc is still mapped.
+ * mapped==true means a VMA should be present. Any
+ * inconsistency should be transient. Skip the page
+ * and try again later.
*/
- if (vma && !binder_alloc_is_mapped(alloc))
- goto err_invalid_vma;
+ mapped = binder_alloc_is_mapped(alloc);
+ if (!vma && mapped)
+ goto err_vma_inconsistent;
+
+ /* mapped==true now implies a valid 'vma' */
trace_binder_unmap_kernel_start(alloc, index);
@@ -1182,32 +1177,27 @@ enum lru_status binder_alloc_free_page(s
list_lru_isolate(lru, item);
spin_unlock(&lru->lock);
- if (vma) {
+ if (mapped) {
trace_binder_unmap_user_start(alloc, index);
zap_vma_range(vma, page_addr, PAGE_SIZE);
trace_binder_unmap_user_end(alloc, index);
+
+ vma_end_read(vma);
}
mutex_unlock(&alloc->mutex);
- if (mm_locked)
- mmap_read_unlock(mm);
- else
- vma_end_read(vma);
mmput_async(mm);
binder_free_page(page_to_free);
return LRU_REMOVED_RETRY;
-err_invalid_vma:
+err_vma_inconsistent:
mutex_unlock(&alloc->mutex);
-err_get_alloc_mutex_failed:
- if (mm_locked)
- mmap_read_unlock(mm);
- else
+err_vma_end_read:
+ if (vma)
vma_end_read(vma);
-err_mmap_read_lock_failed:
mmput_async(mm);
err_mmget:
return LRU_SKIP;
_