Re: [PATCH v2 1/1] mm/rmap: fix potential out-of-bounds page table access during batched unmap

From: David Hildenbrand
Date: Fri Jun 27 2025 - 11:51:15 EST


On 27.06.25 17:29, Lance Yang wrote:


On 2025/6/27 18:13, David Hildenbrand wrote:
On 27.06.25 09:36, Barry Song wrote:
On Fri, Jun 27, 2025 at 7:15 PM Lance Yang <lance.yang@xxxxxxxxx> wrote:



On 2025/6/27 14:55, Barry Song wrote:
On Fri, Jun 27, 2025 at 6:52 PM Barry Song <21cnbao@xxxxxxxxx> wrote:

On Fri, Jun 27, 2025 at 6:23 PM Lance Yang <ioworker0@xxxxxxxxx>
wrote:

From: Lance Yang <lance.yang@xxxxxxxxx>

As pointed out by David[1], the batched unmap logic in
try_to_unmap_one()
can read past the end of a PTE table if a large folio is mapped
starting at
the last entry of that table. It would be quite rare in practice, as
MADV_FREE typically splits the large folio ;)

So let's fix the potential out-of-bounds read by refactoring the
logic into
a new helper, folio_unmap_pte_batch().

The new helper now correctly calculates the safe number of pages
to scan by
limiting the operation to the boundaries of the current VMA and
the PTE
table.

In addition, the "all-or-nothing" batching restriction is removed to
support partial batches. The reference counting is also cleaned up
to use
folio_put_refs().

[1] https://lore.kernel.org/linux-mm/
a694398c-9f03-4737-81b9-7e49c857fcbe@xxxxxxxxxx


What about ?

As pointed out by David[1], the batched unmap logic in
try_to_unmap_one()
may read past the end of a PTE table when a large folio spans
across two PMDs,
particularly after being remapped with mremap(). This patch fixes the
potential out-of-bounds access by capping the batch at vm_end and
the PMD
boundary.

It also refactors the logic into a new helper,
folio_unmap_pte_batch(),
which supports batching between 1 and folio_nr_pages. This improves
code
clarity. Note that such cases are rare in practice, as MADV_FREE
typically
splits large folios.

Sorry, I meant that MADV_FREE typically splits large folios if the
specified
range doesn't cover the entire folio.

Hmm... I got it wrong as well :( It's the partial coverage that triggers
the split.

how about this revised version:

As pointed out by David[1], the batched unmap logic in
try_to_unmap_one()
may read past the end of a PTE table when a large folio spans across two
PMDs, particularly after being remapped with mremap(). This patch fixes
the potential out-of-bounds access by capping the batch at vm_end and
the
PMD boundary.

It also refactors the logic into a new helper, folio_unmap_pte_batch(),
which supports batching between 1 and folio_nr_pages. This improves code
clarity. Note that such boundary-straddling cases are rare in
practice, as
MADV_FREE will typically split a large folio if the advice range does
not
cover the entire folio.

I assume the out-of-bounds access must be fixed, even though it is very
unlikely. It might occur after a large folio is marked with MADV_FREE and
then remapped to an unaligned address, potentially crossing two PTE
tables.

Right. If it can be triggered from userspace, it doesn't matter how
likely/common/whatever it is. It must be fixed.

Agreed. It must be fixed regardless of how rare the scenario is ;)



A batch size between 2 and nr_pages - 1 is practically rare, as we
typically
split large folios when MADV_FREE does not cover the entire folio range.
Cases where a batch of size 2 or nr_pages - 1 occurs may only happen if a
large folio is partially unmapped after being marked MADV_FREE, which is
quite an unusual pattern in userspace.

I think the point is rather "Simplify the code by not special-casing for
completely mapped folios, there is no real reason why we cannot batch
ranges that don't cover the complete folio.".

Yeah. That makes the code cleaner and more generic, as there is no
strong reason to special-case for fully mapped folios ;)

Based on that, I think we're on the same page now. I'd like to post
the following commit message for the next version:

```
As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
may read past the end of a PTE table when a large folio's PTE mappings
are not fully contained within a single page table.

While this scenario might be rare, an issue triggerable from userspace must
be fixed regardless of its likelihood. This patch fixes the out-of-bounds
access by refactoring the logic into a new helper, folio_unmap_pte_batch().

The new helper correctly calculates the safe batch size by capping the
scan at both the VMA and PMD boundaries. To simplify the code, it also
supports partial batching (i.e., any number of pages from 1 up to the
calculated safe maximum), as there is no strong reason to special-case
for fully mapped folios.
```

So, wdyt?

Sounds good to me.

--
Cheers,

David / dhildenb