Re: [v5 PATCH] arm64: mm: show direct mapping use in /proc/meminfo

From: Yang Shi

Date: Mon Jan 26 2026 - 12:55:19 EST

On 1/26/26 6:14 AM, Will Deacon wrote:
On Thu, Jan 22, 2026 at 01:59:54PM -0800, Yang Shi wrote:
On 1/22/26 6:43 AM, Ryan Roberts wrote:
On 21/01/2026 22:44, Yang Shi wrote:
On 1/21/26 9:23 AM, Ryan Roberts wrote:
But it looks like all the higher level users will only ever unplug in the same
granularity that was plugged in (I might be wrong but that's the sense I get).

arm64 adds the constraint that it won't unplug any memory that was present at
boot - see prevent_bootmem_remove_notifier().

So in practice this is probably safe, though perhaps brittle.

Some options:

- leave it as is and worry about it if/when something shifts and hits the
problem.
Seems like the simplest way :-)

- Enhance prevent_bootmem_remove_notifier() to reject unplugging memory blocks
whose boundaries are within leaf mappings.
I don't quite get why we should enhance prevent_bootmem_remove_notifier().
If I read the code correctly, it simply rejects offlining boot memory.
Offlining a single memory block is fine. If you check the boundaries there,
will it prevent offlining a single memory block?

I think you need to enhance try_remove_memory(). But the kernel may unmap
the linear mapping one memory block at a time if altmap is used. So, IIUC,
you would need an extra page table walk over the start and size of the
unplugged DIMM, before removing the memory, to tell whether the boundaries
fall within leaf mappings or not. Can it be done in arch_remove_memory()?
It seems not, because arch_remove_memory() may be called at memory block
granularity if altmap is used.
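A minimal sketch of the boundary check discussed above, assuming (as on arm64) that block and contiguous leaf mappings are naturally aligned to their size. The helpers boundary_in_leaf() and range_splits_leaf() are hypothetical, not kernel functions. The leaf sizes are passed in rather than obtained from a real page table walk, and the SZ_* constants are local definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local size constants (illustrative; not taken from kernel headers). */
#define SZ_2M   0x200000ULL
#define SZ_32M  0x2000000ULL
#define SZ_128M 0x8000000ULL
#define SZ_1G   0x40000000ULL

/*
 * Hypothetical helper: leaf mappings are assumed to be naturally aligned
 * to their size, so an address lies strictly inside a leaf iff it is not
 * aligned to that leaf's size.
 */
static bool boundary_in_leaf(uint64_t addr, uint64_t leaf_size)
{
	assert(leaf_size && !(leaf_size & (leaf_size - 1))); /* power of two */
	return (addr & (leaf_size - 1)) != 0;
}

/*
 * Hypothetical check for [start, start + size). leaf_at_start and
 * leaf_at_end stand in for what a page table walk would report for the
 * leaf covering start and the leaf covering start + size - 1.
 */
static bool range_splits_leaf(uint64_t start, uint64_t size,
			      uint64_t leaf_at_start, uint64_t leaf_at_end)
{
	return boundary_in_leaf(start, leaf_at_start) ||
	       boundary_in_leaf(start + size, leaf_at_end);
}
```

For example, removing the 128M block at 1G + 128M splits a 1G PUD leaf, but not a 32M contiguous-PMD or 2M PMD mapping of the same range.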

- For non-bbml2_noabort systems, map hotplug memory with a new flag to ensure
that leaf mappings are always <= memory_block_size_bytes(). For
bbml2_noabort, split at the block boundaries before doing the unmapping.
Leaf mappings in the linear map would then be at most 128M (with 4K page
size), which sounds suboptimal IMHO.
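For a sense of what that cap costs: with a 4K granule, arm64 leaf mappings come in 4K, 64K (contiguous PTEs), 2M (PMD block), 32M (contiguous PMDs) and 1G (PUD block) sizes. The small sketch below assumes a 128M memory block. largest_leaf_within() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* arm64 leaf mapping sizes with a 4K granule, smallest to largest. */
static const uint64_t leaf_sizes_4k[] = {
	0x1000ULL,     /* PTE: 4K */
	0x10000ULL,    /* contiguous PTEs: 16 x 4K = 64K */
	0x200000ULL,   /* PMD block: 2M */
	0x2000000ULL,  /* contiguous PMDs: 16 x 2M = 32M */
	0x40000000ULL, /* PUD block: 1G */
};

/*
 * Hypothetical helper: largest leaf size not exceeding cap (for example,
 * the memory block size). The table is sorted ascending, so the last
 * size that fits is the largest.
 */
static uint64_t largest_leaf_within(uint64_t cap)
{
	uint64_t best = 0;

	assert(cap >= leaf_sizes_4k[0]); /* cap must allow at least one page */
	for (size_t i = 0; i < sizeof(leaf_sizes_4k) / sizeof(leaf_sizes_4k[0]); i++)
		if (leaf_sizes_4k[i] <= cap)
			best = leaf_sizes_4k[i];
	return best;
}
```

Under that assumption, the cap rules out 1G PUD blocks and leaves a 32M contiguous-PMD range as the largest usable leaf.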

Given I don't think this can happen in practice, probably the middle option is
the best? There is no runtime impact and it will give us a warning if it ever
does happen in future.

What do you think?
I agree it can't happen in practice, so why not just take option #1 given
the complexity added by option #2?
It still looks broken in the case that a region that was mapped with the
contiguous bit is then unmapped. The sequence seems to iterate over
each contiguous PTE, zapping the entry and doing the TLBI while the
other entries in the contiguous range remain intact. I don't think
that's sufficient to guarantee that you don't have stale TLB entries
once you've finished processing the whole range.

For example, imagine you have an L1 TLB that only supports 4k entries
and an L2 TLB that supports 64k entries. Let's say that the contiguous
range is mapped by pte0 ... pte15 and we've zapped and invalidated
pte0 ... pte14. At that point, I think the hardware is permitted to use
the last remaining contiguous pte (pte15) to allocate a 64k entry in the
L2 TLB covering the whole range. A (speculative) walk via one of the
virtual addresses translated by pte0 ... pte14 could then hit that entry
and fill a 4k entry into the L1 TLB. So at the end of the sequence, you
could presumably still access the first 60k of the range thanks to stale
entries in the L1 TLB?
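The scenario above can be replayed in a toy C model. This is a hypothetical illustration and does not describe any real TLB implementation. The structure, the helper names and the single speculative access to page 0 are all assumptions. per_pte_unmap() follows the zap-and-TLBI-per-PTE sequence described above. clear_then_flush_unmap() shows one alternative ordering, which clears every PTE in the range before invalidating it and leaves no stale entry in this model.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: one 64K contiguous range mapped by 16 4K PTEs. */
#define NR_PTES 16

struct model {
	bool pte_valid[NR_PTES]; /* PTEs with the contiguous bit set */
	bool l1_4k[NR_PTES];     /* L1 TLB: 4K entries only */
	bool l2_64k;             /* L2 TLB: one 64K entry for the whole range */
};

static void model_init(struct model *m)
{
	for (int i = 0; i < NR_PTES; i++) {
		m->pte_valid[i] = true;
		m->l1_4k[i] = false;
	}
	m->l2_64k = false;
}

/*
 * A (possibly speculative) translation of page i; returns true if it
 * succeeds. A walk that finds a valid contiguous PTE may allocate the
 * 64K L2 entry, and an L2 hit fills a 4K L1 entry.
 */
static bool translate(struct model *m, int i)
{
	assert(i >= 0 && i < NR_PTES);
	if (m->l1_4k[i])
		return true;
	if (!m->l2_64k) {
		if (!m->pte_valid[i])
			return false;  /* walk finds an invalid PTE: fault */
		m->l2_64k = true;      /* contiguous hint: cache all 64K */
	}
	m->l1_4k[i] = true;            /* fill L1 from the L2 entry */
	return true;
}

/* TLBI by VA for page i: drops every entry that translates that VA. */
static void tlbi_va(struct model *m, int i)
{
	assert(i >= 0 && i < NR_PTES);
	m->l1_4k[i] = false;
	m->l2_64k = false;             /* the 64K entry covers page i */
}

/* Zap and invalidate one PTE at a time; speculation before the last one. */
static void per_pte_unmap(struct model *m)
{
	for (int i = 0; i < NR_PTES - 1; i++) {
		m->pte_valid[i] = false;
		tlbi_va(m, i);
	}
	(void)translate(m, NR_PTES - 1); /* walk via pte15 allocates 64K */
	(void)translate(m, 0);           /* L2 hit refills 4K for page 0 */
	m->pte_valid[NR_PTES - 1] = false;
	tlbi_va(m, NR_PTES - 1);
}

/* Clear every PTE in the range first, then invalidate the whole range. */
static void clear_then_flush_unmap(struct model *m)
{
	for (int i = 0; i < NR_PTES - 1; i++)
		m->pte_valid[i] = false;
	(void)translate(m, NR_PTES - 1); /* same speculation as above */
	(void)translate(m, 0);
	m->pte_valid[NR_PTES - 1] = false;
	for (int i = 0; i < NR_PTES; i++)
		tlbi_va(m, i);
}
```

In the first sequence, page 0 is still translatable through the stale L1 entry after the unmap completes. In the second, the range invalidation runs only after no valid PTE remains that could refill the TLB.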

It is a little hard for me to understand how a (speculative) walk could happen once we reach this point.

Before we reach this point, IIUC the kernel has:

 * offlined all the page blocks. This means they have been freed and isolated from the buddy allocator; even a PFN walker (for example, compaction) should not reach them at all.
 * removed the vmemmap, so no struct page is available.

From the kernel's point of view, the pages are unreachable now. Did I miss or misunderstand something?

Thanks,
Yang


So it looks broken to me. What do you think? If you agree, then let's
fix this problem first before adding the new /proc/meminfo stuff.

Will