Re: [v5 PATCH] arm64: mm: show direct mapping use in /proc/meminfo
From: Yang Shi
Date: Tue Jan 20 2026 - 19:17:49 EST
Hi Will,
Gentle ping: does the proposed change address your concerns? If it does, I can send v6, and hopefully it can be merged in the coming merge window.
Thanks,
Yang
On 1/13/26 4:36 PM, Yang Shi wrote:
On 1/13/26 6:36 AM, Will Deacon wrote:
On Tue, Jan 06, 2026 at 04:29:44PM -0800, Yang Shi wrote:
Since commit a166563e7ec3 ("arm64: mm: support large block mapping when
rodata=full"), the direct mapping may be split on some machines instead
of remaining static since boot. It now makes more sense than before to
show the direct mapping use in /proc/meminfo.
This patch will make /proc/meminfo show the direct mapping use like the
below (4K base page size):
DirectMap4K: 94792 kB
DirectMap64K: 134208 kB
DirectMap2M: 1173504 kB
DirectMap32M: 5636096 kB
DirectMap1G: 529530880 kB
Although only machines that support BBML2_NOABORT can split the direct
mapping, show it on all machines regardless of BBML2_NOABORT so that
users have a consistent view and are not confused.
Although ptdump can also tell the direct map use, it needs to dump the
whole kernel page table, which is costly and overkill. It also lives
in debugfs, which may not be enabled by all distros. So showing direct
map use in /proc/meminfo is more convenient and has less overhead.
Signed-off-by: Yang Shi <yang@xxxxxxxxxxxxxxxxxxxxxx>
---
v5: * Rebased to v6.19-rc4
* Fixed the build error for !CONFIG_PROC_FS
v4: * Used PAGE_END instead of _PAGE_END(VA_BITS_MIN) per Ryan
* Used shorter name for the helpers and variables per Ryan
* Fixed accounting for memory hotunplug
v3: * Fixed the over-accounting problems per Ryan
* Introduced helpers for add/sub direct map use and #ifdef them with
CONFIG_PROC_FS per Ryan
* v3 is a fix patch on top of v2
v2: * Counted in size instead of the number of entries per Ryan
* Removed shift array per Ryan
* Use lower case "k" per Ryan
* Fixed a couple of build warnings reported by kernel test robot
* Fixed a couple of potential miscounts
arch/arm64/mm/mmu.c | 202 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 181 insertions(+), 21 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 8e1d80a7033e..422441c9a992 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -29,6 +29,7 @@
#include <linux/mm_inline.h>
#include <linux/pagewalk.h>
#include <linux/stop_machine.h>
+#include <linux/proc_fs.h>
#include <asm/barrier.h>
#include <asm/cputype.h>
@@ -171,6 +172,85 @@ static void init_clear_pgtable(void *table)
dsb(ishst);
}
+enum dm_type {
+ PTE,
+ CONT_PTE,
+ PMD,
+ CONT_PMD,
+ PUD,
+ NR_DM_TYPE,
+};
+
+#ifdef CONFIG_PROC_FS
+static unsigned long dm_meminfo[NR_DM_TYPE];
+
+void arch_report_meminfo(struct seq_file *m)
+{
+ char *size[NR_DM_TYPE];

const?

Yeah, it can be const.
+
+#if defined(CONFIG_ARM64_4K_PAGES)
+ size[PTE] = "4k";
+ size[CONT_PTE] = "64k";
+ size[PMD] = "2M";
+ size[CONT_PMD] = "32M";
+ size[PUD] = "1G";
+#elif defined(CONFIG_ARM64_16K_PAGES)
+ size[PTE] = "16k";
+ size[CONT_PTE] = "2M";
+ size[PMD] = "32M";
+ size[CONT_PMD] = "1G";
+#elif defined(CONFIG_ARM64_64K_PAGES)
+ size[PTE] = "64k";
+ size[CONT_PTE] = "2M";
+ size[PMD] = "512M";
+ size[CONT_PMD] = "16G";
+#endif
+
+ seq_printf(m, "DirectMap%s: %8lu kB\n",
+ size[PTE], dm_meminfo[PTE] >> 10);
+ seq_printf(m, "DirectMap%s: %8lu kB\n",
+ size[CONT_PTE],
+ dm_meminfo[CONT_PTE] >> 10);
+ seq_printf(m, "DirectMap%s: %8lu kB\n",
+ size[PMD], dm_meminfo[PMD] >> 10);
+ seq_printf(m, "DirectMap%s: %8lu kB\n",
+ size[CONT_PMD],
+ dm_meminfo[CONT_PMD] >> 10);
+ if (pud_sect_supported())
+ seq_printf(m, "DirectMap%s: %8lu kB\n",
+ size[PUD], dm_meminfo[PUD] >> 10);

This seems a bit brittle to me. If somebody adds support for L1 block
mappings for !4k pages in future, they will forget to update this and
we'll end up returning kernel stack in /proc/meminfo afaict.
I can initialize size[PUD] to "NON_SUPPORT" by default. If that case happens, /proc/meminfo will just show "DirectMapNON_SUPPORT", so we will notice something was missed, but no kernel stack data will be leaked.
+static inline bool is_dm_addr(unsigned long addr)
+{
+ return (addr >= PAGE_OFFSET) && (addr < PAGE_END);
+}
+
+static inline void dm_meminfo_add(unsigned long addr, unsigned long size,
+ enum dm_type type)
+{
+ if (is_dm_addr(addr))
+ dm_meminfo[type] += size;
+}
+
+static inline void dm_meminfo_sub(unsigned long addr, unsigned long size,
+ enum dm_type type)
+{
+ if (is_dm_addr(addr))
+ dm_meminfo[type] -= size;
+}
+#else
+static inline void dm_meminfo_add(unsigned long addr, unsigned long size,
+ enum dm_type type)
+{
+}
+
+static inline void dm_meminfo_sub(unsigned long addr, unsigned long size,
+ enum dm_type type)
+{
+}
+#endif
+
static void init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot)
{
@@ -236,6 +316,11 @@ static int alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
init_pte(ptep, addr, next, phys, __prot);
+ if (pgprot_val(__prot) & PTE_CONT)
+ dm_meminfo_add(addr, (next - addr), CONT_PTE);
+ else
+ dm_meminfo_add(addr, (next - addr), PTE);
+
ptep += pte_index(next) - pte_index(addr);
phys += next - addr;
} while (addr = next, addr != end);
@@ -266,6 +351,17 @@ static int init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
(flags & NO_BLOCK_MAPPINGS) == 0) {
pmd_set_huge(pmdp, phys, prot);
+ /*
+ * It is possible to have mappings allow cont mapping
+ * but disallow block mapping. For example,
+ * map_entry_trampoline().
+ * So we have to increase CONT_PMD and PMD size here
+ * to avoid double counting.
+ */
+ if (pgprot_val(prot) & PTE_CONT)
+ dm_meminfo_add(addr, (next - addr), CONT_PMD);
+ else
+ dm_meminfo_add(addr, (next - addr), PMD);
I don't understand the comment you're adding here. If somebody passes
NO_BLOCK_MAPPINGS then that also prevents contiguous entries except at
level 3.
The comment may be misleading. I meant that if we had the accounting code for CONT_PMD in alloc_init_cont_pmd(), for example:
@@ -433,6 +433,11 @@ static int alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
if (ret)
goto out;
+ if (pgprot_val(prot) & PTE_CONT)
+ dm_meminfo_add(addr, (next - addr), CONT_PMD);
pmdp += pmd_index(next) - pmd_index(addr);
phys += next - addr;
} while (addr = next, addr != end);
then in the described case we would actually miscount CONT_PMD. So I need to check whether it is CONT in init_pmd() instead. If the comment is confusing, I can just remove it.
It also doesn't look you handle the error case properly when the mapping
fails.
I don't quite get which failure you mean. pmd_set_huge() doesn't fail. Or did you mean hotplug failing? If so, hot unplug, which is called in the error handling path, will decrease the counters.
-static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
+static void unmap_hotplug_pte_range(pte_t *ptep, unsigned long addr,
unsigned long end, bool free_mapped,
struct vmem_altmap *altmap)
{
- pte_t *ptep, pte;
+ pte_t pte;
do {
- ptep = pte_offset_kernel(pmdp, addr);
pte = __ptep_get(ptep);
if (pte_none(pte))
continue;
WARN_ON(!pte_present(pte));
__pte_clear(&init_mm, addr, ptep);
+ dm_meminfo_sub(addr, PAGE_SIZE, PTE);
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
if (free_mapped)
free_hotplug_page_range(pte_page(pte),
PAGE_SIZE, altmap);
Is the existing code correct for contiguous entries here? I'd have
thought that we'd need to make the range non-contiguous before knocking
out the TLB.
Thanks for pointing this out. I didn't pay much attention to such details in the existing code. I did notice the hot unplug code doesn't handle contiguous mappings, which is why I added unmap_hotplug_cont_{pmd|pte}_range() in this patch to maintain the counters correctly. I'm not sure whether that is intentional (or maybe just unnecessary) or an oversight.
Are you concerned this may result in a misprogramming issue? TBH, I'm a little confused by the "misprogramming contiguous bit" described in the Arm ARM. In this case, the TLB flush should remove the large contiguous TLB entry, so there should be no overlapping entries in the TLB. Or is it still a problem because some entries have the contiguous bit set while others don't? But when we change the contiguous bit for a range of entries, there is always a moment when some have the contiguous bit set and some don't. My understanding is that it is fine as long as there are no overlapping entries in the TLB. Anyway, I may have misunderstood it.
Thanks,
Yang
Will