[PATCH RFC v3 00/35] Add support for arm64 MTE dynamic tag storage reuse
From: Alexandru Elisei
Date: Thu Jan 25 2024 - 11:43:10 EST
The series is based on v6.8-rc1 and can be cloned with:
$ git clone https://gitlab.arm.com/linux-arm/linux-ae.git \
-b arm-mte-dynamic-carveout-rfc-v3
Changelog
=========
The changes from the previous version [1] are extensive, so I'll list them
first. Only the major changes are below; individual patches have their own
changelogs.
I would like to point out that patch #31 ("khugepaged: arm64: Don't
collapse MTE enabled VMAs") might be controversial. Please have a look.
Changes since rfc v2 [1]:
- Patches #5 ("mm: cma: Don't append newline when generating CMA area
name") and #16 ("KVM: arm64: Don't deny VM_PFNMAP VMAs when kvm_has_mte()")
are new and they are fixes. I think they can be merged independently of the
rest of the series.
- Tag storage now uses the CMA API to allocate and free tag storage pages
(David Hildenbrand).
- Tag storage is now described as subnode of 'reserved-memory' (Rob
Herring).
- KVM now has support for dynamic tag storage reuse, added in patches #32
("KVM: arm64: mte: Reserve tag storage for VMs with MTE") and #33 ("KVM:
arm64: mte: Introduce VM_MTE_KVM VMA flag").
- Reserving tag storage when a tagged page is allocated is now a best
effort approach instead of being mandatory. If tag storage cannot be
reserved, the page is marked as protnone and tag storage is reserved when
the fault is taken on the next userspace access to the address.
- ptrace support for pages without tag storage has been added, implemented
in patch #30 ("arm64: mte: ptrace: Handle pages with missing tag storage").
- The following patches have been dropped: #4 ("mm: migrate/mempolicy: Add hook
to modify migration target gfp") and #5 ("mm: page_alloc: Add an arch hook to
allow prep_new_page() to fail"), because reserving tag storage is now best
effort and, in the case of patch #4, to make the series shorter.
- Also dropped patch #13 ("arm64: mte: Make tag storage depend on
ARCH_KEEP_MEMBLOCK") and added a BUILD_BUG_ON() instead (David
Hildenbrand).
- Dropped patch #15 ("arm64: mte: Check that tag storage blocks are in the
same zone") because it's not needed anymore:
cma_init_reserved_areas->cma_activate_area() already does that (David
Hildenbrand).
- Moved patches #1 ("arm64: mte: Rework naming for tag manipulation functions")
and #2 ("arm64: mte: Rename __GFP_ZEROTAGS to __GFP_TAGGED") after the changes
to the common code and before tag storage is discovered.
- Patch #12 ("arm64: mte: Add tag storage pages to the MIGRATE_CMA
migratetype") was replaced with patch #20 ("arm64: mte: Add tag storage
memory to CMA") (David Hildenbrand).
- Split patch #19 ("mm: mprotect: Introduce PAGE_FAULT_ON_ACCESS for
mprotect(PROT_MTE)") into an arch independent part (patch #13, "mm: memory:
Introduce fault-on-access mechanism for pages") and into an arm64 patch (patch
#26, "arm64: mte: Use fault-on-access to reserve missing tag storage"). The
arm64 code is much smaller because of this (David Hildenbrand).
[1] https://lore.kernel.org/linux-arm-kernel/20231119165721.9849-1-alexandru.elisei@xxxxxxx/
Introduction
============
Memory Tagging Extension (MTE) is implemented currently to have a static
carve-out of the DRAM to store the allocation tags (a.k.a. memory colour).
This is what we call the tag storage. Each 16 bytes of memory have 4 bits of
tag, so the tag storage takes up 1/32 of the DRAM, roughly 3%. This is
done transparently by the hardware/interconnect (with firmware setup) and
normally hidden from the OS. So a checked memory access to location X
generates a tag fetch from location Y in the carve-out and this tag is
compared with the bits 59:56 in the pointer. The correspondence from X to Y
is linear (subject to a minimum block size to deal with some address
interleaving). The software doesn't need to know about this correspondence
as we have specific instructions like STG/LDG to location X that lead to a
tag store/load to Y.
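
The arithmetic above can be sketched in a few lines of C. This is only an
illustration of the 1/32 ratio and of a purely linear X to Y correspondence;
it ignores the minimum block size mentioned above, and struct tag_region, its
fields and the helper names are hypothetical, not taken from the series.

```c
#include <assert.h>
#include <stdint.h>

#define MTE_GRANULE_SIZE	16u	/* bytes of data covered by one tag */
#define MTE_TAG_BITS		4u	/* bits in one allocation tag */
/* Data bytes described by one byte of tag storage: 16 * 8 / 4 = 32 */
#define DATA_BYTES_PER_TAG_BYTE	(MTE_GRANULE_SIZE * 8u / MTE_TAG_BITS)

/* Hypothetical description of one tagged DRAM range and its carve-out. */
struct tag_region {
	uint64_t data_base;	/* first address covered by the carve-out */
	uint64_t tag_base;	/* start of the tag storage carve-out */
};

/* Location Y of the tag byte for data address X, assuming a linear map. */
static uint64_t tag_storage_addr(const struct tag_region *r, uint64_t x)
{
	assert(x >= r->data_base);
	return r->tag_base + (x - r->data_base) / DATA_BYTES_PER_TAG_BYTE;
}

/* The logical tag carried in bits 59:56 of a pointer. */
static unsigned int pointer_tag(uint64_t ptr)
{
	return (unsigned int)((ptr >> 56) & 0xf);
}

/* Number of data pages whose tags fit in one tag storage page. */
static unsigned int data_pages_per_tag_page(void)
{
	return DATA_BYTES_PER_TAG_BYTE;
}
```

With 4 KiB pages, each data page needs 128 bytes of tags under these
assumptions, so one tag storage page holds the tags for 32 data pages.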
Not all memory used by applications is tagged (mmap(PROT_MTE)). For
example, some large allocations may not use PROT_MTE at all or only for the
first and last page since initialising the tags takes time. And executable
memory is never tagged. The side-effect is that of the 3% of DRAM, only
part of it, say 1%, is effectively used.
The series aims to take that unused tag storage and release it to the page
allocator for normal data usage.
The first complication is that a PROT_MTE page allocation at address X will
need to reserve the tag storage page at location Y (and migrate any data in
that page if it is in use).
To make things more complicated, pages in the tag storage/carve-out range
cannot use PROT_MTE themselves on current hardware, so this adds the second
complication - a heterogeneous memory layout. The kernel needs to know
where to allocate a PROT_MTE page from or migrate a current page if it
becomes PROT_MTE (mprotect()) and the range it is in does not support
tagging.
Some other complications are arm64-specific like cache coherency between
tags and data accesses. There is a draft architecture spec which will be
released soon, detailing how the hardware behaves.
All of this will be entirely transparent to userspace. As with the current
kernel (without this dynamic tag storage), a user only needs to ask for
PROT_MTE mappings to get tagged pages.
Implementation
==============
MTE tag storage reuse is accomplished with the following changes to the
Linux kernel:
1. The tag storage memory is exposed to the memory allocator as
MIGRATE_CMA. The arm64 code uses the newly added function cma_alloc_range()
to reserve tag storage when the associated page is allocated as tagged.
There is a limitation to this approach: no MIGRATE_CMA memory can be used
for tagged allocations, even if not all of it is tag storage.
2. mprotect(PROT_MTE) is implemented by adding a fault-on-access mechanism
for existing pages. When a page is next accessed, a fault is taken and the
corresponding tag storage is reserved.
3. When the code tries to copy tags to a page (when swapping in a newly
allocated page, or during migration/THP collapse) which doesn't have the
tag storage reserved, the tags are copied to an xarray and restored when
tag storage is reserved for the destination page.
4. KVM allows VMAs without MTE enabled to represent the memory of a virtual
machine with MTE enabled. Even though the host treats the pages that
represent guest memory as untagged, they have tags associated with them,
which are used by the guest. To make dynamic tag storage work with KVM, two
changes were necessary: try to reserve tag storage when a guest accesses an
address the first time, and if not possible, migrate the page to replace it
with a page with tag storage reserved; and a new VMA flag, VM_MTE_KVM, was
added so the page allocator will not use tag storage pages (which cannot be
tagged) for VM memory. The second change is a performance optimization.
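
The interaction between best-effort reservation, fault-on-access and the
deferred tag restore (points 2 and 3 above) can be modelled with a small
userspace simulation. This is a sketch only: struct sim_page, its fields and
every function name are hypothetical, a plain buffer stands in for the xarray,
and the can_reserve parameter stands in for whether the CMA reservation
succeeds. What the real code does when the reservation fails in the fault
handler is not described here, so the sketch just reports the failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TAGS_PER_PAGE	128	/* tag bytes for one 4 KiB page: 4096 / 32 */

/* Hypothetical per-page state, not the kernel's struct page. */
struct sim_page {
	bool tag_storage_reserved;
	bool fault_on_access;	/* models the protnone marking */
	bool has_saved_tags;	/* models an entry in the xarray */
	unsigned char tags[TAGS_PER_PAGE];	 /* models real tag storage */
	unsigned char saved_tags[TAGS_PER_PAGE]; /* models the saved copy */
};

/* Stand-in for reserving the tag storage page; may fail. */
static bool try_reserve_tag_storage(struct sim_page *p, bool can_reserve)
{
	if (can_reserve)
		p->tag_storage_reserved = true;
	return p->tag_storage_reserved;
}

/* Best effort at allocation time: on failure, defer to the next access. */
static void alloc_tagged_page(struct sim_page *p, bool can_reserve)
{
	memset(p, 0, sizeof(*p));
	if (!try_reserve_tag_storage(p, can_reserve))
		p->fault_on_access = true;
}

/* Swap-in, migration or THP collapse: copy tags now, or save them. */
static void copy_tags_to_page(struct sim_page *p, const unsigned char *tags)
{
	if (p->tag_storage_reserved) {
		memcpy(p->tags, tags, TAGS_PER_PAGE);
		return;
	}
	memcpy(p->saved_tags, tags, TAGS_PER_PAGE);
	p->has_saved_tags = true;
}

/* Fault taken on access: reserve tag storage, then restore saved tags. */
static bool handle_access_fault(struct sim_page *p, bool can_reserve)
{
	if (!p->fault_on_access)
		return true;
	if (!try_reserve_tag_storage(p, can_reserve))
		return false;	/* assumption: the caller handles the failure */
	if (p->has_saved_tags) {
		memcpy(p->tags, p->saved_tags, TAGS_PER_PAGE);
		p->has_saved_tags = false;
	}
	p->fault_on_access = false;
	return true;
}
```

In this model a page allocated while reservation fails stays marked
fault-on-access, any tags copied to it in the meantime are parked in the side
buffer, and the first successful fault both reserves the storage and restores
the tags.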
Testing
=======
To enable MTE dynamic tag storage:
- CONFIG_ARM64_MTE_TAG_STORAGE=y
- system_supports_mte() returns true
- kasan_hw_tags_enabled() returns false
- correct DTB node. For an example that works with FVP, have a look at
patch #35 ("HACK! Add fake tag storage to fvp-base-revc.dts")
Check dmesg for the message "MTE tag storage region management enabled".
Alexandru Elisei (35):
mm: page_alloc: Add gfp_flags parameter to arch_alloc_page()
mm: page_alloc: Add an arch hook early in free_pages_prepare()
mm: page_alloc: Add an arch hook to filter MIGRATE_CMA allocations
mm: page_alloc: Partially revert "mm: page_alloc: remove stale CMA
guard code"
mm: cma: Don't append newline when generating CMA area name
mm: cma: Make CMA_ALLOC_SUCCESS/FAIL count the number of pages
mm: cma: Add CMA_RELEASE_{SUCCESS,FAIL} events
mm: cma: Introduce cma_alloc_range()
mm: cma: Introduce cma_remove_mem()
mm: cma: Fast track allocating memory when the pages are free
mm: Allow an arch to hook into folio allocation when VMA is known
mm: Call arch_swap_prepare_to_restore() before arch_swap_restore()
mm: memory: Introduce fault-on-access mechanism for pages
of: fdt: Return the region size in of_flat_dt_translate_address()
of: fdt: Add of_flat_read_u32()
KVM: arm64: Don't deny VM_PFNMAP VMAs when kvm_has_mte()
arm64: mte: Rework naming for tag manipulation functions
arm64: mte: Rename __GFP_ZEROTAGS to __GFP_TAGGED
arm64: mte: Discover tag storage memory
arm64: mte: Add tag storage memory to CMA
arm64: mte: Disable dynamic tag storage management if HW KASAN is
enabled
arm64: mte: Enable tag storage if CMA areas have been activated
arm64: mte: Try to reserve tag storage in arch_alloc_page()
arm64: mte: Perform CMOs for tag blocks
arm64: mte: Reserve tag block for the zero page
arm64: mte: Use fault-on-access to reserve missing tag storage
arm64: mte: Handle tag storage pages mapped in an MTE VMA
arm64: mte: swap: Handle tag restoring when missing tag storage
arm64: mte: copypage: Handle tag restoring when missing tag storage
arm64: mte: ptrace: Handle pages with missing tag storage
khugepaged: arm64: Don't collapse MTE enabled VMAs
KVM: arm64: mte: Reserve tag storage for virtual machines with MTE
KVM: arm64: mte: Introduce VM_MTE_KVM VMA flag
arm64: mte: Enable dynamic tag storage management
HACK! Add fake tag storage to fvp-base-revc.dts
.../reserved-memory/arm,mte-tag-storage.yaml | 78 +++
arch/arm64/Kconfig | 14 +
arch/arm64/boot/dts/arm/fvp-base-revc.dts | 42 +-
arch/arm64/include/asm/assembler.h | 10 +
arch/arm64/include/asm/mte-def.h | 16 +-
arch/arm64/include/asm/mte.h | 43 +-
arch/arm64/include/asm/mte_tag_storage.h | 83 +++
arch/arm64/include/asm/page.h | 10 +-
arch/arm64/include/asm/pgtable-prot.h | 2 +
arch/arm64/include/asm/pgtable.h | 93 ++-
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/elfcore.c | 14 +-
arch/arm64/kernel/hibernate.c | 46 +-
arch/arm64/kernel/mte.c | 37 +-
arch/arm64/kernel/mte_tag_storage.c | 643 ++++++++++++++++++
arch/arm64/kvm/mmu.c | 128 +++-
arch/arm64/lib/mte.S | 34 +-
arch/arm64/mm/copypage.c | 56 ++
arch/arm64/mm/fault.c | 133 +++-
arch/arm64/mm/init.c | 3 +
arch/arm64/mm/mteswap.c | 160 ++++-
arch/s390/include/asm/page.h | 2 +-
arch/s390/mm/page-states.c | 2 +-
arch/sh/kernel/cpu/sh2/probe.c | 2 +-
drivers/of/fdt.c | 21 +
drivers/of/fdt_address.c | 12 +-
drivers/tty/serial/earlycon.c | 2 +-
fs/proc/page.c | 1 +
include/linux/cma.h | 3 +
include/linux/gfp.h | 2 +-
include/linux/gfp_types.h | 6 +-
include/linux/huge_mm.h | 4 +-
include/linux/kernel-page-flags.h | 1 +
include/linux/khugepaged.h | 5 +
include/linux/memcontrol.h | 2 +
include/linux/migrate.h | 8 +-
include/linux/migrate_mode.h | 1 +
include/linux/mm.h | 2 +
include/linux/of_fdt.h | 4 +-
include/linux/page-flags.h | 16 +-
include/linux/pgtable.h | 72 +-
include/linux/vm_event_item.h | 2 +
include/trace/events/cma.h | 59 ++
include/trace/events/mmflags.h | 5 +-
mm/Kconfig | 8 +
mm/cma.c | 166 ++++-
mm/huge_memory.c | 37 +-
mm/internal.h | 6 -
mm/khugepaged.c | 4 +
mm/memory-failure.c | 8 +-
mm/memory.c | 55 +-
mm/mempolicy.c | 1 +
mm/page_alloc.c | 46 +-
mm/shmem.c | 14 +-
mm/swapfile.c | 5 +
mm/vmstat.c | 2 +
56 files changed, 2016 insertions(+), 216 deletions(-)
create mode 100644 Documentation/devicetree/bindings/reserved-memory/arm,mte-tag-storage.yaml
create mode 100644 arch/arm64/include/asm/mte_tag_storage.h
create mode 100644 arch/arm64/kernel/mte_tag_storage.c
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
--
2.43.0