Re: [PATCH v2 22/22] mm/secretmem: Use __GFP_UNMAPPED when available

From: Brendan Jackman

Date: Tue Mar 31 2026 - 10:42:17 EST


I skipped posting most of the intervening AI reviews because there's
just so much stuff and most of it is pretty boring, but this one is
interesting.

https://sashiko.dev/#/patchset/20260320-page_alloc-unmapped-v2-0-28bf1bd54f41%40google.com

On Fri Mar 20, 2026 at 6:23 PM UTC, Brendan Jackman wrote:
> This is the simplest possible way to adopt __GFP_UNMAPPED. Use it to
> allocate pages when it's available, meaning the
> set_direct_map_invalid_noflush() call is no longer needed.
>
> Signed-off-by: Brendan Jackman <jackmanb@xxxxxxxxxx>
> ---
> mm/secretmem.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 74 insertions(+), 13 deletions(-)
>
> diff --git a/mm/secretmem.c b/mm/secretmem.c
> index 5f57ac4720d32..9fef91237358a 100644
> --- a/mm/secretmem.c
> +++ b/mm/secretmem.c
> @@ -6,6 +6,7 @@
> */
>
> #include <linux/mm.h>
> +#include <linux/mermap.h>
> #include <linux/fs.h>
> #include <linux/swap.h>
> #include <linux/mount.h>
> @@ -47,13 +48,78 @@ bool secretmem_active(void)
> return !!atomic_read(&secretmem_users);
> }
>
> +/*
> + * If it's supported, allocate using __GFP_UNMAPPED. This lets the page
> + * allocator amortize TLB flushes and avoids direct map fragmentation.
> + */
> +#ifdef CONFIG_PAGE_ALLOC_UNMAPPED
> +static inline struct folio *secretmem_folio_alloc(gfp_t gfp, unsigned int order)
> +{
> + int err;
> +
> + /* Required for __GFP_UNMAPPED|__GFP_ZERO. */
> + err = mermap_mm_prepare(current->mm);
> + if (err)
> + return ERR_PTR(err);

Sashiko:
> In remote access paths such as process_vm_readv or io_uring worker threads,
> current->mm might be NULL or point to a different address space than the
> faulted VMA. Should this use vmf->vma->vm_mm instead?

This can't happen, right? I think the assumption is good; doing a
secretmem fault in a kthread or any process that doesn't have the file
mmap()'d would be a bug? But this definitely feels like something I
could be wrong about.

AI slop to empirically check that the two examples mentioned by the AI
fail early (slop for slop, it's slop all the way down...):

- process_vm_readv(): https://paste.debian.net/hidden/5625ef2e
- io_uring: https://paste.debian.net/hidden/e7763ad2
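
For the archives (pastebin links rot), the process_vm_readv() check is
roughly the following. This is a reconstruction, not the pasted program,
and it needs a kernel with CONFIG_SECRETMEM; the syscall number fallback
assumes x86_64:

```c
/*
 * Reconstruction of the process_vm_readv() check. The point: a "remote"
 * reader never reaches secretmem_fault() at all, because GUP refuses
 * secretmem VMAs, so current->mm == vmf->vma->vm_mm should hold in the
 * fault handler.
 */
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/uio.h>

#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447	/* x86_64 */
#endif

int main(void)
{
	int fd = syscall(SYS_memfd_secret, 0);
	if (fd < 0) {
		perror("memfd_secret (kernel too old or secretmem disabled?)");
		return 0;
	}
	if (ftruncate(fd, 4096))
		return 1;
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;
	p[0] = 'x';	/* ordinary fault: current->mm == vma->vm_mm */

	char buf[16];
	struct iovec local = { .iov_base = buf, .iov_len = sizeof(buf) };
	struct iovec remote = { .iov_base = p, .iov_len = sizeof(buf) };
	/* Even a self-read goes through the GUP path; expect EFAULT here. */
	ssize_t n = process_vm_readv(getpid(), &local, 1, &remote, 1, 0);
	printf("process_vm_readv: %zd (%s)\n",
	       n, n < 0 ? strerror(errno) : "unexpectedly succeeded!");
	return 0;
}
```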

[...]

> +static inline void secretmem_folio_restore(struct folio *folio)
> +{
> + set_direct_map_default_noflush(folio_page(folio, 0));
> +}

I defined a lovely helper here but neglected to actually call it.
Sashiko says "this isn't a bug", but that's wrong: calling
set_direct_map_default_noflush() before freeing a __GFP_UNMAPPED page is
not OK.

(Should it be? I think no. We _could_ define "default" so that it checks
the pageblock flags and does the right thing for you. But then we'd be
baking in the assumption that the page allocator can efficiently look
that up. This would be rather tricky if we decided we need to mix mapped
and unmapped pages in the same block).
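
To spell out the rejected alternative, a "smart" default would look
something like this (sketch only; get_pageblock_unmapped() and the
double-underscore helper are made-up names):

```c
/*
 * Sketch of the rejected alternative: make "default" consult pageblock
 * state so callers can't get it wrong. Rejected because it bakes in the
 * assumption that this lookup is cheap, which falls apart if mapped and
 * unmapped pages ever share a block. Both helpers here are made up.
 */
int set_direct_map_default_noflush(struct page *page)
{
	/* The page allocator owns the mapping state for these pages. */
	if (get_pageblock_unmapped(page))
		return 0;
	return __set_direct_map_default_noflush(page);
}
```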

And this was hiding the more important bug which was that I forgot to do
the mermap dance for zeroing the page.
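
By "the mermap dance" I mean something shaped like the following
(illustrative sketch only; every mermap_* call other than
mermap_mm_prepare() is a stand-in name, not the series' actual
interface):

```c
/*
 * Illustrative sketch: zeroing a page that has no direct-map entry has
 * to go through a temporary ("mermap") mapping. mermap_map() and
 * mermap_unmap() are stand-in names.
 */
static int secretmem_zero_folio(struct folio *folio)
{
	void *addr;
	int err;

	err = mermap_mm_prepare(current->mm);
	if (err)
		return err;

	addr = mermap_map(current->mm, folio);		/* stand-in */
	if (IS_ERR(addr))
		return PTR_ERR(addr);

	memset(addr, 0, folio_size(folio));
	mermap_unmap(current->mm, addr);		/* stand-in */
	/*
	 * The temporary mapping may now be cached in this CPU's TLB (and
	 * potentially in other CPUs' TLBs); it has to be flushed before
	 * the folio changes hands.
	 */
	return 0;
}
```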

> +static inline void secretmem_folio_flush(struct folio *folio)
> +{
> + unsigned long addr = (unsigned long)folio_address(folio);
> +
> + flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> +}
> +#endif
> +
> static vm_fault_t secretmem_fault(struct vm_fault *vmf)
> {
> struct address_space *mapping = vmf->vma->vm_file->f_mapping;
> struct inode *inode = file_inode(vmf->vma->vm_file);
> pgoff_t offset = vmf->pgoff;
> gfp_t gfp = vmf->gfp_mask;
> - unsigned long addr;
> struct folio *folio;
> vm_fault_t ret;
> int err;
> @@ -66,16 +132,9 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf)
> retry:
> folio = filemap_lock_folio(mapping, offset);
> if (IS_ERR(folio)) {
> - folio = folio_alloc(gfp | __GFP_ZERO, 0);
> - if (!folio) {
> - ret = VM_FAULT_OOM;
> - goto out;
> - }
> -
> - err = set_direct_map_invalid_noflush(folio_page(folio, 0));
> - if (err) {
> - folio_put(folio);
> - ret = vmf_error(err);
> + folio = secretmem_folio_alloc(gfp | __GFP_ZERO, 0);
> + if (IS_ERR_OR_NULL(folio)) {
> + ret = folio ? vmf_error(PTR_ERR(folio)) : VM_FAULT_OOM;
> goto out;
> }
>

> err = filemap_add_folio(mapping, folio, offset, gfp);
> if (unlikely(err)) {
> /*
> * If a split of large page was required, it
> * already happened when we marked the page invalid
> * which guarantees that this call won't fail
> */
> set_direct_map_default_noflush(folio_page(folio, 0));
> folio_put(folio);
> if (err == -EEXIST)
> goto retry;

This failure path leaks mermap TLB entries on other CPUs. So if an
attacker can trigger this path, cause another CPU to populate its
TLB for the mermap region, and then cause a victim to allocate the page
they just freed, they can use those entries for a side-channel attack.

I did call out in the cover letter that there's some jank with the "if you use
__GFP_UNMAPPED|__GFP_ZERO then you need to think about the TLB" thing.
But this shows it's worse than I thought. I need to think about how to
mitigate this.
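
For concreteness, any mitigation would have to be shaped something like
this on the failure path (sketch only; mermap_flush_folio_all() is a
made-up name, and whether an IPI-broadcast flush is affordable here is
exactly the open question):

```c
	err = filemap_add_folio(mapping, folio, offset, gfp);
	if (unlikely(err)) {
		/*
		 * Sketch: before the folio can be recycled to anyone else,
		 * TLB entries for its temporary mermap mapping must be gone
		 * on *all* CPUs, not just this one, or a later owner of the
		 * page inherits the side channel described above.
		 * mermap_flush_folio_all() is a made-up name for an
		 * IPI-broadcast flush of that range.
		 */
		mermap_flush_folio_all(folio);
		folio_put(folio);
		if (err == -EEXIST)
			goto retry;
	}
```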