RE: [RFC PATCH] mm/vmalloc: add vmalloc_decrypted() and vzalloc_decrypted()

From: Michael Kelley

Date: Fri Jun 12 2026 - 15:06:06 EST


From: Jason Gunthorpe <jgg@xxxxxxxx> Sent: Friday, June 12, 2026 11:18 AM
>
> On Fri, Jun 12, 2026 at 06:49:28PM +0100, Catalin Marinas wrote:
> > On Thu, Jun 11, 2026 at 08:49:54AM -0300, Jason Gunthorpe wrote:
> > > On Mon, Jun 08, 2026 at 04:37:02PM +0100, Catalin Marinas wrote:
> > > > > +/**
> > > > > + * vzalloc_decrypted - allocate zeroed virtually contiguous decrypted memory
> > > > > + * @size: allocation size
> > > > > + *
> > > > > + * Like vmalloc_decrypted(), but the memory is set to zero.
> > > > > + *
> > > > > + * Return: pointer to the allocated memory or %NULL on error
> > > > > + */
> > > > > +void *vzalloc_decrypted_noprof(unsigned long size)
> > > > > +{
> > > > > + void *addr;
> > > > > +
> > > > > + addr = __vmalloc_node_range_noprof(size, 1, VMALLOC_START, VMALLOC_END,
> > > > > + GFP_KERNEL,
> > > > > + pgprot_decrypted(PAGE_KERNEL),
> > > > > + VM_DECRYPTED, NUMA_NO_NODE,
> > > > > + __builtin_return_address(0));
> > > > > + if (addr)
> > > > > + memset(addr, 0, size);
> > > >
> > > > Talking to Suzuki, the small window between set_memory_decrypted() and
> > > > memset() potentially exposing stale data is safe, at least for Arm CCA
> > > > as the memory would be scrubbed (there are other places in the kernel
> > > > where we do something similar). I assume that's also the case for other
> > > > architectures, although not sure what pKVM does.
> > >
> > > It seems like a poor practice though, this should probably be
> > > re-organized to use __GFP_ZERO so things are ordered sensibly.
> >
> > __GFP_ZERO doesn't work if the intermediate set_memory_decrypted()
> > mangles the data (e.g. changes encryption keys) and it no longer reads
> > as zeros.
>
> I thought arches are either preserving the memory content or zeroing
> it, you are saying some arch leaves it as garbage? I'd argue that's an
> arch bug and they should clear it in their path.

AMD SEV-SNP leaves the memory contents as garbage after an encryption
or decryption state change. On the flip side, my understanding has been
that TDX zeroes the memory (or at least has an option to do so) after
such a state change, though a couple of AI chats say TDX also leaves
garbage. To be sure, I'd have to run an experiment to check in a TDX
guest on Hyper-V.

>
> Otherwise this sharp edge is not documented and we have many other
> places getting it wrong, e.g. system_heap_allocate() doesn't re-zero the
> memory after decrypting it.

In the Hyper-V code that uses set_memory_decrypted()/encrypted(),
there's always an explicit call to set the memory to zero afterwards.
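
To make the ordering concern concrete, here is a minimal userspace sketch,
not kernel code. mock_set_memory_decrypted() is a hypothetical stand-in that
assumes the state change scrambles the contents (modeled as an XOR with a
fixed byte). The other helper names are made up for illustration as well.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for set_memory_decrypted(). It models an
 * architecture where the state change leaves the contents as garbage
 * by XOR-ing every byte with a fixed pattern. Not a kernel API.
 */
static void mock_set_memory_decrypted(uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= 0xA5;
}

/* Return 1 if every byte in buf is zero, 0 otherwise. */
static int all_zero(const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i])
			return 0;
	return 1;
}

/* Zero first, convert second: models relying on __GFP_ZERO alone. */
static int zero_then_convert(uint8_t *buf, size_t len)
{
	memset(buf, 0, len);
	mock_set_memory_decrypted(buf, len);
	return all_zero(buf, len);
}

/* Convert first, zero second: the explicit re-zero after the change. */
static int convert_then_zero(uint8_t *buf, size_t len)
{
	mock_set_memory_decrypted(buf, len);
	memset(buf, 0, len);
	return all_zero(buf, len);
}
```

Under that assumption, zero_then_convert() reports a non-zero buffer, while
convert_then_zero() always reports zeros regardless of what the state change
did to the contents.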

Michael

>
> > > But what is the purpose of this? I guess some hyperv thing - but
> > > shouldn't we have a more structured way to "DMA map" things for the
> > > hypervisor instead of stuff like this? Why can't you use
> > > dma_alloc_coherent() which actually gives you an address that is
> > > sensible to pass to the hypervisor?
> >
> > IIRC netvsc_init_buf() uses vzalloc() to allocate some memory and that
> > buffer ends up in set_memory_decrypted() via vmbus_establish_gpadl().
> > arm64 does not support changing the decrypted/shared attribute of
> > vmalloc mappings and I don't think we should add it. Better to just
> > allocate it properly upfront.
>
> Sure
>
> > We might be able to use the DMA API but we won't get something like
> > vmalloc() - physically non-contiguous.
>
> The entry point is dma_alloc_noncontiguous() and you get a scatterlist
> back.
>
> > I think dma_alloc_noncontiguous() just falls back to
> > dma_direct_alloc_pages() in the absence of an iommu.
>
> In all cases you get a scatterlist with a CPU list and a DMA
> list. iommu gives a smaller DMA list.
>
> If you want a vmap then you can feed that CPU page list from the sgl
> into vmap().
>
> A dma_alloc_noncontiguous_vmap() helper would not be hard to make, and
> IMHO, would make a lot more sense for hyperv to treat the memory access
> from the hypervisor as "DMA" instead of trying to re-invent the DMA
> API.. :\
>
> HCH was already saying we should not be allowing drivers to use
> set_memory_decrypted() at all, and hyperv is the biggest non-core user
> right now...
>
> Jason