Re: [PATCH RFC] virtio-mem: support Confidential Computing (CoCo) environments
From: David Hildenbrand (Arm)
Date: Wed Apr 01 2026 - 15:35:43 EST
On 4/1/26 13:12, Marc-André Lureau wrote:
> In Confidential Computing (CoCo) environments such as Intel TDX or AMD
> SEV-SNP, hotplugged memory must be explicitly "accepted" (transitioned to
> a private/encrypted state) before it can be safely used by the guest.
> Conversely, before returning memory to the hypervisor during an unplug
> operation, it must be converted back to a shared/decrypted state.
>
> Attempting to handle memory acceptance automatically using generic
> architecture-level memory hotplug notifiers (e.g., MEM_GOING_ONLINE)
> is not viable for devices like virtio-mem:
>
> 1. Granularity Mismatch: virtio-mem can dynamically hot(un)plug memory
> at a subblock granularity (e.g., 2MB chunks within a 128MB memory
> block). Generic memory notifiers operate on the entire memory block.
> 2. Lifecycle Control: Memory must be explicitly accepted *before* it is
> handed to the core memory management subsystem (the buddy allocator),
> and it must be decrypted *before* being handed back to the device.
> 3. State Tracking (Offline -> Re-online): If memory is offlined and
> re-onlined without proper state transitions, TDX will panic on
> attempting to accept an already-accepted page (TDX_EPT_ENTRY_STATE_INCORRECT).
>
> To address this, this patch implements explicit CoCo memory conversions
> directly within the virtio-mem driver using set_memory_encrypted() and
> set_memory_decrypted():
>
> - During hotplug, explicitly accepts only the physically plugged subblocks
> right before fake-onlining them into the buddy allocator.
> - During unplug, memory is explicitly transitioned to the shared state
> before being handed back to the host. If the unplug operation fails,
> the driver attempts to re-accept (encrypt) the memory. If this
> re-acceptance fails, the memory is intentionally leaked to prevent
> confidentiality breaches or fatal hypervisor faults.
>
> This was discovered while testing virtio-mem resize with TDX guests.
> The associated QEMU virtio-mem + TDX patch series is under review at:
> https://patchew.org/QEMU/20260226140001.3622334-1-marcandre.lureau@xxxxxxxxxx/
>
> Note that QEMU punches the guest_memfd on KVM_HC_MAP_GPA_RANGE, when the
> guest memory is decrypted. There is thus no need to discard the guest_memfd
> in the virtio-mem device.
>
> This patch is a follow-up and supersedes "[PATCH 0/2] x86/tdx: Fix
> memory hotplug in TDX guests".
Hi!
Two things:
1) How do we know which state hotplugged memory is when we get it, and
which state hotunplugged memory must have before returning it?
I assume that other hypervisors (pKVM?) might have different demands in
the future. Should that be specified in the spec?
2) Should we indicate through a device feature flag that the driver
actually supports CoCo VMs?
>
> Assisted-by: Claude:claude-opus-4-6
> Reported-by: Chenyi Qiang <chenyi.qiang@xxxxxxxxx>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@xxxxxxxxxx>
> ---
> drivers/virtio/virtio_mem.c | 183 +++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 173 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 48051e9e98abf..518bc0aae5304 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -23,6 +23,8 @@
> #include <linux/log2.h>
> #include <linux/vmalloc.h>
> #include <linux/suspend.h>
> +#include <linux/set_memory.h>
> +#include <linux/cc_platform.h>
>
> #include <acpi/acpi_numa.h>
>
> @@ -864,19 +866,90 @@ static bool virtio_mem_contains_range(struct virtio_mem *vm, uint64_t start,
> return start >= vm->addr && start + size <= vm->addr + vm->region_size;
> }
>
> +/*
> + * In CoCo (TDX, SEV-SNP) environments, hotplugged memory must be explicitly
> + * accepted before use (private/encrypted), and converted to shared/decrypted
> + * before returning to the hypervisor on unplug.
> + */
> +static int virtio_mem_coco_set_encrypted(uint64_t addr, uint64_t size)
> +{
Where is the "acceptance" happening? Do we have to use the term
"acceptance" in this file at all when all we're doing is calling
set_memory_encrypted()?
> + return set_memory_encrypted((unsigned long)__va(addr), PFN_DOWN(size));
> +}
> +
> +static int virtio_mem_coco_set_decrypted(uint64_t addr, uint64_t size)
> +{
> + return set_memory_decrypted((unsigned long)__va(addr), PFN_DOWN(size));
> +}
> +
> +/*
> + * Convert all plugged subblocks of a memory block back to shared/decrypted.
> + * Used to undo set_encrypted on failure or cancel.
> + */
> +static void virtio_mem_sbm_coco_set_decrypted(struct virtio_mem *vm,
> + unsigned long mb_id)
> +{
> + int sb_id, count;
> +
> + if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
> + return;
> +
> + for (sb_id = 0; sb_id < vm->sbm.sbs_per_mb; sb_id += count) {
> + count = 1;
> + if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> + continue;
> + while (sb_id + count < vm->sbm.sbs_per_mb &&
> + virtio_mem_sbm_test_sb_plugged(vm, mb_id,
> + sb_id + count, 1))
> + count++;
> + virtio_mem_coco_set_decrypted(
> + virtio_mem_mb_id_to_phys(mb_id) +
> + sb_id * vm->sbm.sb_size,
> + (uint64_t)count * vm->sbm.sb_size);
> + }
> +}
> +
> static int virtio_mem_sbm_notify_going_online(struct virtio_mem *vm,
> unsigned long mb_id)
> {
> + int sb_id, count, rc;
> +
> switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
> case VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL:
> case VIRTIO_MEM_SBM_MB_OFFLINE:
> - return NOTIFY_OK;
> - default:
> break;
> + default:
> + dev_warn_ratelimited(&vm->vdev->dev,
> + "memory block onlining denied\n");
> + return NOTIFY_BAD;
> + }
> +
> + if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
> + return NOTIFY_OK;
> +
> + /*
> + * In CoCo environments, explicitly accept plugged subblocks before
> + * they get onlined and handed to the buddy. Only accept at subblock
> + * granularity -- unplugged subblocks have no backing in the secure
> + * page tables (SEPT) and accepting them would fail.
> + */
> + for (sb_id = 0; sb_id < vm->sbm.sbs_per_mb; sb_id += count) {
> + count = 1;
> + if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> + continue;
> + /* Coalesce consecutive plugged subblocks */
> + while (sb_id + count < vm->sbm.sbs_per_mb &&
> + virtio_mem_sbm_test_sb_plugged(vm, mb_id,
> + sb_id + count, 1))
> + count++;
> + rc = virtio_mem_coco_set_encrypted(
> + virtio_mem_mb_id_to_phys(mb_id) +
> + sb_id * vm->sbm.sb_size,
> + (uint64_t)count * vm->sbm.sb_size);
> + if (rc)
> + return NOTIFY_BAD;
> }
This looks extremely similar to the reverse of
virtio_mem_sbm_coco_set_decrypted() and should be similarly factored out
in a helper.
> - dev_warn_ratelimited(&vm->vdev->dev,
> - "memory block onlining denied\n");
> - return NOTIFY_BAD;
> +
> + return NOTIFY_OK;
> }
>
> static void virtio_mem_sbm_notify_offline(struct virtio_mem *vm,
> @@ -1055,8 +1128,16 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> break;
> }
> vm->hotplug_active = true;
> - if (vm->in_sbm)
> + if (vm->in_sbm) {
> rc = virtio_mem_sbm_notify_going_online(vm, id);
> + } else {
> + /*
> + * For BBM, accept the whole memory block range. Unlike
> + * SBM, the entire big block is always plugged.
> + */
> + if (virtio_mem_coco_set_encrypted(start, size))
> + rc = NOTIFY_BAD;
> + }
> break;
> case MEM_OFFLINE:
> if (vm->in_sbm)
> @@ -1106,6 +1187,14 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> case MEM_CANCEL_ONLINE:
> if (!vm->hotplug_active)
> break;
> + /*
> + * Undo CoCo acceptance done in MEM_GOING_ONLINE. Pages were
> + * made private but never onlined -- convert back to shared.
> + */
I think that comment can be mostly dropped. The code is pretty clear.
> + if (vm->in_sbm)
> + virtio_mem_sbm_coco_set_decrypted(vm, id);
> + else
> + virtio_mem_coco_set_decrypted(start, size);
> vm->hotplug_active = false;
> mutex_unlock(&vm->hotplug_mutex);
> break;
> @@ -1583,12 +1672,17 @@ static int virtio_mem_bbm_plug_bb(struct virtio_mem *vm, unsigned long bb_id)
> * memory block. Will fail if any subblock cannot get unplugged (instead of
> * skipping it).
> *
> + * If @coco_shared is true, convert each subblock range to shared/decrypted
> + * before unplugging. This is required for offline blocks that have a direct
> + * map but must not be used for blocks in PLUGGED state (no direct map).
> + *
> * Will not modify the state of the memory block.
> *
> * Note: can fail after some subblocks were unplugged.
> */
> static int virtio_mem_sbm_unplug_any_sb_raw(struct virtio_mem *vm,
> - unsigned long mb_id, uint64_t *nb_sb)
> + unsigned long mb_id, uint64_t *nb_sb,
> + bool coco_shared)
> {
> int sb_id, count;
> int rc;
> @@ -1609,6 +1703,15 @@ static int virtio_mem_sbm_unplug_any_sb_raw(struct virtio_mem *vm,
> sb_id--;
> }
>
> + if (coco_shared) {
> + rc = virtio_mem_coco_set_decrypted(
> + virtio_mem_mb_id_to_phys(mb_id) +
> + sb_id * vm->sbm.sb_size,
> + (uint64_t)count * vm->sbm.sb_size);
There has to be some way to make this readable.
Like local variables or different helpers.
Same applies to similar code below.
> + if (rc)
> + return rc;
> + }
> +
> rc = virtio_mem_sbm_unplug_sb(vm, mb_id, sb_id, count);
> if (rc)
> return rc;
> @@ -1630,7 +1733,11 @@ static int virtio_mem_sbm_unplug_mb(struct virtio_mem *vm, unsigned long mb_id)
> {
> uint64_t nb_sb = vm->sbm.sbs_per_mb;
>
> - return virtio_mem_sbm_unplug_any_sb_raw(vm, mb_id, &nb_sb);
> + /*
> + * Called for PLUGGED blocks (add_memory failed) -- no direct map
> + * exists, so CoCo conversion is not possible.
> + */
Isn't it rather "nothing to undo"?
> + return virtio_mem_sbm_unplug_any_sb_raw(vm, mb_id, &nb_sb, false);
> }
>
> /*
> @@ -1744,6 +1851,17 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
> if (old_state == VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL)
> continue;
>
> + /* Accept memory for CoCo before fake-onlining into buddy */
> + rc = virtio_mem_coco_set_encrypted(
> + virtio_mem_mb_id_to_phys(mb_id) +
> + sb_id * vm->sbm.sb_size,
> + (uint64_t)count * vm->sbm.sb_size);
> + if (rc) {
> + virtio_mem_sbm_unplug_sb(vm, mb_id, sb_id, count);
> + *nb_sb += count;
> + return rc;
> + }
> +
> /* fake-online the pages if the memory block is online */
> pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
> sb_id * vm->sbm.sb_size);
> @@ -1941,7 +2059,8 @@ static int virtio_mem_sbm_unplug_any_sb_offline(struct virtio_mem *vm,
> {
> int rc;
>
> - rc = virtio_mem_sbm_unplug_any_sb_raw(vm, mb_id, nb_sb);
> + /* Offline blocks have a direct map -- convert for CoCo before unplug */
> + rc = virtio_mem_sbm_unplug_any_sb_raw(vm, mb_id, nb_sb, true);
>
> /* some subblocks might have been unplugged even on failure */
> if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb))
> @@ -1989,10 +2108,32 @@ static int virtio_mem_sbm_unplug_sb_online(struct virtio_mem *vm,
> if (rc)
> return rc;
>
> + /* Convert private→shared for CoCo before handing back to device */
> + rc = virtio_mem_coco_set_decrypted(
> + virtio_mem_mb_id_to_phys(mb_id) +
> + sb_id * vm->sbm.sb_size,
> + (uint64_t)count * vm->sbm.sb_size);
God this is ugly.
--
Cheers,
David