Re: [Discuss] First steps for ASI (ASI is fast again)
From: David Hildenbrand
Date: Thu Oct 02 2025 - 03:45:53 EST
I won't re-hash the details of the problem here (see [1]) but in short: file
pages aren't mapped into the physmap as seen from ASI's restricted address space.
This causes a major overhead when e.g. read()ing files. The solution we've
always envisaged (and which I very hastily tried to describe at LSF/MM/BPF this
year) was to simply stop read() etc from touching the physmap.
This is achieved in this prototype by a mechanism that I've called the "ephmap".
The ephmap is a special region of the kernel address space that is local to the
mm (much like the "proclocal" idea from 2019 [2]). Users of the ephmap API can
allocate a subregion of this, and provide pages that get mapped into their
subregion. These subregions are CPU-local. This means that it's cheap to tear
these mappings down, so they can be removed immediately after use (eph =
"ephemeral"), eliminating the need for complex/costly tracking data structures.
(You might notice the ephmap is extremely similar to kmap_local_page() - see the
commit that introduces it ("x86: mm: Introduce the ephmap") for discussion).
The ephmap can then be used for accessing file pages. It's also a generic
mechanism for accessing sensitive data, for example it could be used for
zeroing sensitive pages, or if necessary for copy-on-write of user pages.
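As a rough illustration of the pattern being described, here is a pseudocode sketch of what an ephmap user on the read() path might look like. ephmap_map_page() and ephmap_unmap() are invented placeholder names (the actual prototype API may differ), and error handling is elided; the shape is deliberately close to existing kmap_local_page()/kunmap_local() users, which the ephmap is compared to above:

```c
/*
 * Pseudocode sketch only: ephmap_map_page()/ephmap_unmap() are made-up
 * placeholder names.  The usage pattern mirrors kmap_local_page()/
 * kunmap_local(): map, use, tear down immediately.
 */
static ssize_t read_file_page(struct page *page, char __user *buf,
			      size_t len, size_t offset)
{
	/* CPU-local, mm-local mapping; the physmap is never touched. */
	void *kaddr = ephmap_map_page(page);
	unsigned long left = copy_to_user(buf, kaddr + offset, len);

	/* Immediate teardown keeps the mapping ephemeral and cheap,
	 * with no long-lived tracking structures. */
	ephmap_unmap(kaddr);
	return left ? -EFAULT : (ssize_t)len;
}
```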
At some point we discussed how to make secretmem pages movable, so we
end up with fewer unmovable pages in the system.
Secretmem pages have their direct map entries removed once allocated, and
restored once freed (truncated from the page cache).
In order to migrate them we would have to map them temporarily, and we
obviously don't want to do that through the direct map.
Maybe the ephmap could be used for that use case, too.
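To make the idea concrete, a pseudocode sketch of how a secretmem migration copy might look. ephmap_map_page()/ephmap_unmap() are placeholder names for a hypothetical API; the freshly allocated destination page is assumed to still be in the direct map at copy time, with its entry removed only after migration completes:

```c
/*
 * Pseudocode sketch: migrating a secretmem page whose direct map entry
 * has been removed.  The source is mapped through the ephemeral window
 * only for the duration of the copy.
 */
static void secretmem_migrate_copy(struct page *dst, struct page *src)
{
	void *from = ephmap_map_page(src);	/* hypothetical API */

	/* dst is newly allocated and still present in the direct map. */
	memcpy(page_address(dst), from, PAGE_SIZE);
	ephmap_unmap(from);			/* torn down right away */
}
```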
Another, similar use case would be guest_memfd with an approach like the
one secretmem took: removing the direct map. While guest_memfd does not
support page migration yet, there are some prototypes that allow
migrating pages for non-CoCo (IOW: ordinary) VMs.
Maybe the ephmap could be used here, too.
I guess an interesting question would be: given that the ephmap is
mm-local, which mm to use when we are migrating a page out of a random
context: memory offlining, page compaction, memory-failure,
alloc_contig_pages, ...
[...]
Despite my title ("ASI is fast again" :)) these numbers are kinda
disappointing to be honest; it's not where I wanted to be by now, but it's
still an order of magnitude better than where we were for native FIO a few
months ago. I believe almost all of this
remaining slowdown is due to unnecessary ASI exits, the key areas being:
- On every context_switch(). Google's internal implementation has fixed this (we
only really need it when switching mms).
- Whenever zeroing sensitive pages from the allocator. This could potentially be
solved with the ephmap but requires a bit of care to avoid opening CPU attack
windows.
- In copy-on-write for user pages. The ephmap could also help here but the
current implementation doesn't support it (it only allows one allocation at a
time per context).
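For the page-zeroing case, the ephmap pattern might look like the pseudocode below. ephmap_map_page()/ephmap_unmap() are again placeholder names, and the "bit of care" mentioned above (avoiding CPU attack windows while the mapping exists) is deliberately not addressed here:

```c
/*
 * Pseudocode sketch: clearing a sensitive page on allocation without an
 * ASI exit.  Instead of going through the physmap (absent from the
 * restricted address space), the page is mapped via a CPU-local
 * ephemeral window and unmapped immediately after zeroing.
 */
static void clear_sensitive_page(struct page *page)
{
	void *kaddr = ephmap_map_page(page);	/* hypothetical API */

	memset(kaddr, 0, PAGE_SIZE);
	ephmap_unmap(kaddr);			/* cheap, immediate teardown */
}
```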
But only the first point would actually be relevant for the FIO
benchmark, I assume, right?
So how confident are you that this is really going to be solvable? Or to
ask from another angle: long-term, how much slowdown do you expect and
target?
--
Cheers
David / dhildenb