Re: [RFC PATCH v2 03/23] x86/tdx: Enhance tdh_phymem_page_wbinvd_hkid() to invalidate huge pages

From: Yan Zhao

Date: Wed Nov 12 2025 - 21:37:29 EST

On Wed, Nov 12, 2025 at 06:29:11PM +0800, Huang, Kai wrote:
> On Wed, 2025-11-12 at 16:43 +0800, Yan Zhao wrote:
> > On Tue, Nov 11, 2025 at 05:23:30PM +0800, Huang, Kai wrote:
> > > On Thu, 2025-08-07 at 17:42 +0800, Yan Zhao wrote:
> > > > -u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page)
> > > > +u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct folio *folio,
> > > > + unsigned long start_idx, unsigned long npages)
> > > > {
> > > > + struct page *start = folio_page(folio, start_idx);
> > > > struct tdx_module_args args = {};
> > > > + u64 err;
> > > > +
> > > > + if (start_idx + npages > folio_nr_pages(folio))
> > > > + return TDX_OPERAND_INVALID;
> > > >
> > > > - args.rcx = mk_keyed_paddr(hkid, page);
> > > > + for (unsigned long i = 0; i < npages; i++) {
> > > > + args.rcx = mk_keyed_paddr(hkid, nth_page(start, i));
> > > >
> > >
> > > Just FYI: seems there's a series to remove nth_page() completely:
> > >
> > > https://lore.kernel.org/kvm/20250901150359.867252-1-david@xxxxxxxxxx/
> > Ah, thanks!
> > Then we can get rid of the "unsigned long i".
> >
> > - for (unsigned long i = 0; i < npages; i++) {
> > - args.rcx = mk_keyed_paddr(hkid, nth_page(start, i));
> > + while (npages--) {
> > + args.rcx = mk_keyed_paddr(hkid, start++);
> >
>
> You may want to be careful about doing '++' on a 'struct page *'. I am not
Before the series that removes nth_page(), the Linux kernel defined nth_page()
like this:

#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
#define folio_page_idx(folio, p) (page_to_pfn(p) - folio_pfn(folio))
#else
#define nth_page(page,n) ((page) + (n))
#define folio_page_idx(folio, p) ((p) - &(folio)->page)
#endif

i.e., unless SPARSEMEM is enabled without SPARSEMEM_VMEMMAP, the struct pages of
a folio are contiguous.
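
To illustrate why the two nth_page() variants above can differ, here is a
rough userspace sketch (not kernel code). Everything in it is an assumption
made for illustration: the tiny PAGES_PER_SECTION, the "section" field in
struct page, the backing array, and the helper nth_page_variants_agree().
It models a per-section memmap where section 1's array does not directly
follow section 0's, as can happen with SPARSEMEM without SPARSEMEM_VMEMMAP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of SPARSEMEM without VMEMMAP. */
#define PAGES_PER_SECTION 4UL
#define NR_SECTIONS       2UL

struct page {
	unsigned long section;	/* model only: which section owns this page */
};

/* One backing array; section 1's memmap deliberately leaves a gap. */
static struct page backing[3 * PAGES_PER_SECTION];
static struct page *section_memmap[NR_SECTIONS];

static void model_init(void)
{
	section_memmap[0] = &backing[0];
	section_memmap[1] = &backing[2 * PAGES_PER_SECTION];
	for (unsigned long s = 0; s < NR_SECTIONS; s++)
		for (unsigned long i = 0; i < PAGES_PER_SECTION; i++)
			section_memmap[s][i].section = s;
}

/* Mock of pfn_to_page(): look up the section, then index into its memmap. */
static struct page *pfn_to_page(unsigned long pfn)
{
	assert(pfn < NR_SECTIONS * PAGES_PER_SECTION);
	return section_memmap[pfn / PAGES_PER_SECTION] + pfn % PAGES_PER_SECTION;
}

/* Mock of page_to_pfn(): section base plus offset within that memmap. */
static unsigned long page_to_pfn(const struct page *page)
{
	unsigned long s = page->section;

	return s * PAGES_PER_SECTION + (unsigned long)(page - section_memmap[s]);
}

/* Returns 1 if the pfn-based and pointer-based variants pick the same page. */
int nth_page_variants_agree(unsigned long pfn, unsigned long n)
{
	struct page *page, *by_pfn, *by_ptr;

	model_init();
	page = pfn_to_page(pfn);
	by_pfn = pfn_to_page(page_to_pfn(page) + n);	/* sparse variant */
	by_ptr = page + n;				/* flat variant */
	return by_pfn == by_ptr;
}
```

In this model, stepping from pfn 2 by one page stays inside section 0 and
both variants agree; stepping by two pages crosses into section 1 and only
the pfn-based variant lands on the right struct page.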

In David's series removing nth_page(), CONFIG_SPARSEMEM_VMEMMAP is auto-selected
along with CONFIG_SPARSEMEM on all architectures but sh.

David further ensures a folio's pages are contiguous even on sh with the
problematic kernel configs (i.e., SPARSEMEM without SPARSEMEM_VMEMMAP) [1]:

: Currently, only a single architectures supports ARCH_HAS_GIGANTIC_PAGE
: but not SPARSEMEM_VMEMMAP: sh.
:
: Fortunately, the biggest hugetlb size sh supports is 64 MiB
: (HUGETLB_PAGE_SIZE_64MB) and the section size is at least 64 MiB
: (SECTION_SIZE_BITS == 26), so their use case is not degraded.
:
: As folios and memory sections are naturally aligned to their order-2 size
: in memory, consequently a single folio can no longer span multiple memory
: sections on these problematic kernel configs.

So it's safe to assume a folio's pages are contiguous.

[1] https://lore.kernel.org/kvm/20250901150359.867252-12-david@xxxxxxxxxx/
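
The arithmetic behind the quoted sh argument can be checked with a small
sketch. This is only an illustration: PAGE_SHIFT of 12 (4 KiB base pages)
is an assumption, SECTION_SIZE_BITS of 26 is the minimum from the quoted
text, and aligned_folio_spans_sections() is a hypothetical helper, not a
kernel function. A 64 MiB folio is then order 14, exactly one section.

```c
#include <assert.h>

/* Assumed for illustration only: 4 KiB base pages. */
#define PAGE_SHIFT        12
#define SECTION_SIZE_BITS 26	/* minimum section size on sh, per the quote */
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)

/*
 * Returns 1 if a naturally aligned folio of the given order covers pfns
 * from more than one memory section, 0 otherwise.
 */
int aligned_folio_spans_sections(unsigned long start_pfn, unsigned int order)
{
	unsigned long nr_pages = 1UL << order;
	unsigned long last_pfn;

	/* Natural alignment: start_pfn is a multiple of the folio size. */
	assert((start_pfn & (nr_pages - 1)) == 0);

	last_pfn = start_pfn + nr_pages - 1;
	return (start_pfn >> PFN_SECTION_SHIFT) != (last_pfn >> PFN_SECTION_SHIFT);
}
```

Under these assumptions, an order-14 (64 MiB) folio never spans sections,
while an order-15 folio would, which is why the 64 MiB hugetlb limit on sh
matters.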

> expert, but I saw below discussion on the thread [*] which led to the series
> to get rid of nth_page():
> > I wish we didn't have nth_page() at all. I really don't think it's a
> > valid operation. It's been around forever, but I think it was broken
> > as introduced, exactly because I don't think you can validly even have
> > allocations that cross section boundaries.
>
> Ordinary buddy allocations cannot exceed a memory section, but hugetlb and
> dax can with gigantic folios ... :(
>
> We had some weird bugs with that, because people keep forgetting that you
> cannot just use page++ unconditionally with such folios.

I found Linus's reply to David [2]:
: On Tue, 5 Aug 2025 at 16:37, David Hildenbrand <david@xxxxxxxxxx> wrote:
: >
: > Ordinary buddy allocations cannot exceed a memory section, but hugetlb and
: > dax can with gigantic folios ... :(
:
: Just turn that code off. Nobody sane cares.
:
: It sounds like people have bent over backwards to fix the insane case
: instead of saying "that's insane, let's not support it".
:
: And yes, "that's insane" is actually fairly recent. It's not that long
: ago that we made SPARSEMEM_VMEMMAP the mandatory option on x86-64. So
: it was all sane in a historical context, but it's not sane any more.
:
: But now it *is* the mandatory option both on x86 and arm64, so I
: really think it's time to get rid of pointless pain points.
:
: (I think powerpc still makes it an option to do sparsemem without
: vmemmap, but it *is* an option there too)

The nth_page() removal series then makes sure hugetlb and dax are OK, with
changes like [3]. After that, the series iterates over all pages in a hugetlb
folio with page++, e.g., [4][5].

[2] https://lore.kernel.org/all/CAHk-=wiYLcax-5THGofwk-SAWYZ1RsP08b+rozXOm0wZRCE9UQ@xxxxxxxxxxxxxx
[3] https://lore.kernel.org/kvm/20250901150359.867252-7-david@xxxxxxxxxx
[4] https://lore.kernel.org/kvm/20250901150359.867252-14-david@xxxxxxxxxx
[5] https://lore.kernel.org/kvm/20250901150359.867252-16-david@xxxxxxxxxx

> So, why not just get the actual page for each index within the loop?
We need to invoke folio_page() to get the actual page.

In [6], the new folio_page() implementation is:

static inline struct page *folio_page(struct folio *folio, unsigned long n)
{
	return &folio->page + n;
}

So, invoking folio_page() should be equivalent to page++ in our case.

[6] https://lore.kernel.org/kvm/20250901150359.867252-13-david@xxxxxxxxxx
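
As a sanity check of that equivalence, here is a userspace sketch (not
kernel code). The struct definitions, the 512-page mock_memmap, and
page_walk_mismatches() are all hypothetical stand-ins; only the body of
folio_page() is taken from [6]. It walks a sub-range of a folio both ways
and counts the iterations where the two pointers differ.

```c
#include <assert.h>

#define MOCK_FOLIO_NR_PAGES 512UL	/* e.g. a 2 MiB folio of 4 KiB pages */

struct page { unsigned long flags; };
struct folio { struct page page; };	/* folio overlays its first page */

/* Contiguous mock memmap backing a single folio. */
static struct page mock_memmap[MOCK_FOLIO_NR_PAGES];

/* Same body as the folio_page() quoted from [6]. */
static struct page *folio_page(struct folio *folio, unsigned long n)
{
	return &folio->page + n;
}

/* Returns how many iterations see folio_page() and start++ disagree. */
unsigned long page_walk_mismatches(unsigned long start_idx, unsigned long npages)
{
	struct folio *folio = (struct folio *)&mock_memmap[0];
	unsigned long mismatches = 0;
	struct page *start;

	/* Mirrors the range check in the patch. */
	assert(start_idx + npages <= MOCK_FOLIO_NR_PAGES);

	start = folio_page(folio, start_idx);
	for (unsigned long i = 0; i < npages; i++) {
		/* Index-based lookup versus the incremented pointer. */
		if (folio_page(folio, start_idx + i) != start++)
			mismatches++;
	}
	return mismatches;
}
```

With a contiguous memmap the count is always zero, so either loop form
visits the same pages in the same order.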

> [*]:
> https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@xxxxxxxxxxxxxx/T/#m49ba78f5f630b27fa6d3d0737271f047af599c60