Re: [PATCHv7 08/16] x86/tdx: Account shared memory

From: Dave Hansen
Date: Fri Feb 23 2024 - 14:08:30 EST


On 2/12/24 02:44, Kirill A. Shutemov wrote:
> The kernel will convert all shared memory back to private during kexec.
> The direct mapping page tables will provide information on which memory
> is shared.
>
> It is extremely important to convert all shared memory. If a page is
> missed, it will cause the second kernel to crash when it accesses it.
>
> Keep track of the number of shared pages. This will allow for
> cross-checking against the shared information in the direct mapping and
> reporting if the shared bit is lost.
>
> Include a debugfs interface that allows for the check to be performed at
> any point.

When I read this, I thought you were going to do some automatic
checking. Could you make it more clear here that it's 100% up to the
user to figure out if the numbers in debugfs match and whether there's a
problem? This would also be a great place to mention that the whole
thing is racy.

> +static atomic_long_t nr_shared;
> +
> +static inline bool pte_decrypted(pte_t pte)
> +{
> + return cc_mkdec(pte_val(pte)) == pte_val(pte);
> +}

Name this pte_is_decrypted(), please.

> /* Called from __tdx_hypercall() for unrecoverable failure */
> noinstr void __noreturn __tdx_hypercall_failed(void)
> {
> @@ -821,6 +829,11 @@ static int tdx_enc_status_change_finish(unsigned long vaddr, int numpages,
> if (!enc && !tdx_enc_status_changed(vaddr, numpages, enc))
> return -EIO;
>
> + if (enc)
> + atomic_long_sub(numpages, &nr_shared);
> + else
> + atomic_long_add(numpages, &nr_shared);
> +
> return 0;
> }
>
> @@ -896,3 +909,59 @@ void __init tdx_early_init(void)
>
> pr_info("Guest detected\n");
> }
> +
> +#ifdef CONFIG_DEBUG_FS
> +static int tdx_shared_memory_show(struct seq_file *m, void *p)
> +{
> + unsigned long addr, end;
> + unsigned long found = 0;
> +
> + addr = PAGE_OFFSET;
> + end = PAGE_OFFSET + get_max_mapped();
> +
> + while (addr < end) {
> + unsigned long size;
> + unsigned int level;
> + pte_t *pte;
> +
> + pte = lookup_address(addr, &level);
> + size = page_level_size(level);
> +
> + if (pte && pte_decrypted(*pte))
> + found += size / PAGE_SIZE;
> +
> + addr += size;
> +
> + cond_resched();
> + }

This is totally racy, right? Nothing prevents the PTE from
flip-flopping all over the place.

> + seq_printf(m, "Number of shared pages in kernel page tables: %16lu\n",
> + found);
> + seq_printf(m, "Number of pages accounted as shared: %16ld\n",
> + atomic_long_read(&nr_shared));
> + return 0;
> +}

Ditto with 'nr_shared'. There's nothing to say that the page table walk
has anything to do with 'nr_shared' by the time we get down here.

That's not _fatal_ for a debug interface, but the pitfalls need to at
least be discussed. Better yet would be to make sure this and the cpa
code don't stomp on each other.