Re: [PATCH -V3 RESEND] x86, tdx, memory hotplug: Check whole hot-adding memory range for TDX
From: Dan Williams
Date: Tue Dec 03 2024 - 20:33:19 EST
[ drop Ying's Intel address ]
Huang Ying wrote:
> On systems with TDX (Trust Domain eXtensions) enabled, current kernel
> checks the TDX compatibility of the hot-added memory ranges through a
> memory hotplug notifier for each memory_block. If a memory range
> which isn't TDX compatible is hot-added, for example, some CXL memory,
> the command line as follows,
>
> $ echo 1 > /sys/devices/system/node/nodeX/memoryY/online
>
> will report something like,
>
> bash: echo: write error: Operation not permitted
>
> If pr_debug() is enabled, current kernel will show the error message
> like below in the kernel log,
>
> online_pages [mem 0xXXXXXXXXXX-0xXXXXXXXXXX] failed
>
> Both are too general to root cause the problem. This may confuse
> users. One solution is to print some error messages in the TDX memory
> hotplug notifier. However, kernel calls memory hotplug notifiers for
> each memory block, so this may lead to a large volume of messages in
> the kernel log if a large number of memory blocks are onlined with a
> script or automatically. For example, the typical memory block size
> is 128MB on x86_64, so onlining 64GB of CXL memory would log 512
> messages.
>
> Therefore, this patch checks the TDX compatibility of the whole
> hot-adding memory range through a newly added architecture specific
> function (arch_check_hotplug_memory_range()). If this patch rejects
> the memory hot-adding for TDX compatibility, it will output a kernel
> log message like below,
>
> virt/tdx: Reject hot-adding memory range: 0xXXXXXXXX-0xXXXXXXXX for TDX compatibility.
>
> The target use case is to support CXL memory on TDX enabled systems.
> If the CXL memory isn't compatible with TDX, the kernel will reject
> the whole CXL memory range. The CXL memory can still be used via
> the devdax interface.
>
> This also makes the original TDX memory hotplug notifier useless, so
> this patch deletes it.
>
> Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
> Suggested-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>

Drop my Reviewed-by as I now realize that my reading of the changelog
for commit abe8dbab8f9f "x86/virt/tdx: Use all system memory when
initializing TDX module as TDX memory", and the presence of the
"is_tdx_memory()" helper, led me astray. If the changelog had said "This
approach requires *but does not validate* all memblock memory
regions...", I might have been spared.

Until the new "convertible memory ranges" (CMR) enabling [1] is settled,
the kernel just takes on faith that anything memblock thinks is memory
is TDX compatible.

So, the first thing to fix is rejecting memory that is not TDX
compatible at init. Then teach is_tdx_memory() to actually know about
convertible ranges. Then add support for attempts to hot-add memory
that might not be TDX compatible.

Given Dave's comment about locking and the need to consult
is_tdx_memory() in more places, it seems reasonable to replace the
tdx_memlist linked list with an xarray with ranges recorded by
xa_store_range(). Then this implementation only needs rcu_read_lock()
for locking.
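
To make the shape of that check concrete, here is a rough userspace
sketch, not kernel code. All names (cmr_table, cmr_add(), cmr_covers())
are hypothetical, and a fixed sorted array stands in for the xarray. It
only illustrates the idea that a hot-add request is accepted when the
*whole* range is covered by recorded convertible ranges. In the kernel
the ranges would presumably be recorded with xa_store_range() and looked
up under rcu_read_lock() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel API. */
struct cmr_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

#define CMR_MAX 32

struct cmr_table {
	struct cmr_range r[CMR_MAX];
	size_t nr;
};

/*
 * Record one convertible range. Ranges must be added in ascending,
 * non-overlapping order; anything else is refused.
 */
bool cmr_add(struct cmr_table *t, uint64_t start, uint64_t end)
{
	assert(start < end);

	if (t->nr == CMR_MAX)
		return false;
	if (t->nr && t->r[t->nr - 1].end > start)
		return false;

	t->r[t->nr].start = start;
	t->r[t->nr].end = end;
	t->nr++;
	return true;
}

/*
 * Return true only if every byte of [start, end) is covered by the
 * recorded ranges. Adjacent ranges may jointly cover the request;
 * any hole makes the whole request fail.
 */
bool cmr_covers(const struct cmr_table *t, uint64_t start, uint64_t end)
{
	uint64_t pos = start;
	size_t i;

	for (i = 0; i < t->nr && pos < end; i++) {
		if (t->r[i].end <= pos)
			continue;		/* entirely below the cursor */
		if (t->r[i].start > pos)
			return false;		/* hole before this range */
		pos = t->r[i].end;		/* advance past this range */
	}

	return pos >= end;
}
```

With something like this, the init path would populate the table once,
and the hot-add path would make a single cmr_covers() call for the whole
range and emit a single log message on rejection, rather than one per
memory block.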