Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation
From: Jason Gunthorpe
Date: Thu Apr 03 2025 - 12:10:21 EST
On Thu, Apr 03, 2025 at 03:50:04PM +0000, Pratyush Yadav wrote:
> The patch currently has a limitation where it does not free any of the
> empty tables after a unpreserve operation. But Changyuan's patch also
> doesn't do it so at least it is not any worse off.
Why do we even have unpreserve? Just discard the entire KHO operation
in bulk.
> When working on this patch, I realized that kho_mem_deserialize() is
> currently _very_ slow. It takes over 2 seconds to make memblock
> reservations for 48 GiB of 0-order pages. I suppose this can later be
> optimized by teaching memblock_free_all() to skip preserved pages
> instead of making memblock reservations.
Yes, this was my prior point about not having actual data to know what
the actual hot spots are. This saves a few ms on an operation that
takes over 2 seconds :)
> +typedef unsigned long khomem_desc_t;
This should be more like:
union {
void *table;
phys_addr_t table_phys;
};
Since we are not using the low bits right now and it is a lot cheaper
to convert from va to phys only once during the final step. __va is
not exactly fast.
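To illustrate the idea, here is a rough userspace sketch, not the patch's code. The name union khomem_desc, the two helpers, fake_virt_to_phys() and FAKE_PHYS_OFFSET are all hypothetical stand-ins; in the kernel the conversion would be virt_to_phys(). The table is reached through a plain pointer while it is being built, and each descriptor is converted to a physical address exactly once at serialize time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins, NOT the kernel definitions. */
typedef uint64_t phys_addr_t;
#define FAKE_PHYS_OFFSET 0x100000000ULL	/* hypothetical linear-map offset */

static phys_addr_t fake_virt_to_phys(void *va)
{
	return (phys_addr_t)(uintptr_t)va + FAKE_PHYS_OFFSET;
}

/* Hypothetical descriptor: a pointer while building, phys once serialized. */
union khomem_desc {
	void *table;
	phys_addr_t table_phys;
};

/* While preserving: store and follow the virtual address directly. */
static void khomem_desc_set(union khomem_desc *desc, void *table)
{
	assert(table != NULL);
	desc->table = table;
}

/* Final step: convert in place, once per descriptor. */
static void khomem_desc_finalize(union khomem_desc *desc)
{
	desc->table_phys = fake_virt_to_phys(desc->table);
}
```

With this shape the tree walk on the preserve path never has to call __va(), and the only conversions happen in the serialize and deserialize steps.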
> +#define PTRS_PER_LEVEL (PAGE_SIZE / sizeof(unsigned long))
> +#define KHOMEM_L1_BITS (PAGE_SIZE * BITS_PER_BYTE)
> +#define KHOMEM_L1_MASK ((1 << ilog2(KHOMEM_L1_BITS)) - 1)
> +#define KHOMEM_L1_SHIFT (PAGE_SHIFT)
> +#define KHOMEM_L2_SHIFT (KHOMEM_L1_SHIFT + ilog2(KHOMEM_L1_BITS))
> +#define KHOMEM_L3_SHIFT (KHOMEM_L2_SHIFT + ilog2(PTRS_PER_LEVEL))
> +#define KHOMEM_L4_SHIFT (KHOMEM_L3_SHIFT + ilog2(PTRS_PER_LEVEL))
> +#define KHOMEM_PFN_MASK PAGE_MASK
This all works better if you just use GENMASK and FIELD_GET
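Something along these lines, as a userspace sketch only. It assumes 4 KiB pages and 8-byte table entries, so the quoted shifts come out as 12/27/36/45 with a 15-bit bitmap index and 9-bit table indexes. GENMASK_ULL() and FIELD_GET() below are simplified stand-ins for the kernel macros from <linux/bits.h> and <linux/bitfield.h>, and the KHOMEM_*_IDX names, struct khomem_idx and khomem_split() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel macros. */
#define GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))
#define FIELD_GET(mask, val) \
	(((val) & (mask)) >> __builtin_ctzll(mask))

/* Assumption: 4 KiB pages, 8-byte entries, order-0 physical addresses. */
#define KHOMEM_L1_IDX	GENMASK_ULL(26, 12)	/* bit within a one-page bitmap */
#define KHOMEM_L2_IDX	GENMASK_ULL(35, 27)	/* 512 entries per table */
#define KHOMEM_L3_IDX	GENMASK_ULL(44, 36)
#define KHOMEM_L4_IDX	GENMASK_ULL(53, 45)

static_assert((KHOMEM_L1_IDX & KHOMEM_L2_IDX) == 0, "fields must not overlap");
static_assert((KHOMEM_L3_IDX & KHOMEM_L4_IDX) == 0, "fields must not overlap");

struct khomem_idx {
	unsigned int l1, l2, l3, l4;
};

/* Split a physical address into its per-level indexes. */
static struct khomem_idx khomem_split(uint64_t phys)
{
	struct khomem_idx idx = {
		.l1 = (unsigned int)FIELD_GET(KHOMEM_L1_IDX, phys),
		.l2 = (unsigned int)FIELD_GET(KHOMEM_L2_IDX, phys),
		.l3 = (unsigned int)FIELD_GET(KHOMEM_L3_IDX, phys),
		.l4 = (unsigned int)FIELD_GET(KHOMEM_L4_IDX, phys),
	};
	return idx;
}
```

Each field is then described by one mask, and the separate SHIFT/MASK/BITS defines and the open-coded shifting go away.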
> +static int __khomem_table_alloc(khomem_desc_t *desc)
> +{
> + if (khomem_desc_none(*desc)) {
Needs READ_ONCE
> +struct kho_mem_track {
> + /* Points to L4 KHOMEM descriptor, each order gets its own table. */
> + struct xarray orders;
> +};
I think it would be easy to add a 5th level and just use bits 63:57 as
a 6 bit order. Then you don't need all this stuff either.
> +int kho_preserve_folio(struct folio *folio)
> +{
> + unsigned long pfn = folio_pfn(folio);
> + unsigned int order = folio_order(folio);
> + int err;
> +
> + if (!kho_enable)
> + return -EOPNOTSUPP;
> +
> + down_read(&kho_out.tree_lock);
This lock still needs to go away
> +static void kho_mem_serialize(void)
> +{
> + struct kho_mem_track *tracker = &kho_mem_track;
> + khomem_desc_t *desc;
> + unsigned long order;
> +
> + xa_for_each(&tracker->orders, order, desc) {
> + if (WARN_ON(order >= NR_PAGE_ORDERS))
> + break;
> + kho_out.mem_tables[order] = *desc;
Missing the virt_to_phys?
> + nr_tables = min_t(unsigned int, len / sizeof(*tables), NR_PAGE_ORDERS);
> + for (order = 0; order < nr_tables; order++)
> + khomem_walk_preserved((khomem_desc_t *)&tables[order], order,
Missing phys_to_virt
Please don't remove the KHOSER stuff, and do use it with proper
structs and types. It is part of keeping this stuff understandable.
Jason