Re: [Xen-devel] [PATCH v2 01/11] kexec: introduce kexec_ops struct

From: Ian Campbell
Date: Fri Nov 23 2012 - 05:53:10 EST


On Fri, 2012-11-23 at 09:56 +0000, Jan Beulich wrote:
> >>> On 22.11.12 at 18:37, "H. Peter Anvin" <hpa@xxxxxxxxx> wrote:
> > I actually talked to Ian Jackson at LCE, and mentioned among other

That was me actually (this happens surprisingly often ;-)).

> > things the bogosity of requiring a PUD page for three-level paging in
> > Linux -- a bogosity which has spread from Xen into native. It's a page
> > wasted for no good reason, since it only contains 32 bytes worth of
> > data, *inherently*. Furthermore, contrary to popular belief, it is
> > *not* a page table per se.
> >
> > Ian told me: "I didn't know we did that, and we shouldn't have to."
> > Here we have suffered this overhead for at least six years, ...
>
> Even the Xen kernel only needs the full page when running on a
> 64-bit hypervisor (now that we don't have a 32-bit hypervisor
> anymore, that of course basically means always).

I took an, admittedly very brief, look at it on the plane on the way
home and it seems like the requirement for a complete page on the
pvops-xen side comes from the !SHARED_KERNEL_PMD stuff (so still a
Xen-related thing). This requires a struct page for the list_head it
contains (see pgd_list_add et al) rather than because of the use of the
page as a pgd as such.

> But yes, I too
> never liked this enforced over-allocation for native kernels (and
> was surprised that it was allowed in at all).

Completely agreed.

I did wonder if just doing something like:
-       pgd = (pgd_t *)__get_free_page(PGALLOC_GFP);
+       if (SHARED_KERNEL_PMD)
+               pgd = some_appropriate_allocation_primitive(sizeof(*pgd));
+       else
+               pgd = (pgd_t *)__get_free_page(PGALLOC_GFP);

to pgd_alloc (+ the equivalent for the error path & free case, create
helper funcs as desired etc) would be sufficient to remove the
over-allocation for the native case, but I haven't had time to properly
investigate.
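
To make the size difference concrete, here is a minimal userspace sketch of
that branch. It is not kernel code: every model_* name is hypothetical,
aligned_alloc() merely stands in for whatever allocation primitive would be
appropriate, and the small case is sized as four 8-byte PAE entries (the 32
bytes mentioned above) rather than sizeof(*pgd). The 32-byte alignment
reflects the PAE requirement on the top-level table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
#define MODEL_PAGE_SIZE    4096u
#define MODEL_PTRS_PER_PGD 4u   /* PAE: four top-level entries */

typedef struct { uint64_t val; } model_pgd_t;   /* one 8-byte PAE entry */

/* Bytes the top-level table really needs: 4 entries * 8 bytes = 32. */
static size_t model_pgd_bytes(void)
{
    return MODEL_PTRS_PER_PGD * sizeof(model_pgd_t);
}

/* Small allocation when the kernel PMD is shared, a whole page otherwise. */
static size_t model_pgd_alloc_size(int shared_kernel_pmd)
{
    return shared_kernel_pmd ? model_pgd_bytes() : MODEL_PAGE_SIZE;
}

static model_pgd_t *model_pgd_alloc(int shared_kernel_pmd)
{
    size_t size = model_pgd_alloc_size(shared_kernel_pmd);
    void *p;

    /* Both sizes are powers of two, so the size doubles as the alignment. */
    assert(size != 0 && (size & (size - 1)) == 0);

    p = aligned_alloc(size, size);
    if (p)
        memset(p, 0, size);     /* start with an empty table */
    return p;
}

/* The free path must mirror the allocation path; here both use the heap. */
static void model_pgd_free(model_pgd_t *pgd)
{
    free(pgd);
}
```

In this model the shared-PMD case allocates 32 bytes instead of 4096. In the
real kernel the free and error paths would need the same conditional, as
noted above.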

Alternatively push the allocation down into paravirt_pgd_alloc to
taste :-/

Ian.
