Re: [Xen-devel] [PATCH 5/7] xen/p2m: Add logic to revector a P2Mtree to use __va leafs.
From: Stefano Stabellini
Date: Fri Jul 27 2012 - 07:19:34 EST
On Thu, 26 Jul 2012, Konrad Rzeszutek Wilk wrote:
> During bootup Xen supplies us with a P2M array. It places
> it right after the ramdisk, as can be seen with a 128GB PV guest:
>
> (certain parts removed for clarity):
> xc_dom_build_image: called
> xc_dom_alloc_segment: kernel : 0xffffffff81000000 -> 0xffffffff81e43000 (pfn 0x1000 + 0xe43 pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000
> xc_dom_alloc_segment: ramdisk : 0xffffffff81e43000 -> 0xffffffff925c7000 (pfn 0x1e43 + 0x10784 pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000
> xc_dom_alloc_segment: phys2mach : 0xffffffff925c7000 -> 0xffffffffa25c7000 (pfn 0x125c7 + 0x10000 pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000
> xc_dom_alloc_page : start info : 0xffffffffa25c7000 (pfn 0x225c7)
> xc_dom_alloc_page : xenstore : 0xffffffffa25c8000 (pfn 0x225c8)
> xc_dom_alloc_page : console : 0xffffffffa25c9000 (pfn 0x225c9)
> nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
> nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s)
> nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s)
> nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s)
> xc_dom_alloc_segment: page tables : 0xffffffffa25ca000 -> 0xffffffffa26e1000 (pfn 0x225ca + 0x117 pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000
> xc_dom_alloc_page : boot stack : 0xffffffffa26e1000 (pfn 0x226e1)
> xc_dom_build_image : virt_alloc_end : 0xffffffffa26e2000
> xc_dom_build_image : virt_pgtab_end : 0xffffffffa2800000
>
> So the physical memory layout and the virtual layout (using
> __START_KERNEL_map addresses) look like this:
>
>     phys                                    __ka
> /------------\                      /-------------------\
> | 0          | empty                | 0xffffffff80000000|
> | ..         |                      | ..                |
> | 16MB       | <= kernel starts     | 0xffffffff81000000|
> | ..         |                      |                   |
> | 30MB       | <= kernel ends =>    | 0xffffffff81e43000|
> | ..         |    & ramdisk starts  | ..                |
> | 293MB      | <= ramdisk ends =>   | 0xffffffff925c7000|
> | ..         |    & P2M starts      | ..                |
> | ..         |                      | ..                |
> | 549MB      | <= P2M ends =>       | 0xffffffffa25c7000|
> | ..         | start_info           | 0xffffffffa25c7000|
> | ..         | xenstore             | 0xffffffffa25c8000|
> | ..         | console              | 0xffffffffa25c9000|
> | 549MB      | <= page tables =>    | 0xffffffffa25ca000|
> | ..         |                      |                   |
> | 550MB      | <= PGT end =>        | 0xffffffffa26e1000|
> | ..         | boot stack           |                   |
> \------------/                      \-------------------/
>
> As can be seen, the ramdisk, P2M and pagetables take up a good
> chunk of the __ka address space. That is a problem because
> MODULES_VADDR starts at 0xffffffffa0000000, and the P2M sits
> right in that range! As a result, modules cannot be loaded
> after bootup; loading fails with this error:
>
> ------------[ cut here ]------------
> WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370()
> Call Trace:
> [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0
> [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
> [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20
> [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370
> [<ffffffff81130c4d>] map_vm_area+0x2d/0x50
> [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250
> [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
> [<ffffffff810c6186>] ? load_module+0x66/0x19c0
> [<ffffffff8105cadc>] module_alloc+0x5c/0x60
> [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
> [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80
> [<ffffffff810c70c3>] load_module+0xfa3/0x19c0
> [<ffffffff812491f6>] ? security_file_permission+0x86/0x90
> [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220
> [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b
> ---[ end trace fd8f7704fdea0291 ]---
> vmalloc: allocation failure, allocated 16384 of 20480 bytes
> modprobe: page allocation failure: order:0, mode:0xd2
>
> Since __va and __ka are 1:1 up to MODULES_VADDR, and
> cleanup_highmap already rids __ka of the ramdisk mapping, we
> want to do something similar: get rid of the P2M in the __ka
> address space. There are two ways of fixing this:
>
> 1) All P2M lookups would use the __va address instead of the
> __ka address. We could then safely erase from the __ka space
> the PMD entries that point to the PFNs of the P2M array.
> 2) Allocate a new array, copy the existing P2M into it,
> revector the P2M tree to use it, and return the old
> P2M to the memory allocator. This has the advantage that
> it sets the stage for using the XEN_ELF_NOTE_INIT_P2M
> feature. That feature lets us choose the exact virtual
> address range for the P2M, which allows us to boot as
> the initial domain on large machines.
>
> So we pick option 2).
1) looks like a decent option that requires less code.
Is the problem with 1) that we might want to access the P2M before we
have __va addresses ready?
> This patch only lays the groundwork in the P2M code. The patch
> that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree."
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> ---
> arch/x86/xen/p2m.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++
> arch/x86/xen/xen-ops.h | 1 +
> 2 files changed, 71 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> index 6a2bfa4..bbfd085 100644
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -394,7 +394,77 @@ void __init xen_build_dynamic_phys_to_machine(void)
> * Xen provided pagetable). Do it later in xen_reserve_internals.
> */
> }
> +#ifdef CONFIG_X86_64
> +#include <linux/bootmem.h>
> +unsigned long __init xen_revector_p2m_tree(void)
> +{
> + unsigned long va_start;
> + unsigned long va_end;
> + unsigned long pfn;
> + unsigned long *mfn_list = NULL;
> + unsigned long size;
> +
> + va_start = xen_start_info->mfn_list;
> + /*We copy in increments of P2M_PER_PAGE * sizeof(unsigned long),
> + * so make sure it is rounded up to that */
> + size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> + va_end = va_start + size;
> +
> + /* If we were revectored already, don't do it again. */
> + if (va_start <= __START_KERNEL_map && va_start >= __PAGE_OFFSET)
> + return 0;
> +
> + mfn_list = alloc_bootmem_align(size, PAGE_SIZE);
> + if (!mfn_list) {
> + pr_warn("Could not allocate space for a new P2M tree!\n");
> + return xen_start_info->mfn_list;
> + }
> + /* Fill it out with INVALID_P2M_ENTRY value */
> + memset(mfn_list, 0xFF, size);
> +
> + for (pfn = 0; pfn < ALIGN(MAX_DOMAIN_PAGES, P2M_PER_PAGE); pfn += P2M_PER_PAGE) {
> + unsigned topidx = p2m_top_index(pfn);
> + unsigned mididx;
> + unsigned long *mid_p;
> +
> + if (!p2m_top[topidx])
> + continue;
> +
> + if (p2m_top[topidx] == p2m_mid_missing)
> + continue;
> +
> + mididx = p2m_mid_index(pfn);
> + mid_p = p2m_top[topidx][mididx];
> + if (!mid_p)
> + continue;
> + if ((mid_p == p2m_missing) || (mid_p == p2m_identity))
> + continue;
> +
> + if ((unsigned long)mid_p == INVALID_P2M_ENTRY)
> + continue;
> +
> + /* The old va. Rebase it on mfn_list */
> + if (mid_p >= (unsigned long *)va_start && mid_p <= (unsigned long *)va_end) {
> + unsigned long *new;
> +
> + new = &mfn_list[pfn];
> +
> + copy_page(new, mid_p);
> + p2m_top[topidx][mididx] = &mfn_list[pfn];
> + p2m_top_mfn_p[topidx][mididx] = virt_to_mfn(&mfn_list[pfn]);
>
> + }
> + /* This should be the leafs allocated for identity from _brk. */
> + }
> + return (unsigned long)mfn_list;
> +
> +}
> +#else
> +unsigned long __init xen_revector_p2m_tree(void)
> +{
> + return 0;
> +}
> +#endif
> unsigned long get_phys_to_machine(unsigned long pfn)
> {
> unsigned topidx, mididx, idx;
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index 2230f57..bb5a810 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -45,6 +45,7 @@ void xen_hvm_init_shared_info(void);
> void xen_unplug_emulated_devices(void);
>
> void __init xen_build_dynamic_phys_to_machine(void);
> +unsigned long __init xen_revector_p2m_tree(void);
>
> void xen_init_irq_ops(void);
> void xen_setup_timer(int cpu);
> --
> 1.7.7.6
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
>