Re: p2m stuff and crash tool

From: Juergen Gross
Date: Tue Feb 16 2016 - 07:55:41 EST


Hi Daniel,

On 16/02/16 12:35, Daniel Kiper wrote:
> Hey Juergen,
>
> As I saw you are strongly playing with p2m stuff, so,
> I hope that you can enlighten me a bit in that area.

Yes, the p2m stuff is always fun. :-)

> OVM, Oracle product, uses as dom0 kernel Linux 3.8.13
> (yep, I know this is very ancient stuff) with a lot of
> backports. Among them there is commit 2c185687ab016954557aac80074f5d7f7f5d275c
> (x86/xen: delay construction of mfn_list_list). After
> an investigation I discovered that it breaks the crash tool.
> It fails with the following messages:
>
> crash: read error: kernel virtual address: ffff88027ce0b700 type: "current_task (per_cpu)"
> crash: read error: kernel virtual address: ffff88027ce2b700 type: "current_task (per_cpu)"
> crash: read error: kernel virtual address: ffff88027ce4b700 type: "current_task (per_cpu)"
> crash: read error: kernel virtual address: ffff88027ce6b700 type: "current_task (per_cpu)"
> crash: read error: kernel virtual address: ffff88027ce10c64 type: "tss_struct ist array"
>
> Addresses and symbols depend on the given build.
>
> The problem is that xen_max_p2m_pfn in xen_build_mfn_list_list()
> is equal to xen_start_info->nr_pages. This means that memory
> which is above due to some remapping/relocation (usually it is
> small fraction) is not mapped via p2m_top_mfn and p2m_top_mfn_p.
> I should mention here that Xen is started with e.g. dom0_mem=1g,max:1g.
> If I remove max argument then crash works because xen_max_p2m_pfn
> is greater than xen_start_info->nr_pages. Additionally, the issue
> could be fixed by replacing xen_max_p2m_pfn in xen_build_mfn_list_list()
> with max_pfn.
>
> After that I decided to take a look at Linux kernel upstream. I saw
> that xen_max_p2m_pfn in xen_build_mfn_list_list() is equal to "the
> end of last usable machine memory region available for a given
> dom0_mem argument + something", e.g.
>
> For dom0_mem=1g,max:1g:
>
> (XEN) Xen-e820 RAM map:
> (XEN) 0000000000000000 - 000000000009fc00 (usable)
> (XEN) 000000000009fc00 - 00000000000a0000 (reserved)
> (XEN) 00000000000f0000 - 0000000000100000 (reserved)
> (XEN) 0000000000100000 - 000000007ffdf000 (usable) <--- HERE
> (XEN) 000000007ffdf000 - 0000000080000000 (reserved)
> (XEN) 00000000b0000000 - 00000000c0000000 (reserved)
> (XEN) 00000000feffc000 - 00000000ff000000 (reserved)
> (XEN) 00000000fffc0000 - 0000000100000000 (reserved)
> (XEN) 0000000100000000 - 0000000180000000 (usable)
>
> Hence xen_max_p2m_pfn == 0x80000
>
> Later I reviewed most of your p2m related commits and I realized
> that you played whack-a-mole game with p2m bugs. Sadly, I was not
> able to identify exactly one (or more) commit which would fix the
> same issue (well, there are some which fix similar stuff but not
> the same one described above). So, if you explain to me why
> xen_max_p2m_pfn is set to that value and not to e.g. max_pfn then
> it will be much easier for me to write a proper fix and maybe fix
> the same issue in the upstream kernel if it is needed (well, the crash
> tool does not work with the new p2m layout so first of all I must fix it;
> I hope that you will help me with that sooner or later).

The reason for setting xen_max_p2m_pfn to nr_pages initially is its
use in __pfn_to_mfn(): that function must work with the initial p2m list
supplied by the hypervisor, which has only nr_pages entries.

Later it is updated to the number of entries the linear p2m list is
able to hold. This size has to include possible hotplugged memory
in order to be able to make use of that memory later (remember: the
p2m list's size is limited by the virtual space allocated for it via
xen_vmalloc_p2m_tree()).
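
To make the role of that bound concrete, here is a minimal sketch in C.
It is a simplified model and not the kernel code: struct p2m_model,
model_pfn_to_mfn(), model_switch_list() and MODEL_INVALID_MFN are
hypothetical names, and what the real __pfn_to_mfn() returns for an
out-of-range pfn is not modeled. It only shows why the bound has to match
the number of entries in whichever list is currently in use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker for "no translation"; not a kernel constant. */
#define MODEL_INVALID_MFN (~0UL)

/* Simplified model of a linear p2m list (assumption, not kernel layout). */
struct p2m_model {
    const unsigned long *list; /* pfn -> mfn table currently in use */
    unsigned long max_pfn;     /* number of entries in list; plays the
                                  role of xen_max_p2m_pfn in this model */
};

/* Bounds-checked lookup: never read past the entries the list has. */
unsigned long model_pfn_to_mfn(const struct p2m_model *p2m, unsigned long pfn)
{
    assert(p2m != NULL && p2m->list != NULL);
    if (pfn >= p2m->max_pfn)
        return MODEL_INVALID_MFN;
    return p2m->list[pfn];
}

/*
 * Switch from the initial list (nr_pages entries) to a larger list and
 * raise the bound to the new capacity, so that pfns above nr_pages
 * become reachable.
 */
void model_switch_list(struct p2m_model *p2m, const unsigned long *bigger,
                       unsigned long capacity)
{
    assert(p2m != NULL && bigger != NULL);
    assert(capacity >= p2m->max_pfn);
    p2m->list = bigger;
    p2m->max_pfn = capacity;
}
```

With a 4-entry initial list, pfn 4 is rejected until the larger list is
installed; after the switch the same pfn resolves through the new list.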

> Additionally, during that work I realized that p2m_top (xen_p2m_addr
> in the latest Linux kernel) and p2m_top_mfn differ. As I saw, p2m_top
> represents all stuff (memory, missing, identity, etc.) found in the PV
> guest address space. However, p2m_top_mfn is limited to memory
> and missing things. Taking into account that p2m_top_mfn is used just
> for migration and the crash tool, it looks like that is sufficient. Am I
> correct? Am I missing any detail?

Basically p2m_top and p2m_top_mfn hold the same information. p2m_top has
just some special mappings for identity pages: they translate to
"invalid" mfns just as in p2m_top_mfn, but via dedicated pages which are
identified by comparing their addresses (or pfns) in order to detect
the identity pages.

As you thought: this distinction isn't necessary for p2m_top_mfn, so it
can be omitted there.
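
A rough sketch in C of the address comparison described above. This is a
toy model under stated assumptions: the leaf size, the two shared leaf
arrays, and the names model_classify() and model_lookup() are all
hypothetical. It only covers the case where a whole leaf page is identity
or missing, and does not reproduce the kernel's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_INVALID_MFN  (~0UL)
#define MODEL_LEAF_ENTRIES 4      /* tiny leaf size, for illustration only */

enum model_kind { MODEL_KIND_RAM, MODEL_KIND_MISSING, MODEL_KIND_IDENTITY };

/* Two dedicated shared leaves; both contain nothing but invalid entries. */
static unsigned long model_missing_leaf[MODEL_LEAF_ENTRIES] = {
    MODEL_INVALID_MFN, MODEL_INVALID_MFN, MODEL_INVALID_MFN, MODEL_INVALID_MFN
};
static unsigned long model_identity_leaf[MODEL_LEAF_ENTRIES] = {
    MODEL_INVALID_MFN, MODEL_INVALID_MFN, MODEL_INVALID_MFN, MODEL_INVALID_MFN
};

/* Plain value lookup: identity and missing both read back as invalid. */
unsigned long model_lookup(unsigned long **top, unsigned long pfn)
{
    assert(top != NULL);
    return top[pfn / MODEL_LEAF_ENTRIES][pfn % MODEL_LEAF_ENTRIES];
}

/* The identity case is recognized by the leaf's address, not its contents. */
enum model_kind model_classify(unsigned long **top, unsigned long pfn)
{
    const unsigned long *leaf;

    assert(top != NULL);
    leaf = top[pfn / MODEL_LEAF_ENTRIES];
    if (leaf == model_identity_leaf)
        return MODEL_KIND_IDENTITY;
    if (leaf == model_missing_leaf)
        return MODEL_KIND_MISSING;
    return leaf[pfn % MODEL_LEAF_ENTRIES] == MODEL_INVALID_MFN
               ? MODEL_KIND_MISSING : MODEL_KIND_RAM;
}
```

A consumer that only calls model_lookup() sees the same invalid value for
identity and missing ranges, which is the view p2m_top_mfn gives; only
code that also looks at the leaf address can tell the two apart.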

>
> Daniel
>
> PS I am sending this to wider forum because I think that it
> is worth spreading knowledge even if it is not strictly
> related to latest Xen or Linux kernel developments.

OTOH: what was hard to write should be hard to read. ;-)

Feel free to ask further questions.

Juergen