Re: [Xen-devel] [PATCH 3/3] xen: eliminate scalability issues from initial mapping setup

From: Juergen Gross
Date: Fri Sep 05 2014 - 05:44:18 EST


On 09/05/2014 11:05 AM, Andrew Cooper wrote:
On 05/09/14 08:55, Juergen Gross wrote:
On 09/04/2014 04:43 PM, Andrew Cooper wrote:
On 04/09/14 15:31, Jan Beulich wrote:
On 04.09.14 at 15:02, <andrew.cooper3@xxxxxxxxxx> wrote:
On 04/09/14 13:59, David Vrabel wrote:
On 04/09/14 13:38, Juergen Gross wrote:
Direct Xen to place the initial P->M table outside of the initial
mapping, as otherwise the 1G (implementation) / 2G (theoretical)
restriction on the size of the initial mapping limits the amount
of memory a domain can be handed initially.
The three level p2m limits memory to 512 GiB on x86-64 but this patch
doesn't seem to address this limit and thus seems a bit useless to
me.
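
The 512 GiB figure follows from the tree geometry. As a minimal sketch, assuming 4 KiB pages and 8-byte frame numbers (so 512 entries per p2m page), the reach of an N-level tree can be computed as below. The macro and function names are illustrative only and are not taken from the kernel or libxc.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 geometry: 4 KiB pages, one 64-bit frame number per entry. */
#define SKETCH_PAGE_SIZE   4096ULL
#define SKETCH_ENTRY_SIZE  8ULL
#define SKETCH_PER_PAGE    (SKETCH_PAGE_SIZE / SKETCH_ENTRY_SIZE)

static_assert(SKETCH_PER_PAGE == 512, "512 entries fit in one p2m page");

/* Bytes of guest memory that a p2m tree with `levels` levels can describe. */
static inline uint64_t p2m_max_bytes(unsigned int levels)
{
    uint64_t pfns = 1;

    for (unsigned int i = 0; i < levels; i++)
        pfns *= SKETCH_PER_PAGE;      /* each level multiplies reach by 512 */

    return pfns * SKETCH_PAGE_SIZE;   /* every pfn covers one 4 KiB page */
}
```

With three levels this gives 512^3 pfns of 4 KiB each, i.e. 512 GiB; a fourth level would raise the bound to 256 TiB.
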
Any increase of the p2m beyond 3 levels will need to come with
substantial libxc changes first. 3 level p2ms are hard coded
throughout all the PV build and migrate code.
No, there is no such dependency - the kernel could use 4 levels at
any time (sacrificing being able to get migrated), making sure it
only exposes the 3 levels hanging off the fourth level (or not
exposing this information at all) to external entities making this
wrong assumption.

Jan


That would require that the PV kernel must start with a 3 level p2m and
fudge things afterwards.

I always thought the 3 level p2m is constructed by the kernel, not by
the tools.

It starts with the linear p2m list anchored at xen_start_info->mfn_list,
constructs the p2m tree and writes the p2m_top_mfn mfn to
HYPERVISOR_shared_info->arch.pfn_to_mfn_frame_list_list

See comment in the kernel source arch/x86/xen/p2m.c

So booting with a larger p2m list can be handled completely by the
kernel itself.
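
To illustrate what "constructs the p2m tree" means for a lookup, here is a minimal sketch of how a pfn could be split into the three per-level indices, assuming 512 entries per page at every level. The helper names are hypothetical and only modeled loosely on the scheme described in arch/x86/xen/p2m.c; this is not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: every level of the tree holds 512 entries per 4 KiB page. */
#define SKETCH_PER_PAGE 512ULL

static_assert((SKETCH_PER_PAGE & (SKETCH_PER_PAGE - 1)) == 0,
              "entries per page must be a power of two");

/* Index into the top page: selects one mid-level page. */
static inline uint64_t sketch_top_index(uint64_t pfn)
{
    return pfn / (SKETCH_PER_PAGE * SKETCH_PER_PAGE);
}

/* Index into the selected mid page: selects one leaf page. */
static inline uint64_t sketch_mid_index(uint64_t pfn)
{
    return (pfn / SKETCH_PER_PAGE) % SKETCH_PER_PAGE;
}

/* Index into the selected leaf page: the slot holding the mfn. */
static inline uint64_t sketch_leaf_index(uint64_t pfn)
{
    return pfn % SKETCH_PER_PAGE;
}
```

The linear list at xen_start_info->mfn_list corresponds to the concatenation of all leaf pages in pfn order, which is why the kernel can build the tree from it without help from the tools.
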

Ah yes - I remember now. All the toolstack does is create the linear
p2m. In which case building such a domain will be fine.



At a minimum, I would expect a patch to libxc to detect a 4 level PV
guest and fail with a meaningful error, rather than an obscure "m2p
doesn't match p2m for mfn/pfn X".

I'd rather fix it in a clean way.

I think the best way to do it would be an indicator in the p2m array
anchor, e.g. setting 1<<61 in pfn_to_mfn_frame_list_list. This will
result in an early error with old tools:
"Couldn't map p2m_frame_list_list"

No it won't. The is_mapped() macro in the toolstack is quite broken. It
stems from a lack of Design/API/ABI concerning things like the p2m. In
particular, INVALID_MFN is not an ABI constant, nor is any notion of
mapped vs unmapped.

That's not relevant here. map_frame_list_list() in xc_domain_save.c
reads pfn_to_mfn_frame_list_list and tries to map that mfn directly.
This will fail and result in the above error message.

Its current implementation is a relic of 32bit days, and only checks bit
31. It also means that it is impossible to migrate a PV VM with pfns
above the 43bit limit; a restriction which is lifted by my migration v2
series. A lot of the other migration constructs are in a similar state,
which is why they are being deleted by the v2 series.
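
The concern about a check that "only checks bit 31" can be made concrete with a small sketch. It assumes a test of the form "mapped unless bit 31 is set", which is an illustration of the behaviour described above and not a copy of the libxc macro; the constant and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed legacy marker: bit 31, a leftover from 32-bit frame numbers. */
#define SKETCH_LEGACY_UNMAPPED_BIT (1ULL << 31)
/* Indicator bit proposed earlier in the thread for the p2m anchor. */
#define SKETCH_P2M_FLAG            (1ULL << 61)

static_assert((SKETCH_LEGACY_UNMAPPED_BIT & SKETCH_P2M_FLAG) == 0,
              "the two markers use different bits");

/* A check in the style described: only bit 31 is ever inspected. */
static inline bool sketch_legacy_is_mapped(uint64_t value)
{
    return !(value & SKETCH_LEGACY_UNMAPPED_BIT);
}
```

Under this assumption a value carrying the 1<<61 indicator still looks "mapped", while a perfectly valid 64-bit frame number that happens to have bit 31 set looks "unmapped".
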

The clean way to fix this is to leave pfn_to_mfn_frame_list_list as
INVALID_MFN. Introduce two new fields beside it named p2m_levels and
p2m_root, which then caters for levels greater than 4 in a compatible
manner.
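
A rough sketch of how a consumer might interpret such an anchor follows. Everything here is hypothetical: the struct layout, the field widths, and the all-ones invalid value are assumptions made for illustration (the thread itself notes that INVALID_MFN is not an ABI constant), and the real shared_info layout is not reproduced.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed invalid marker for this sketch only; not an ABI constant. */
#define SKETCH_INVALID_MFN (~0ULL)

/* Hypothetical anchor: the legacy field plus the two proposed fields. */
struct sketch_p2m_anchor {
    uint64_t pfn_to_mfn_frame_list_list; /* legacy 3-level root, or invalid */
    uint64_t p2m_levels;                 /* proposed: depth of the tree */
    uint64_t p2m_root;                   /* proposed: frame of the tree root */
};

/*
 * Work out the tree depth and root frame from the anchor.
 * Returns 0 on success, -1 if neither description is usable.
 */
static inline int sketch_p2m_describe(const struct sketch_p2m_anchor *a,
                                      unsigned int *levels, uint64_t *root)
{
    assert(a && levels && root);

    if (a->pfn_to_mfn_frame_list_list != SKETCH_INVALID_MFN) {
        /* Old-style guest: the legacy field names a 3-level tree. */
        *levels = 3;
        *root = a->pfn_to_mfn_frame_list_list;
        return 0;
    }
    if (a->p2m_levels == 0)
        return -1;                       /* no p2m published at all */

    *levels = (unsigned int)a->p2m_levels;
    *root = a->p2m_root;
    return 0;
}
```

The point of the design is that tools unaware of the new fields see only an invalid legacy anchor, while newer tools can read any depth from p2m_levels.
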

I don't mind doing it this way.


Juergen
