Re: [Xen-devel] [PATCH v3] xen/balloon: Mark unallocated host memory as UNUSABLE

From: Igor Druzhinin
Date: Mon Nov 26 2018 - 12:11:41 EST


On 26/11/2018 16:25, Boris Ostrovsky wrote:
> On 11/25/18 8:00 PM, Igor Druzhinin wrote:
>> On 20/12/2017 14:05, Boris Ostrovsky wrote:
>>> Commit f5775e0b6116 ("x86/xen: discard RAM regions above the maximum
>>> reservation") left host memory not assigned to dom0 as available for
>>> memory hotplug.
>>>
>>> Unfortunately this also meant that those regions could be used by
>>> others. Specifically, commit fa564ad96366 ("x86/PCI: Enable a 64bit BAR
>>> on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)") may try to map those
>>> addresses as MMIO.
>>>
>>> To prevent this mark unallocated host memory as E820_TYPE_UNUSABLE (thus
>>> effectively reverting f5775e0b6116) and keep track of that region as
>>> a hostmem resource that can be used for the hotplug.
>>>
>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
>> This commit breaks Xen balloon memory hotplug for us in Dom0 with
>> "hoplug_unpopulated" set to 1. The issue is that the common kernel
>> memory onlining procedures require "System RAM" resource to be 1-st
>> level. That means by inserting it under "Unusable memory" as the commit
>> above does (intentionally or not) we make it 2-nd level and break memory
>> onlining.
>
> What do you mean by 1st and 2nd level?
>

I mean the level of a resource in IOMEM tree (the one that's printed
from /proc/iomem). 1-st level means its parent is root and so on.

>>
>> There are multiple ways to fix it depending on what was the intention of
>> original commit and what exactly it tried to workaround. It seems it
>> does several things at once:
>> 1) Marks non-Dom0 host memory "Unusable memory" in resource tree.
>> 2) Keeps track of all the areas safe for hotplug in Dom0
>> 3) Changes allocation algorithms itself in balloon driver to use those areas
>
> Pretty much. (3) is true in the sense that memory is first allocated
> from hostmem_resource (which is non-dom0 RAM).
>
>>
>> Are all the things above necessary to cover the issue in fa564ad96366
>> ("x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f,
>> 60-7f)")?
>
> Not anymore, as far as that particular commit is concerned, but that's
> because of 03a551734 ("x86/PCI: Move and shrink AMD 64-bit window to
> avoid conflict") which was introduced after balloon patch. IIRC there
> were some issues with fa564ad96366 unrelated to balloon.
>

If it's not a problem anymore IIUC, can we revert the change as it still
breaks "hotplug_unpopulated=1" for the reasons I described above?

>
>>
>> Can we remove "Unusable memory" resources as soon as we finished
>> booting? Is removing on-demand is preferable over "shoot them all" in
>> that case?
>
> The concern is that in principle nothing prevents someone else to do
> exact same thing fa564ad96366 did, which is grab something from right
> above end of RAM as the kernel sees it. And that can be done at any point.
>

Nothing prevents - true, but that's plainly wrong from OS point of view
to grab physical ranges for something without knowing what's actually
behind on that platform. I think we shouldn't consider this as a valid
thing to do and don't try to workaround initially incorrect code.

>
> -boris
>
>>
>> Does it even make sense to remove the 1-st level only restriction in
>> kernel/resource.c ?
>
>
>