Re: ZONE_NORMAL vs. ZONE_MOVABLE

From: Andrea Arcangeli
Date: Wed Mar 15 2017 - 12:38:40 EST


On Wed, Mar 15, 2017 at 02:11:40PM +0100, Michal Hocko wrote:
> OK, I see now. I am afraid there is quite a lot of code which expects
> that zones do not overlap. We can have holes in zones but not different
> zones interleaving. Probably something which could be addressed but far
> from trivial IMHO.
>
> All that being said, I do not want to discourage you from experiments in
> those areas. Just be prepared all those are far from trivial and
> something for a long project ;)

This constraint has been known for quite some time, so when I talked
about it with Mel at last year's LSF/MM he suggested that sticky
pageblocks would be superior to the current movable zone.

So instead of having a Movable zone, we could use the pageblocks but
make them sticky-movable, so that they only accept __GFP_MOVABLE
allocations. It would still be quite a large change, but it looks
simpler and has fewer drawbacks than trying to make the zones
overlap.
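
As a rough illustration of the sticky-movable idea, here is a userland
sketch of the acceptance check. It is not kernel code: the struct, the
flag value and the function name (sketch_pageblock,
SKETCH_GFP_MOVABLE, sketch_pageblock_accepts) are all hypothetical and
only model the rule that a sticky-movable pageblock refuses anything
that is not a movable allocation.

```c
#include <stdbool.h>

/* Illustrative stand-in for __GFP_MOVABLE; not the kernel's value. */
#define SKETCH_GFP_MOVABLE 0x08u

/* Hypothetical per-pageblock state for this sketch. */
struct sketch_pageblock {
	bool sticky_movable;	/* block only accepts movable allocations */
};

/*
 * Return true if an allocation with the given flags may be placed in
 * this pageblock. A sticky-movable block refuses every allocation that
 * is not movable; an ordinary block accepts any allocation.
 */
static bool sketch_pageblock_accepts(const struct sketch_pageblock *pb,
				     unsigned int gfp_flags)
{
	if (!pb->sticky_movable)
		return true;
	return (gfp_flags & SKETCH_GFP_MOVABLE) != 0;
}
```

The point of the sketch is that the restriction lives in the pageblock
rather than in a zone boundary, so no zone has to be shifted when such
a block is onlined.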

Currently, when you online memory as movable you're not just onlining
the memory, you're also patching down the movable zone. That
complexity would go away with sticky movable pageblocks.

Another option would be to boot as with _DEFAULT_ONLINE=n and, of
course, without the udev rule. Then, after booting with the base
memory, run one of the two echo commands below:

$ cat /sys/devices/system/memory/removable_hotplug_default
[disabled] online online_movable
$ echo online > /sys/devices/system/memory/removable_hotplug_default
$ echo online_movable > /sys/devices/system/memory/removable_hotplug_default

Then the "echo online/online_movable" would activate the in-kernel
hotplug mechanism, which is faster and more reliable than udev and
doesn't risk running into the movable zone shift "constraint". After
the "echo" the kernel would behave as if it had booted with
_DEFAULT_ONLINE=y.

If you still want to do it by hand and leave it disabled, or even try
to fix the udev movable shift constraints, sticky pageblocks and lack
of synchronicity (and deal with the resulting slower performance
compared to in-kernel onlining), you could.

The in-kernel onlining would use exactly the same code as
_DEFAULT_ONLINE=y, but it would be configured with a file like
/etc/sysctl.conf. To switch it to the _movable model you would just
need to edit that file, the same way you have to edit the udev rule
today (the one that currently breaks if you edit it to use
online_movable).
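
To make the three states of the proposed setting concrete (disabled,
online, online_movable), here is a minimal userland sketch of how the
written string could be mapped to a policy. The enum and the function
name (sketch_hotplug_policy, sketch_parse_policy) are hypothetical and
this is not the kernel's parser; it only assumes the three strings
shown in the "cat" output earlier and tolerates the trailing newline
that "echo" appends.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical policy values for the proposed tri-state setting. */
enum sketch_hotplug_policy {
	SKETCH_POLICY_DISABLED,
	SKETCH_POLICY_ONLINE,
	SKETCH_POLICY_ONLINE_MOVABLE,
	SKETCH_POLICY_INVALID,
};

/* Exact match of the first len bytes of buf against a keyword. */
static int sketch_streq(const char *buf, size_t len, const char *word)
{
	return len == strlen(word) && memcmp(buf, word, len) == 0;
}

/*
 * Map the written string to a policy value. One trailing newline is
 * ignored; anything other than the three keywords is rejected.
 */
static enum sketch_hotplug_policy sketch_parse_policy(const char *buf)
{
	size_t len = strlen(buf);

	if (len > 0 && buf[len - 1] == '\n')
		len--;

	if (sketch_streq(buf, len, "disabled"))
		return SKETCH_POLICY_DISABLED;
	if (sketch_streq(buf, len, "online"))
		return SKETCH_POLICY_ONLINE;
	if (sketch_streq(buf, len, "online_movable"))
		return SKETCH_POLICY_ONLINE_MOVABLE;
	return SKETCH_POLICY_INVALID;
}
```

Matching on the exact length keeps "online" from being accepted as a
prefix of "online_movable" or the other way around.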

From a usability perspective it would be like udev, but without all
the drawbacks of doing the onlining in userland.

Deciding whether the memory should become movable based on
acpi_has_method(handle, "_EJ0") isn't flexible enough, I think. On
bare metal especially, we can't change the ACPI like we can with the
hypervisor, yet the admin still has to be free to decide whether he
wants to risk early OOM and movable zone imbalance or prefers never
being able to hot-unplug the memory again. So it would need to become
a grub boot option, which is probably less friendly than editing
sysctl.conf or something like that (especially given the
grub-mkconfig output..).

Thanks,
Andrea