RE: [PATCH] hv_balloon: Add the support of hibernation

From: Dexuan Cui
Date: Fri Sep 13 2019 - 16:54:40 EST


> From: David Hildenbrand <david@xxxxxxxxxx>
> Sent: Friday, September 13, 2019 12:46 AM
>
> On 12.09.19 21:18, Dexuan Cui wrote:
> > 3. Hibernation can be especially useful when we pass through a PCIe device,
> > e.g. a NIC, a NVMe controller or a GPU, to the VM, as usually save/restore
> > and live migration can not work with this kind of configuration, because
> > usually the host doesn't know how to save/restore the state of the PCIe
> > device.
>
> Interesting. Under QEMU/KVM (especially for migration), the discussed
> solutions I am aware of rather wanted to temporarily unplug the PCI
> devices or replace them with some kind of "standby" device temporarily.

For complex devices such as a modern GPU, there may not be an
equivalent "standby" software-emulated device, and unplugging the
PCI device temporarily is not good, as it may not be transparent to the
userspace applications. Hibernation is especially useful here, e.g. to Virtual
Desktop Infrastructure users whose VMs can own physical GPUs, because
all the userspace applications are frozen when the VM is hibernated, and
when the VM resumes, the applications are automatically resumed
and continue to run seamlessly, at least in theory. A hibernated VM saves
compute resources and cost for the users.

> Anyhow, would it also be an option for you instead of making the balloon
> basically useless in case the virtual ACPI S4 state is enabled to
>
> a) Remember if there was a harmful requests that was processed (memory
> add, balloon up, balloon down) - or if the device is *currently* in an
> un-hibernatable state. E.g., if somebody inflated the balloon, you can't
> hibernate. But if the balloon was deflated again, you can again hibernate.
>
> b) Block hibernation in balloon_suspend() in case the device is in such
> an un-hibernatable state.
>
>
> Then you don't need hv_is_hibernation_supported(). The VM is able to
> hibernate as long as Dynamic Memory and Memory Resizing was not used.
> This is something that can be documented perfectly well.
>
> David / dhildenb

On recent Windows Server 2019+ hosts, the host toolstack guarantees
that Dynamic Memory and Memory Resizing cannot be enabled
if the virtual ACPI S4 state is enabled, and vice versa. Please refer to the
long write-up I made here: https://lkml.org/lkml/2019/9/5/1160 .

And, to make the hibernation functionality automated, the host is able to
send a "please hibernate" message to the VM via the Hyper-V shutdown
device upon the user's request (e.g. via GUI or scripting): see
https://lkml.org/lkml/2019/9/13/811 . When the host sends the message,
it checks if the virtual ACPI S4 state is enabled for the VM: if not, the host
refuses to send the message. So a user who wants to use the hibernation
feature has to make sure the virtual ACPI S4 state is enabled for the VM,
and that in turn means Dynamic Memory and Memory Resizing cannot be
active, due to the restrictions from the host toolstack.

And the hibernation functionality won't be officially supported on old
Windows Server hosts.

So, IMHO it is not worth the effort to implement the idea you described
in detail. Sorry. :-)

And, while I agree your idea is good, technically speaking I suspect it may
not be really useful, because once hv_balloon allows balloon-up/down,
hv_balloon effectively loses control of memory pages: after the host
takes some memory away, the VM never knows when exactly the
host will give it back -- actually the host never guarantees how soon
it will give the memory back. Consequently, the VM almost immediately
ends up in an un-hibernatable state...
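
To make the concern concrete, here is a minimal user-space sketch of the
bookkeeping your a) and b) would need. It is only an illustration under my
own assumptions: struct balloon_model and the model_*() helpers are
hypothetical names, not code from hv_balloon, and the sketch only tracks
balloon-up/down, not memory hot-add.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model, NOT the real hv_balloon state. */
struct balloon_model {
	size_t pages_ballooned;	/* pages currently handed to the host */
};

/* The host took 'pages' pages away from the VM (balloon up). */
static void model_balloon_up(struct balloon_model *b, size_t pages)
{
	b->pages_ballooned += pages;
}

/* The host gave 'pages' pages back (balloon down); clamp at zero. */
static void model_balloon_down(struct balloon_model *b, size_t pages)
{
	if (pages > b->pages_ballooned)
		pages = b->pages_ballooned;
	b->pages_ballooned -= pages;
}

/*
 * Stand-in for the check proposed for balloon_suspend(): refuse to
 * hibernate while any page is still held by the host.
 */
static int model_suspend(const struct balloon_model *b)
{
	return b->pages_ballooned == 0 ? 0 : -EBUSY;
}
```

With this model, model_suspend() returns 0 only while no pages are
ballooned out. After a single model_balloon_up() it returns -EBUSY until the
host returns every page, and the guest has no control over when that happens.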

Thanks,
-- Dexuan