Re: [LSF/MM TOPIC] VM containers

From: Rik van Riel
Date: Mon Jan 25 2016 - 12:26:05 EST


On 01/24/2016 12:06 PM, One Thousand Gnomes wrote:
>>> That changes some of the goals the memory management subsystem has,
>>> from "use all the resources effectively" to "use as few resources as
>>> necessary, in case the host needs the memory for something else".
>
> Also "and take guidance/provide telemetry" - because you want to tune the
> VM behaviours based upon policy and to learn from them for when you re-run
> that container.
>
>> Beyond memory consumption, I would be interested in whether we can
>> harden the kernel using paravirt interfaces for memory protection in
>> VMs (if any). For example, the hypervisor could write-protect part of
>> the page tables or kernel data structures in VMs. Would that help?
>
> There are four behaviours I can think of, some of which you see in
> various hypervisors and security hardening systems:
>
> - die on write (a write here causes a security trap and termination after
> the guest has marked the page range die on write, and it cannot be
> unmarked). The guest OS at boot can for example mark all its code as
> die-on-write.
> - irrevocably read only (VM never allows page to be rewritten by guest
> after the guest marks the page range irrevocably r/o)

For these we get the question "how do we make it harder for the
guest to remap the page tables to point at read/write memory,
and modify that instead of the read-only memory?"

On "smaller" guests (less than 1TB in size), it may be enough to
ensure that the kernel PUD pointer points to the (read-only) kernel
PUD at context switch time, placing the main kernel page tables,
kernel text, and some other things in read-only memory.
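
To make the semantics of the two marking behaviours quoted above
concrete, here is a toy model in C. It is only a sketch: every name
in it (toy_guest, toy_mark_range, the MARK_* values) is made up for
illustration and does not correspond to any existing KVM or Linux
interface. It assumes marks are tracked per guest page frame and
that the first mark placed on a page wins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page marks; none of these names exist in KVM or Linux. */
enum page_mark { MARK_NONE, MARK_IRREVOCABLE_RO, MARK_DIE_ON_WRITE };

/* What the toy hypervisor does with a guest write. */
enum write_result { WRITE_OK, WRITE_DENIED, WRITE_KILLS_GUEST };

#define TOY_NPAGES 16

struct toy_guest {
	enum page_mark mark[TOY_NPAGES];	/* indexed by guest page frame */
	bool dead;				/* set once the guest is terminated */
};

/*
 * Marks are one-way: a page that already carries a mark keeps it, and
 * MARK_NONE is rejected, so there is no way to unmark.
 * Returns true if every page in the range now carries 'mark'.
 */
static bool toy_mark_range(struct toy_guest *g, size_t first, size_t count,
			   enum page_mark mark)
{
	bool all = true;

	assert(first <= TOY_NPAGES && count <= TOY_NPAGES - first);
	if (mark == MARK_NONE)
		return false;

	for (size_t i = first; i < first + count; i++) {
		if (g->mark[i] == MARK_NONE)
			g->mark[i] = mark;
		else if (g->mark[i] != mark)
			all = false;
	}
	return all;
}

/* A guest write to one page frame, as seen by the toy hypervisor. */
static enum write_result toy_guest_write(struct toy_guest *g, size_t page)
{
	assert(page < TOY_NPAGES);
	if (g->dead)
		return WRITE_KILLS_GUEST;

	switch (g->mark[page]) {
	case MARK_DIE_ON_WRITE:
		g->dead = true;		/* security trap and termination */
		return WRITE_KILLS_GUEST;
	case MARK_IRREVOCABLE_RO:
		return WRITE_DENIED;	/* the write never reaches the page */
	default:
		return WRITE_OK;
	}
}
```

Note that the model protects page frames, not mappings. Nothing in
it stops the guest from pointing its page tables at a different,
unmarked frame, which is exactly the remapping question raised
above.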

> - asynchronous faulting (pages the guest thinks are in its memory but
> are in fact on the host's swap cause a subscribable fault in the guest
> so that it can (where possible) be context switched)

KVM (and s390) already do the asynchronous page fault trick.
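
For readers who have not seen the trick, the guest side can be
sketched roughly as follows. This is a toy scheduler, not the KVM
or s390 code: the struct, the function names and the token scheme
are assumptions made for illustration. The idea it shows is that a
"page not present" notification parks the faulting task instead of
stalling the whole vCPU, and a later "page ready" notification
makes it runnable again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NTASKS 4

enum task_state { TASK_RUNNABLE, TASK_WAITING_FOR_PAGE };

/* Hypothetical guest scheduler state; not a real kernel structure. */
struct toy_sched {
	enum task_state state[TOY_NTASKS];
	size_t waiting_on[TOY_NTASKS];	/* token, valid only while waiting */
	size_t current;			/* index of the running task */
};

/*
 * Host tells the guest that the current task touched a page that is
 * on host swap. Park the task under 'token' and switch to another
 * runnable task. Returns false if nothing else is runnable, in which
 * case the guest simply has to wait for the page.
 */
static bool toy_page_not_present(struct toy_sched *s, size_t token)
{
	assert(s->current < TOY_NTASKS);
	s->state[s->current] = TASK_WAITING_FOR_PAGE;
	s->waiting_on[s->current] = token;

	for (size_t i = 1; i < TOY_NTASKS; i++) {
		size_t next = (s->current + i) % TOY_NTASKS;

		if (s->state[next] == TASK_RUNNABLE) {
			s->current = next;
			return true;
		}
	}
	return false;
}

/*
 * Host tells the guest that the page for 'token' is back in memory.
 * Returns the number of tasks made runnable again.
 */
static size_t toy_page_ready(struct toy_sched *s, size_t token)
{
	size_t woken = 0;

	for (size_t i = 0; i < TOY_NTASKS; i++) {
		if (s->state[i] == TASK_WAITING_FOR_PAGE &&
		    s->waiting_on[i] == token) {
			s->state[i] = TASK_RUNNABLE;
			woken++;
		}
	}
	return woken;
}
```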

> - free if needed - marking pages as freed up and either you get a page
> back as it was or a fault and a zeroed page

People have worked on this for KVM. I do not remember what
happened to the code.
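
The contract described in the quoted item can be written down as
a small model. Again this is only a sketch with made-up names
(toy_page, toy_host_reclaim and so on), not the code that was
proposed for KVM. It assumes that touching the page again cancels
the "free if needed" mark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical model of one guest page; not a real kernel structure. */
struct toy_page {
	unsigned char data[TOY_PAGE_SIZE];
	bool free_if_needed;	/* guest told the host it may take the page */
	bool reclaimed;		/* host actually took it */
};

/* Guest side: the page may be freed if the host needs the memory. */
static void toy_mark_free_if_needed(struct toy_page *p)
{
	assert(p != NULL);
	p->free_if_needed = true;
}

/* Host side: take the page if allowed. Returns true if it was taken. */
static bool toy_host_reclaim(struct toy_page *p)
{
	assert(p != NULL);
	if (!p->free_if_needed || p->reclaimed)
		return false;
	memset(p->data, 0, sizeof(p->data));	/* old contents are gone */
	p->reclaimed = true;
	return true;
}

/*
 * Guest touches the page again. Returns true if the page came back
 * as it was, false if the guest took a fault and now sees a zeroed
 * page. Either way the page is an ordinary page again afterwards.
 */
static bool toy_guest_touch(struct toy_page *p)
{
	bool intact;

	assert(p != NULL);
	intact = !p->reclaimed;
	p->free_if_needed = false;
	p->reclaimed = false;
	return intact;
}
```

The guest only learns which of the two outcomes it got when it
touches the page, so it can only use this for data it is able to
regenerate.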

--
All rights reversed