Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview

From: Avi Kivity
Date: Fri Apr 30 2010 - 14:11:21 EST


On 04/30/2010 07:43 PM, Dan Magenheimer wrote:
>> Given that whenever frontswap fails you need to swap anyway, it is
>> better for the host to never fail a frontswap request and instead back
>> it with disk storage if needed. This way you avoid a pointless vmexit
>> when you're out of memory. Since it's disk backed it needs to be
>> asynchronous and batched.

>> At this point we're back with the ordinary swap API. Simply have your
>> host expose a device which is write cached by host memory, and you'll
>> have all the benefits of frontswap with none of the disadvantages, and
>> with no changes to the guest.
>
> I think you are making a number of possibly false assumptions here:
> 1) The host [the frontswap backend may not even be a hypervisor]

True. My remarks only apply to frontswap-to-hypervisor; for internally consumed frontswap the situation is different.

> 2) can back it with disk storage [not if it is a bare-metal hypervisor]

So it seems a bare-metal hypervisor has less access to the bare metal than a non-bare-metal hypervisor?

Seriously, leave the bare-metal FUD to Simon. People on this list know that kvm and Xen have exactly the same access to the hardware (well actually Xen needs to use privileged guests to access some of its hardware).

> 3) avoid a pointless vmexit [no vmexit for a non-VMX (e.g. PV) guest]

There's still an exit. It's much faster than a vmx/svm vmexit but still nontrivial.

But why are we optimizing for 5-year-old hardware?

> 4) when you're out of memory [how can this be determined outside of
> the hypervisor?]

It's determined by the hypervisor, same as with tmem. The guest swaps to a virtual disk; the hypervisor places the data in RAM if it's available, or on disk if it isn't. Write-back caching in all its glory.
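
To be concrete, the placement decision is nothing more exotic than this. A toy user-space sketch, with made-up names and a fixed-size cache standing in for "free host memory" -- not actual kvm or Xen code:

/* Toy model of a host that write-back caches a guest's swap device in RAM:
 * pages go to host memory while it is plentiful, to disk otherwise.
 * Purely illustrative -- not kvm, Xen, or frontswap code. */
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE   4096
#define CACHE_PAGES 8           /* pretend the host can spare 8 pages of RAM */

static char cache[CACHE_PAGES][PAGE_SIZE];
static long cache_key[CACHE_PAGES];     /* a real load path would look pages up by key */
static int  cache_used;

/* stand-in for an (asynchronous, in reality) disk write */
static int disk_write(long key, const void *buf)
{
        printf("page %ld -> disk\n", key);
        (void)buf;
        return 0;
}

/* Guest wrote a page to its virtual swap disk; the host decides placement. */
static int host_store_swap_page(long key, const void *guest_page)
{
        if (cache_used < CACHE_PAGES) {
                /* Enough free host RAM: keep the page in the write-back cache. */
                cache_key[cache_used] = key;
                memcpy(cache[cache_used++], guest_page, PAGE_SIZE);
                printf("page %ld -> host RAM\n", key);
                return 0;
        }
        /* Host memory is tight: spill to disk.  The guest never sees a failure
         * and never needs to know which way the page went. */
        return disk_write(key, guest_page);
}

int main(void)
{
        char page[PAGE_SIZE] = {0};

        for (long key = 0; key < 12; key++)
                host_store_swap_page(key, page);
        return 0;
}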

> And, importantly, "have your host expose a device which is write
> cached by host memory"... you are implying that all guest swapping
> should be done to a device managed/controlled by the host? That
> eliminates guest swapping to directIO/SR-IOV devices, doesn't it?

You can have multiple swap devices.

With SR-IOV, you'll see synchronous frontswap reduce throughput. SR-IOV will swap with <1 exit/page and DMA directly to and from guest pages, while frontswap/tmem will carry a 1 exit/page hit (even if no swap actually happens) plus the copy cost (if it does).

The API really, really wants to be asynchronous.
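
To make the exit arithmetic concrete, here's a toy model of the two shapes of API; the function names are invented stand-ins, not real frontswap, virtio, or kvm interfaces:

/* Toy model of why a per-page synchronous hook costs more exits than a
 * batched asynchronous submission. */
#include <stdio.h>
#include <stddef.h>

static long exits;                      /* count guest->host transitions */

/* Synchronous shape: one exit (hypercall/trap) per page, vcpu stalls each time. */
static void store_page_sync(unsigned long offset)
{
        exits++;
        (void)offset;
}

/* Asynchronous shape: queue n requests, kick the host once, handle the
 * completions later -- the same shape as the existing block/swap path. */
static void store_batch_async(const unsigned long *offsets, size_t n)
{
        (void)offsets;
        (void)n;
        exits++;                        /* one notification for the whole batch */
}

int main(void)
{
        unsigned long offsets[64];
        size_t i;

        for (i = 0; i < 64; i++)
                offsets[i] = i;

        exits = 0;
        for (i = 0; i < 64; i++)
                store_page_sync(offsets[i]);
        printf("synchronous:  %ld exits for 64 pages\n", exits);

        exits = 0;
        store_batch_async(offsets, 64);
        printf("asynchronous: %ld exit(s) for 64 pages\n", exits);
        return 0;
}

The batched shape is what the ordinary block/swap path already gives you, which is why I keep coming back to it.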

> Anyway, I think we can see now why frontswap might not be a good
> match for a hosted hypervisor (KVM), but that doesn't make it
> any less useful for a bare-metal hypervisor (or TBD for in-kernel
> compressed swap and TBD for possible future pseudo-RAM technologies).

In-kernel compressed swap does seem to be a good match for a synchronous API. For future memory devices, or even bare-metal buzzword-compliant hypervisors, I disagree. An asynchronous API is required for efficiency, and they'll all have swap capability sooner or later (kvm, vmware, and I believe xen 4 already do).
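
The reason the synchronous hook fits the compressed-swap case is that a store is just a CPU-bound compress-and-copy with nothing to wait on. A toy user-space sketch -- the run-length encoder is a made-up stand-in for LZO, and none of this is the real zram/zcache code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Trivial stand-in for a real compressor (LZO in the in-kernel case). */
static size_t toy_compress(const unsigned char *src, unsigned char *dst)
{
        size_t i = 0, out = 0;

        while (i < PAGE_SIZE) {
                size_t run = 1;

                while (i + run < PAGE_SIZE && src[i + run] == src[i] && run < 255)
                        run++;
                dst[out++] = (unsigned char)run;
                dst[out++] = src[i];
                i += run;
        }
        return out;
}

/* Synchronous "store": succeeds or fails immediately, no completion needed. */
static void *store_compressed_page(const unsigned char *page, size_t *clen)
{
        unsigned char buf[2 * PAGE_SIZE];   /* worst case for the toy encoder */
        void *copy;

        *clen = toy_compress(page, buf);
        if (*clen >= PAGE_SIZE)
                return NULL;                /* incompressible: caller swaps to disk */
        copy = malloc(*clen);
        if (copy)
                memcpy(copy, buf, *clen);
        return copy;
}

int main(void)
{
        unsigned char page[PAGE_SIZE] = {0};    /* a mostly-zero page compresses well */
        size_t clen;
        void *stored = store_compressed_page(page, &clen);

        printf("stored %s, %zu bytes\n", stored ? "in RAM" : "to disk", clen);
        free(stored);
        return 0;
}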

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
