Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview

From: Avi Kivity
Date: Fri Apr 30 2010 - 14:38:59 EST


On 04/30/2010 06:59 PM, Dan Magenheimer wrote:

>> experiencing a load spike, you increase load even more by making the
>> guests swap. If you can just take some of their memory away, you can
>> smooth that spike out. CMM2 and frontswap do that. The guests
>> explicitly give up page contents that the hypervisor does not have to
>> first consult with the guest before discarding.
>
> Frontswap does not do this. Once a page has been frontswapped, the host
> is committed to retaining it until the guest releases it.
> Dave or others can correct me if I am wrong, but I think CMM2 also
> handles dirty pages that must be retained by the hypervisor.

But those are the guest's pages in the first place; that's not a new commitment. CMM2 provides the hypervisor alternatives to swapping a page out. Frontswap provides the guest alternatives to swapping a page out.

> The difference between CMM2 (for dirty pages) and frontswap is that
> CMM2 sets hints that can be handled asynchronously while frontswap
> provides explicit hooks that synchronously succeed/fail.

They are not directly comparable. In fact for dirty pages CMM2 is mostly a no-op - the host is forced to swap them out if it wants them. CMM2 brings value for demand zero or clean pages which can be restored by the guest without requiring swapin.
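As a rough illustration of that distinction, here is a toy model of the host's choice when it wants a guest page back. The `PageState` names and the `host_reclaim` helper are hypothetical, made up for this sketch, and are not the actual CMM2 page states or any real interface.

```python
from enum import Enum


class PageState(Enum):
    """Simplified page states for illustration only (not the real CMM2 states)."""
    ZERO = "demand-zero"   # never written; the guest can recreate it as zeroes
    CLEAN = "clean"        # the guest can re-read the contents from its own storage
    DIRTY = "dirty"        # the only copy of the data lives in this page


def host_reclaim(state):
    """Return what the host must do to take the page away from the guest."""
    if state in (PageState.ZERO, PageState.CLEAN):
        # The guest can restore the contents itself, so no host swap-in is needed.
        return "discard"
    # For dirty pages the hint buys nothing: the host still has to swap.
    return "swap-out"
```

Under these assumptions, only the demand-zero and clean cases give the host a cheaper option than swapping.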

I think for dirty pages what CMM2 brings is the ability to discard them if the host has swapped them out but the guest doesn't need them,

> In fact, Avi, CMM2 is probably a fairly good approximation of what
> the asynchronous interface you are suggesting might look like.
> In other words,

CMM2 is more directly comparable to ballooning than to frontswap. Frontswap (and cleancache) work with storage that is external to the guest, and say nothing about the guest's page itself.

> feasible but much much more complex than frontswap.

The swap API (e.g. the block layer) itself is an asynchronous batched version of frontswap. The complexity in CMM2 comes from the fact that it is communicating information about guest pages to the host, and from the fact that communication is two-way and asynchronous in both directions.


>> [frontswap is] really
>> not very different from a synchronous swap device.
>
> Not to beat a dead horse, but there is a very key difference:
> The size and availability of frontswap is entirely dynamic;
> any page-to-be-swapped can be rejected at any time even if
> a page was previously successfully swapped to the same index.
> Every other swap device is much more static so the swap code
> assumes a static device. Existing swap code can account for
> "bad blocks" on a static device, but this is far from sufficient
> to handle the dynamicity needed by frontswap.
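The behaviour described in the quoted text can be sketched roughly as follows. This is a toy model, not the frontswap patch or any kernel interface: `ToyFrontswap`, `swap_out`, and the dict used as a swap device are all hypothetical names chosen for illustration, and dropping the stale copy on a rejected store is an assumption of the sketch.

```python
class ToyFrontswap:
    """Toy stand-in for a host-backed page store whose size can change at any time."""

    def __init__(self, capacity):
        self.capacity = capacity   # the host may shrink or grow this whenever it likes
        self.pages = {}            # swap index -> page contents

    def store(self, index, data):
        """Synchronously accept or reject a page; return True on success."""
        # Assumption: an older copy at this index must not survive a new store,
        # whether or not the new store is accepted.
        self.pages.pop(index, None)
        if len(self.pages) >= self.capacity:
            return False
        self.pages[index] = data
        return True


def swap_out(index, data, frontswap, swap_device):
    """Try frontswap first; fall back to the ordinary swap device on rejection."""
    if frontswap.store(index, data):
        return "frontswap"
    swap_device[index] = data
    return "disk"
```

In this model a store to index 7 can succeed once and be rejected later, after `capacity` has been lowered, which is the case a static "bad block" map cannot describe.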

Given that whenever frontswap fails you need to swap anyway, it is better for the host to never fail a frontswap request and instead back it with disk storage if needed. This way you avoid a pointless vmexit when you're out of memory. Since it's disk backed it needs to be asynchronous and batched.

At this point we're back with the ordinary swap API. Simply have your host expose a device which is write cached by host memory, you'll have all the benefits of frontswap with none of the disadvantages, and with no changes to guest code.
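A minimal sketch of that alternative, assuming a host-side device that absorbs writes in memory and spills the oldest pages to disk when its cache is full. `WriteCachedSwapDevice` and its methods are hypothetical names for illustration; a real device would write back asynchronously and in batches, whereas this toy spills inline to stay short.

```python
class WriteCachedSwapDevice:
    """Toy host-side swap device: writes always succeed, backed by memory then disk."""

    def __init__(self, cache_pages):
        self.cache_pages = cache_pages   # how many pages the host keeps in memory
        self.cache = {}                  # index -> data, oldest write first
        self.disk = {}                   # index -> data spilled out of the cache

    def write(self, index, data):
        """Accept the page unconditionally; the guest never sees a failure."""
        self.cache.pop(index, None)      # re-insert so this index becomes the newest
        self.cache[index] = data
        while len(self.cache) > self.cache_pages:
            oldest = next(iter(self.cache))          # dicts keep insertion order
            self.disk[oldest] = self.cache.pop(oldest)

    def read(self, index):
        """Serve from host memory when possible, otherwise from disk."""
        if index in self.cache:
            return self.cache[index]
        return self.disk[index]
```

The guest sees an ordinary block device here, so the fast path and the fallback are both handled on the host side.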


--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
