Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview

From: Avi Kivity
Date: Sun Apr 25 2010 - 08:12:14 EST


On 04/25/2010 03:30 AM, Dan Magenheimer wrote:
>>>> I see. So why not implement this as an ordinary swap device, with a
>>>> higher priority than the disk device? This way we reuse an API and keep
>>>> things asynchronous, instead of introducing a special-purpose API.
>>>
>>> Because the swapping API doesn't adapt well to dynamic changes in
>>> the size and availability of the underlying "swap" device, which
>>> is very useful for swap to (bare-metal) hypervisor.
>> Can we extend it? Adding new APIs is easy, but harder to maintain in
>> the long term.
> Umm... I think the difference between a "new" API and extending
> an existing one here is a choice of semantics. As designed, frontswap
> is an extremely simple, only-very-slightly-intrusive set of hooks that
> allows swap pages to, under some conditions, go to pseudo-RAM instead
> of an asynchronous disk-like device. It works today with at least
> one "backend" (Xen tmem), is shipping today in real distros, and is
> extremely easy to enable/disable via CONFIG or module... meaning
> no impact on anyone other than those who choose to benefit from it.
>
> "Extending" the existing swap API, which has largely been untouched for
> many years, seems like a significantly more complex and error-prone
> undertaking that will affect nearly all Linux users with a likely long
> bug tail. And, by the way, there is no existence proof that it
> will be useful.
>
> Seems like a no-brainer to me.

My issue is with the API's synchronous nature. Both RAM and more exotic memories can be used with DMA instead of copying. A synchronous interface gives this up.
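
A rough userspace sketch of the distinction, for illustration only. Every name here (sync_put_page, async_put_page, async_complete_all, count_done) is made up and none of it is the frontswap hook set from the patches; memcpy stands in for whatever a DMA engine would do. The point is only that the synchronous form ties up the calling CPU for the whole copy, while the asynchronous form returns once the request is queued and reports completion through a callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES  4096
#define QUEUE_DEPTH 8

/* Completion callback type (hypothetical). */
typedef void (*put_done_fn)(void *cookie, int status);

struct put_request {
    const void *src;     /* page to store */
    void *dst;           /* slot in the backing store */
    put_done_fn done;    /* invoked once the transfer has finished */
    void *cookie;
};

struct async_store {
    struct put_request queue[QUEUE_DEPTH];
    size_t pending;
};

/* Synchronous flavour: the caller's CPU performs the copy before returning. */
static int sync_put_page(void *dst, const void *src)
{
    memcpy(dst, src, PAGE_BYTES);
    return 0;
}

/* Asynchronous flavour: only queue the request, then return immediately. */
static int async_put_page(struct async_store *s, void *dst, const void *src,
                          put_done_fn done, void *cookie)
{
    assert(done != NULL);
    if (s->pending == QUEUE_DEPTH)
        return -1;       /* queue full; the caller must try something else */
    s->queue[s->pending++] = (struct put_request){ src, dst, done, cookie };
    return 0;
}

/* Stand-in for a DMA engine draining the queue; memcpy models the transfer. */
static size_t async_complete_all(struct async_store *s)
{
    size_t n = s->pending;
    for (size_t i = 0; i < n; i++) {
        struct put_request *r = &s->queue[i];
        memcpy(r->dst, r->src, PAGE_BYTES);
        r->done(r->cookie, 0);
    }
    s->pending = 0;
    return n;
}

/* Example callback: counts successful completions in an int cookie. */
static void count_done(void *cookie, int status)
{
    if (status == 0)
        ++*(int *)cookie;
}
```

With the second form the submitter is free to do other work between async_put_page() and the callback, which is what a DMA-capable backend would need.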

>> Ok. For non-traditional RAM uses I really think an async API is
>> needed. If the API is backed by a CPU, synchronous operation is fine,
>> but once it isn't RAM, it can be all kinds of interesting things.
> Well, we shall see. It may also be the case that the existing
> asynchronous swap API will work fine for some non-traditional RAM;
> and it may also be the case that frontswap works fine for some
> non-traditional RAM. I agree there is fertile ground for exploration
> here. But let's not allow our speculation on what may or may
> not work in the future halt forward progress of something that works
> today.

Let's not allow the urge to merge prevent us from doing the right thing.


>> Note that even if you do give the page to the guest, you still control
>> how it can access it, through the page tables. So for example you can
>> easily compress a guest's pages without telling it about it; whenever
>> it touches them you decompress them on the fly.
> Yes, at a much larger, more invasive cost to the kernel. Frontswap
> and cleancache and tmem are all well-layered for a good reason.

No need to change the kernel at all; the hypervisor controls the page tables.
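
A toy userspace model of that idea, assuming nothing about any real hypervisor: struct guest_page, hv_compress, hv_fault and guest_read are invented names, the "present" flag stands in for the present bit of a hypervisor-owned page table entry, and a trivial run-length encoder stands in for a real compressor. The guest-side accessor never learns the page was compressed; touching it simply takes the fault path first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 4096

struct guest_page {
    int present;         /* models the present bit in the page table entry */
    uint8_t *data;       /* uncompressed contents, valid while present */
    uint8_t *packed;     /* compressed copy, valid while not present */
    size_t packed_len;
};

/* Toy run-length encoder: emits (count, byte) pairs with count in 1..255. */
static size_t rle_pack(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        uint8_t v = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == v && run < 255)
            run++;
        out[o++] = (uint8_t)run;
        out[o++] = v;
        i += run;
    }
    return o;
}

static void rle_unpack(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        memset(out + o, in[i + 1], in[i]);
        o += in[i];
    }
}

/* Hypervisor side: compress the page and clear "present"; guest is not told. */
static int hv_compress(struct guest_page *p)
{
    uint8_t tmp[2 * PAGE_BYTES];   /* worst case for this encoder */
    size_t len = rle_pack(p->data, PAGE_BYTES, tmp);
    uint8_t *packed = malloc(len);
    if (!packed)
        return -1;                 /* leave the page untouched on failure */
    memcpy(packed, tmp, len);
    free(p->data);
    p->data = NULL;
    p->packed = packed;
    p->packed_len = len;
    p->present = 0;
    return 0;
}

/* Fault path: decompress on first touch and mark the page present again. */
static int hv_fault(struct guest_page *p)
{
    uint8_t *data = malloc(PAGE_BYTES);
    if (!data)
        return -1;
    rle_unpack(p->packed, p->packed_len, data);
    free(p->packed);
    p->packed = NULL;
    p->packed_len = 0;
    p->data = data;
    p->present = 1;
    return 0;
}

/* Guest-visible access: a non-present page goes through the fault path. */
static int guest_read(struct guest_page *p, size_t off, uint8_t *val)
{
    assert(off < PAGE_BYTES);
    if (!p->present && hv_fault(p) != 0)
        return -1;
    *val = p->data[off];
    return 0;
}
```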

>> Swap has no timing
>> constraints; it is asynchronous and usually goes to slow devices.
> What I was referring to is that the existing swap code DOES NOT
> always have the ability to collect N scattered pages before
> initiating an I/O write suitable for a device (such as an SSD)
> that is optimized for writing N pages at a time. That is what
> I meant by a timing constraint. See references to page_cluster
> in the swap code (and this is for contiguous pages, not scattered).

I see. Given that swap-to-flash will soon be way more common than frontswap, it needs to be solved (either in flash or in the swap code).
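
As a sketch of what "collect N scattered pages, then write" could look like, here is a small userspace model. It is not taken from the swap code; struct write_batch, batch_add, batch_flush and the BATCH_PAGES value are all assumptions for illustration, and the "write" is only counted, not performed.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_PAGES 8   /* assumed device-preferred write size, in pages */

struct write_batch {
    const void *pages[BATCH_PAGES];  /* scattered source pages gathered so far */
    size_t count;
    size_t writes_issued;            /* batched I/Os sent to the device */
    size_t pages_written;
};

/* Stand-in for submitting one gather-write covering b->count pages. */
static void batch_flush(struct write_batch *b)
{
    if (b->count == 0)
        return;
    b->writes_issued++;
    b->pages_written += b->count;
    b->count = 0;
}

/*
 * Queue one page. The write is issued only once a full batch has been
 * gathered. Returns 1 if this call triggered a write, 0 otherwise.
 */
static int batch_add(struct write_batch *b, const void *page)
{
    assert(b->count < BATCH_PAGES);
    b->pages[b->count++] = page;
    if (b->count == BATCH_PAGES) {
        batch_flush(b);
        return 1;
    }
    return 0;
}
```

The catch is the partial batch: pages sit in the queue until either N have arrived or someone calls batch_flush() explicitly, so the caller has to be able to tolerate that delay.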

--
error compiling committee.c: too many arguments to function
