RE: Frontswap [PATCH 0/4] (was Transcendent Memory): overview

From: Dan Magenheimer
Date: Sat Apr 24 2010 - 20:32:07 EST


> >> I see. So why not implement this as an ordinary swap device, with a
> >> higher priority than the disk device? This way we reuse an API and
> >> keep things asynchronous, instead of introducing a special purpose
> >> API.
> >>
> > Because the swapping API doesn't adapt well to dynamic changes in
> > the size and availability of the underlying "swap" device, which
> > is very useful for swap to (bare-metal) hypervisor.
>
> Can we extend it? Adding new APIs is easy, but harder to maintain in
> the long term.

Umm... I think the difference between a "new" API and extending
an existing one here is a choice of semantics. As designed, frontswap
is an extremely simple, only-very-slightly-intrusive set of hooks that
allows swap pages to, under some conditions, go to pseudo-RAM instead
of an asynchronous disk-like device. It works today with at least
one "backend" (Xen tmem), is shipping today in real distros, and is
extremely easy to enable/disable via CONFIG or module... meaning
no impact on anyone other than those who choose to benefit from it.
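
To make the hook semantics above concrete, here is a minimal user-space sketch. It is not the frontswap patch itself: every name in it (toy_backend, toy_put_page, toy_swap_out, and so on) is hypothetical, and the fixed-size array stands in for whatever pseudo-RAM a real backend manages. It only illustrates the two properties described above: the store is synchronous, and the backend may refuse a page at any time, in which case the page takes the ordinary disk path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_SLOTS     8

/* Hypothetical pseudo-RAM backend. Its capacity may change at any time. */
struct toy_backend {
    unsigned char pages[TOY_SLOTS][TOY_PAGE_SIZE];
    bool          present[TOY_SLOTS];
    size_t        capacity;  /* pages the backend is currently willing to hold */
    size_t        used;      /* pages it holds right now */
};

/* Synchronous store. Returns false when the backend refuses the page. */
static bool toy_put_page(struct toy_backend *b, size_t slot, const void *page)
{
    assert(slot < TOY_SLOTS);
    if (!b->present[slot]) {
        if (b->used >= b->capacity)
            return false;            /* no room: caller must use the disk */
        b->present[slot] = true;
        b->used++;
    }
    memcpy(b->pages[slot], page, TOY_PAGE_SIZE);
    return true;
}

/* Synchronous load. Returns false if the backend never accepted this slot. */
static bool toy_get_page(const struct toy_backend *b, size_t slot, void *page)
{
    assert(slot < TOY_SLOTS);
    if (!b->present[slot])
        return false;
    memcpy(page, b->pages[slot], TOY_PAGE_SIZE);
    return true;
}

/* Drop a page the swap subsystem no longer needs. */
static void toy_flush_page(struct toy_backend *b, size_t slot)
{
    assert(slot < TOY_SLOTS);
    if (b->present[slot]) {
        b->present[slot] = false;
        b->used--;
    }
}

enum toy_dest { TOY_TO_BACKEND, TOY_TO_DISK };

/*
 * Swap-out path with the hook in front. With no backend registered
 * (b == NULL) the behaviour is exactly the pre-existing disk path.
 */
static enum toy_dest toy_swap_out(struct toy_backend *b, size_t slot,
                                  const void *page)
{
    if (b != NULL && toy_put_page(b, slot, page))
        return TOY_TO_BACKEND;
    return TOY_TO_DISK;  /* real code would queue asynchronous disk I/O here */
}
```

In this sketch a caller zero-initializes a struct toy_backend, sets capacity, and calls toy_swap_out() for each page. Shrinking capacity at runtime needs no notification to the caller, because every store is free to fail.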

"Extending" the existing swap API, which has largely been untouched for
many years, seems like a significantly more complex and error-prone
undertaking that will affect nearly all Linux users with a likely long
bug tail. And, by the way, there is no existence proof that it
will be useful.

Seems like a no-brainer to me.

> Ok. For non-traditional RAM uses I really think an async API is
> needed. If the API is backed by a cpu, a synchronous operation is
> fine, but once it isn't RAM, it can be all kinds of interesting
> things.

Well, we shall see. It may also be the case that the existing
asynchronous swap API will work fine for some non-traditional RAM;
and it may also be the case that frontswap works fine for some
non-traditional RAM. I agree there is fertile ground for exploration
here. But let's not allow our speculation on what may or may
not work in the future halt forward progress of something that works
today.

> Note that even if you do give the page to the guest, you still control
> how it can access it, through the page tables. So for example you can
> easily compress a guest's pages without telling it about it; whenever
> it touches them you decompress them on the fly.

Yes, at a much larger, more invasive cost to the kernel. Frontswap
and cleancache and tmem are all well-layered for a good reason.

> >> I think it will be true in an overwhelming number of cases. Flash
> >> is new enough that most devices support scatter/gather.
> >>
> > I wasn't referring to hardware capability but to the availability
> > and timing constraints of the pages that need to be swapped.
> >
>
> I have a feeling we're talking past each other here.

Could be.

> Swap has no timing
> constraints, it is asynchronous and usually to slow devices.

What I was referring to is that the existing swap code DOES NOT
always have the ability to collect N scattered pages before
initiating an I/O write suitable for a device (such as an SSD)
that is optimized for writing N pages at a time. That is what
I meant by a timing constraint. See references to page_cluster
in the swap code (and this is for contiguous pages, not scattered).
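
The timing constraint can be illustrated with a toy batcher. This is a sketch under stated assumptions, not the kernel's swap code: TOY_BATCH, toy_batcher and the must_write_now flag are all hypothetical, with the flag standing in for any situation where a page has to go out before N pages have accumulated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BATCH 8  /* hypothetical N that the device prefers per write */

struct toy_batcher {
    unsigned long pending[TOY_BATCH];  /* page numbers awaiting a write */
    size_t        count;               /* entries currently pending */
    size_t        full_writes;         /* writes that carried N pages */
    size_t        short_writes;        /* writes that carried fewer */
};

/* Issue everything pending as one write and record its size class. */
static void toy_flush(struct toy_batcher *b)
{
    if (b->count == 0)
        return;
    if (b->count == TOY_BATCH)
        b->full_writes++;
    else
        b->short_writes++;
    b->count = 0;
}

/*
 * Queue one page for swap-out. When must_write_now is set the batch is
 * flushed immediately, however few pages it holds.
 */
static void toy_queue_page(struct toy_batcher *b, unsigned long page_nr,
                           bool must_write_now)
{
    assert(b->count < TOY_BATCH);
    b->pending[b->count++] = page_nr;
    if (b->count == TOY_BATCH || must_write_now)
        toy_flush(b);
}
```

Each forced flush shows up in short_writes, which is the case a device optimized for N-page writes handles poorly.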

Dan