> > > I see. So why not implement this as an ordinary swap device, with a
> > > higher priority than the disk device? this way we reuse an API and
> > > keep things asynchronous, instead of introducing a special purpose API.
> > Because the swapping API doesn't adapt well to dynamic changes in
> > the size and availability of the underlying "swap" device, which
> > is very useful for swap to (bare-metal) hypervisor.
> Can we extend it? Adding new APIs is easy, but harder to maintain in
> the long term.

Umm... I think the difference between a "new" API and extending
an existing one here is a choice of semantics. As designed, frontswap
is an extremely simple, only-very-slightly-intrusive set of hooks that
allows swap pages to, under some conditions, go to pseudo-RAM instead
of an asynchronous disk-like device. It works today with at least
one "backend" (Xen tmem), is shipping today in real distros, and is
extremely easy to enable/disable via CONFIG or module... meaning
no impact on anyone other than those who choose to benefit from it.
"Extending" the existing swap API, which has largely been untouched for
many years, seems like a significantly more complex and error-prone
undertaking that will affect nearly all Linux users with a likely long
bug tail. And, by the way, there is no existence proof that it
will be useful.
Seems like a no-brainer to me.
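
For anyone following along, the hook semantics described above can be sketched as a toy userspace model in C. This is not the frontswap code: the names (toy_backend, backend_put, backend_get, swap_out) and the sizes are hypothetical and chosen only for illustration. It shows a synchronous "put" that the backend is free to decline, for example because its capacity changed, in which case the page simply falls through to the normal disk path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_NSLOTS    8     /* swap slots covered by this toy model */

/* Hypothetical pseudo-RAM backend whose capacity may change at any time. */
struct toy_backend {
    size_t capacity;                 /* pages it is currently willing to hold */
    size_t used;                     /* pages it holds right now */
    bool present[TOY_NSLOTS];
    unsigned char data[TOY_NSLOTS][TOY_PAGE_SIZE];
};

/* Synchronous store: returns 0 on success, -1 if the backend declines. */
static int backend_put(struct toy_backend *b, size_t slot,
                       const unsigned char *page)
{
    if (slot >= TOY_NSLOTS)
        return -1;
    if (!b->present[slot]) {
        if (b->used >= b->capacity)
            return -1;               /* no room: caller must use the disk */
        b->used++;
        b->present[slot] = true;
    }
    memcpy(b->data[slot], page, TOY_PAGE_SIZE);
    return 0;
}

/* Synchronous load: returns 0 and fills page only if the slot is held. */
static int backend_get(const struct toy_backend *b, size_t slot,
                       unsigned char *page)
{
    if (slot >= TOY_NSLOTS || !b->present[slot])
        return -1;
    memcpy(page, b->data[slot], TOY_PAGE_SIZE);
    return 0;
}

enum swap_dest { DEST_BACKEND, DEST_DISK };

/* Hook in the swap-out path: try the backend first, else fall through. */
static enum swap_dest swap_out(struct toy_backend *b, size_t slot,
                               const unsigned char *page)
{
    if (backend_put(b, slot, page) == 0)
        return DEST_BACKEND;
    /* A real kernel would queue asynchronous block I/O at this point. */
    return DEST_DISK;
}
```

The point of the model is that the caller never needs to know the backend's size in advance: every put is answered immediately with yes or no, and the existing disk path remains the fallback.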

> Ok. For non traditional RAM uses I really think an async API is
> needed. If the API is backed by a cpu synchronous operation is fine,
> but once it isn't RAM, it can be all kinds of interesting things.

Well, we shall see. It may also be the case that the existing
asynchronous swap API will work fine for some non traditional RAM;
and it may also be the case that frontswap works fine for some
non traditional RAM. I agree there is fertile ground for exploration
here. But let's not allow our speculation on what may or may
not work in the future halt forward progress of something that works
today.

> Note that even if you do give the page to the guest, you still control
> how it can access it, through the page tables. So for example you can
> easily compress a guest's pages without telling it about it; whenever
> it touches them you decompress them on the fly.

Yes, at a much larger, more invasive cost to the kernel. Frontswap
and cleancache and tmem are all well-layered for a good reason.

> Swap has no timing
> constraints, it is asynchronous and usually to slow devices.

What I was referring to is that the existing swap code DOES NOT
always have the ability to collect N scattered pages before
initiating an I/O write suitable for a device (such as an SSD)
that is optimized for writing N pages at a time. That is what
I meant by a timing constraint. See references to page_cluster
in the swap code (and this is for contiguous pages, not scattered).
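
To make the "collect N scattered pages" point concrete, here is a minimal sketch in C of what such a batching layer might look like. It is purely illustrative and is not how the swap code works today: write_batch, batch_add and BATCH_PAGES are hypothetical names, and the batch size of 8 is an assumed device-optimal write size.

```c
#include <stddef.h>

#define BATCH_PAGES 8   /* assumed device-optimal write size, in pages */

struct write_batch {
    unsigned long slots[BATCH_PAGES];  /* scattered swap slots awaiting write */
    size_t count;                      /* pages queued so far */
    size_t flushes;                    /* number of N-page writes issued */
};

/* Queue one page; returns 1 if this call triggered a full N-page write. */
static int batch_add(struct write_batch *wb, unsigned long slot)
{
    wb->slots[wb->count++] = slot;
    if (wb->count < BATCH_PAGES)
        return 0;                      /* page must wait for the batch to fill */
    /* A real implementation would submit one I/O covering all slots here. */
    wb->flushes++;
    wb->count = 0;
    return 1;
}
```

Note that a queued page cannot be written until BATCH_PAGES - 1 other pages arrive, which is the kind of timing constraint meant above.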