Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview

From: Avi Kivity
Date: Mon Apr 26 2010 - 02:02:26 EST

Next message: Avi Kivity: "Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview"
Previous message: Justin P. Mattock: "[PATCH]trivial:Fix some comments."
In reply to: Dan Magenheimer: "RE: Frontswap [PATCH 0/4] (was Transcendent Memory): overview"
Next in thread: Dan Magenheimer: "RE: Frontswap [PATCH 0/4] (was Transcendent Memory): overview"

On 04/25/2010 06:29 PM, Dan Magenheimer wrote:

While I admit that I started this whole discussion by implying
that frontswap (and cleancache) might be useful for SSDs, I think
we are going far astray here. Frontswap is synchronous for a
reason: It uses real RAM, but RAM that is not directly addressable
by a (guest) kernel. SSDs (at least today) are still I/O devices;
even though they may be very fast, they still live on a PCI (or
slower) bus and use DMA. Frontswap is not intended for use with
I/O devices.
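A minimal sketch of what "synchronous" means here, assuming a toy in-process pool stands in for RAM the guest cannot address directly. The names hv_put_page and hv_get_page, and the pool and page sizes, are hypothetical and are not the API from the frontswap patches. A put either finishes the copy before it returns or fails immediately, so the caller can fall back to the ordinary swap device. A get never needs a completion callback.

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative sizes only: a 4 KiB page and a tiny two-page pool. */
#define TOY_PAGE_SIZE 4096
#define TOY_POOL_PAGES 2

/* Models memory the guest cannot address directly; only these calls copy in and out. */
static unsigned char pool_data[TOY_POOL_PAGES][TOY_PAGE_SIZE];
static long pool_key[TOY_POOL_PAGES];
static bool pool_used[TOY_POOL_PAGES];

/* Returns the slot holding 'key', or -1 if it is not stored. */
static int find_slot(long key)
{
    for (int i = 0; i < TOY_POOL_PAGES; i++)
        if (pool_used[i] && pool_key[i] == key)
            return i;
    return -1;
}

/* Synchronous put (hypothetical name): when this returns true the copy is
 * already complete. It returns false at once if the pool is full, so the
 * caller writes the page to the normal swap device instead. */
bool hv_put_page(long key, const unsigned char *page)
{
    int slot = find_slot(key);              /* overwrite an existing copy */
    for (int i = 0; slot < 0 && i < TOY_POOL_PAGES; i++)
        if (!pool_used[i])
            slot = i;                       /* otherwise take the first free slot */
    if (slot < 0)
        return false;
    memcpy(pool_data[slot], page, TOY_PAGE_SIZE);
    pool_key[slot] = key;
    pool_used[slot] = true;
    return true;
}

/* Synchronous get (hypothetical name): on success the data is already in
 * 'page'; no completion callback or wait is involved. */
bool hv_get_page(long key, unsigned char *page)
{
    int slot = find_slot(key);
    if (slot < 0)
        return false;
    memcpy(page, pool_data[slot], TOY_PAGE_SIZE);
    return true;
}
```

With this toy pool, the first two puts succeed and a third distinct key is rejected, which is the point at which a real caller would issue normal swap I/O.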

Today's memory technologies are either RAM that can be addressed
by the kernel, or I/O devices that sit on an I/O bus. The
exotic memories that I am referring to may be a hybrid:
memory that is fast enough to sit on a QPI/HyperTransport link,
but slow enough that you wouldn't want to hand userland apps a
random mix of pages from "exotic RAM" and pages from "normal RAM".
Such memory makes no sense today because OSes wouldn't know what
to do with it. But it MAY make sense with frontswap (and
cleancache).

Nevertheless, frontswap works great today with a bare-metal
hypervisor. I think it stands on its own merits, regardless
of one's vision of future SSD/memory technologies.

Even when frontswapping to RAM on a bare-metal hypervisor, it makes
sense to use an async API, in case you have a DMA engine on board.
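For contrast with a synchronous put/get, here is a minimal sketch of the asynchronous shape being suggested: submission returns immediately and a callback fires when the copy completes. It assumes a software stand-in for the DMA engine; async_copy_submit, async_copy_poll and count_done are hypothetical names, not the kernel's dmaengine API, and async_copy_poll plays the role of the engine's completion interrupt by doing the copies with memcpy.

```c
#include <stddef.h>
#include <string.h>

#define TOY_QUEUE_LEN 8

/* Callback invoked when a (simulated) DMA copy finishes; status 0 = success. */
typedef void (*copy_done_fn)(void *ctx, int status);

struct copy_req {
    void *dst;
    const void *src;
    size_t len;
    copy_done_fn done;
    void *ctx;
};

/* Pending requests; a real engine would own a hardware descriptor ring. */
static struct copy_req pending[TOY_QUEUE_LEN];
static size_t npending;

/* Asynchronous submit: queues the request and returns without copying.
 * Returns 0 on success, -1 if the queue is full (caller may copy itself). */
int async_copy_submit(void *dst, const void *src, size_t len,
                      copy_done_fn done, void *ctx)
{
    if (npending == TOY_QUEUE_LEN)
        return -1;
    pending[npending++] = (struct copy_req){ dst, src, len, done, ctx };
    return 0;
}

/* Simulated completion path: performs every queued copy, then runs its
 * callback. In this sketch callbacks must not submit new requests.
 * Returns the number of requests completed. */
size_t async_copy_poll(void)
{
    size_t n = npending;
    for (size_t i = 0; i < n; i++) {
        memcpy(pending[i].dst, pending[i].src, pending[i].len);
        pending[i].done(pending[i].ctx, 0);
    }
    npending = 0;
    return n;
}

/* Example callback: counts successful completions in the int passed as ctx. */
void count_done(void *ctx, int status)
{
    if (status == 0)
        ++*(int *)ctx;
}
```

The caller's data is only valid after the callback runs, which is exactly the extra bookkeeping a synchronous interface avoids.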

When pages are 2MB, this may be true. When pages are 4KB and
copied individually, it may take longer to program a DMA engine
than to just copy 4KB.

Of course, you have to use a batching API, like virtio or Xen's rings, to avoid the overhead.
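The 4KB argument and the batching reply can be put side by side in a back-of-envelope model: a fixed setup cost paid once per submission and shared by every page in the batch, plus a per-page cost. This is an illustrative sketch only; the function names are hypothetical and all three costs are caller-supplied inputs, not measurements of any real DMA engine.

```c
/* All costs are caller-supplied inputs in nanoseconds; no real hardware
 * figures are assumed.
 *   setup_ns: fixed cost to program the engine and handle completion,
 *             paid once per submitted batch.
 *   dma_ns:   per-page cost on the DMA path.
 *   cpu_ns:   per-page cost for the CPU to memcpy the page itself. */

/* Average cost per page when 'batch' pages (batch >= 1) share one setup. */
double dma_cost_per_page(double setup_ns, double dma_ns, unsigned batch)
{
    return setup_ns / batch + dma_ns;
}

/* Smallest batch size B with setup_ns / B + dma_ns < cpu_ns,
 * i.e. B > setup_ns / (cpu_ns - dma_ns); returns 0 if DMA never wins. */
unsigned min_batch_for_dma(double setup_ns, double dma_ns, double cpu_ns)
{
    if (dma_ns >= cpu_ns)
        return 0;   /* the per-page cost alone already loses to memcpy */
    /* Truncation toward zero is floor() here because both operands are >= 0. */
    return (unsigned)(setup_ns / (cpu_ns - dma_ns)) + 1u;
}
```

With a setup cost S per batch, per-page DMA cost d and per-page copy cost c (c > d), offloading wins once the batch size B satisfies S/B + d < c, i.e. B > S/(c - d). A batch of one page wins only if S + d < c, which is the case the 4KB argument questions.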

But in any case, frontswap works fine on all existing machines
today. If/when most commodity CPUs have an asynchronous RAM DMA
engine, an asynchronous API may be appropriate. Or the existing
swap API might be appropriate. Or the synchronous frontswap API
may work fine too. Speculating further about non-existent
hardware that might exist in the (possibly far) future is irrelevant
to the proposed patch, which works today on all existing x86 hardware
and on shipping software.

DMA engines are present on commodity hardware now:

http://en.wikipedia.org/wiki/I/O_Acceleration_Technology

I don't know if consumer machines have them, but servers certainly do. modprobe ioatdma.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

