Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview

From: Avi Kivity
Date: Sun May 02 2010 - 12:49:31 EST


On 05/02/2010 07:06 PM, Dan Magenheimer wrote:
>>> NO! Frontswap on Xen+tmem never *never* _never_ NEVER results
>>> in host swapping. Host swapping is evil. Host swapping is
>>> the root of most of the bad reputation that memory overcommit
>>> has gotten from VMware customers. Host swapping can't be
>>> avoided with some memory overcommit technologies (such as page
>>> sharing), but frontswap on Xen+tmem CAN and DOES avoid it.
>>
>> Why is host-level swapping evil? In the KVM case, a VM is just
>> another process, and the host will swap out its pages using the
>> same LRU-like scheme as with any other process, AFAIK.
>
> The first problem is that you are simulating a fast resource
> (RAM) with a resource that is orders of magnitude slower with
> NO visibility to the user that suffers the consequences. A good
> analogy (and no analogy is perfect) is if Linux discovers a 16MHz
> 80286 on a serial card in addition to the 32 3GHz cores on a
> Nehalem box and, whenever the 32 cores are all busy, randomly
> schedules a process on the 80286, while recording all CPU usage
> data as if the 80286 is a "real" processor.... "Hmmm... why
> did my compile suddenly run 100 times slower?"

It's bad, but it's better than ooming.

The same thing happens with vcpus: if you run 10 guests on one core and they all wake up, your cpu is suddenly 10x slower and has 30000x the interrupt latency (30ms vs 1us, assuming 3ms timeslices). Your disks become slower as well.

It's worse with memory, so you try to swap as a last resort. However, swap is still faster than a crashed guest.
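
As a rough back-of-the-envelope sketch of the arithmetic above (the guest count, timeslice, and native interrupt latency are simply the figures assumed in the text, not measurements from any hypervisor):

#include <stdio.h>

int main(void)
{
	const int guests = 10;               /* vcpus competing for one core */
	const double timeslice_us = 3000.0;  /* 3 ms scheduler timeslice */
	const double native_irq_us = 1.0;    /* ~1 us interrupt latency on an idle host */

	/* Throughput: each guest gets 1/N of the core. */
	printf("throughput per guest: 1/%d of a core (%dx slowdown)\n",
	       guests, guests);

	/* Latency: a woken vcpu may wait roughly a full scheduling round. */
	double wakeup_wait_us = guests * timeslice_us;
	printf("worst-case wakeup latency: ~%.0f us (~%.0fx the native %.0f us)\n",
	       wakeup_wait_us, wakeup_wait_us / native_irq_us, native_irq_us);
	return 0;
}

which reproduces the 10x slowdown and ~30 ms / ~30000x latency figures quoted above.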


> The second problem is "double swapping": A guest may choose
> a page to swap to "guest swap", but invisibly to the guest,
> the host first must fetch it from "host swap". (This may
> seem like it is easy to avoid... it is not and happens more
> frequently than you might think.)

True. In fact, when the guest and host use the same LRU algorithm, it becomes even likelier. That's one of the things CMM2 addresses.
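
For what it's worth, the sequence behind "double swapping" can be sketched as a toy cost model (only an illustration of the I/O accounting, not kernel code): when the guest picks a victim page that the host has already swapped out, the host must first read it back from host swap just so the guest can immediately write it to guest swap, so freeing one guest page costs a disk read plus a disk write instead of a single write.

#include <stdio.h>

struct io_cost { int disk_reads; int disk_writes; };

/* Guest evicts a page that is still resident in host RAM:
 * one write to the guest's swap device. */
static struct io_cost guest_swap_only(void)
{
	return (struct io_cost){ .disk_reads = 0, .disk_writes = 1 };
}

/* Guest evicts a page the host has already swapped out:
 * the guest's attempt to write it to its swap device faults in the
 * host, the host reads the page back from host swap, and only then
 * can the guest write it out and free it. */
static struct io_cost double_swap(void)
{
	return (struct io_cost){ .disk_reads = 1, .disk_writes = 1 };
}

int main(void)
{
	struct io_cost resident = guest_swap_only();
	struct io_cost swapped  = double_swap();

	printf("host page resident:    %d read(s), %d write(s)\n",
	       resident.disk_reads, resident.disk_writes);
	printf("host page swapped out: %d read(s), %d write(s)\n",
	       swapped.disk_reads, swapped.disk_writes);
	return 0;
}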

> Third, host swapping makes live migration much more difficult.
> Either the host swap disk must be accessible to all machines
> or data sitting on a local disk MUST be migrated along with
> RAM (which is not impossible but complicates live migration
> substantially).

kvm does live migration with swapping, and has no special code to integrate them.

> Last I checked, VMware does not allow
> page-sharing and live migration to both be enabled for the
> same host.

Don't know about vmware, but kvm supports page sharing, swapping, and live migration simultaneously.
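
For context, the page sharing referred to here is KSM, which is opt-in from userspace; below is a minimal sketch of how a hypervisor's userspace could mark guest RAM as mergeable, assuming a kernel built with CONFIG_KSM and KSM scanning enabled. It is only an illustration, not the actual qemu-kvm code.

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 16 * 1024 * 1024;	/* stand-in for a chunk of guest RAM */
	void *ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (ram == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(ram, 0, len);	/* identical pages are merge candidates */

	/* Ask KSM to scan this range and merge identical pages
	 * (requires CONFIG_KSM and /sys/kernel/mm/ksm/run set to 1). */
	if (madvise(ram, len, MADV_MERGEABLE) != 0)
		perror("madvise(MADV_MERGEABLE)");
	return 0;
}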

> If you talk to VMware customers (especially web-hosting services)
> that have attempted to use overcommit technologies that require
> host-swapping, you will find that they quickly become allergic
> to memory overcommit and turn it off. The end users (users of
> the VMs that inexplicably grind to a halt) complain loudly.
> As a result, RAM has become a bottleneck in many many systems,
> which ultimately reduces the utility of servers and the value
> of virtualization.

Choosing the correct overcommit ratio is certainly not an easy task. However, just hoping that memory will be available when you need it is not a good solution.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
