compressed swap performance issues

Paul R. Wilson (wilson@cs.utexas.edu)
Mon, 21 Dec 1998 03:52:06 -0600


Compressed VM has been tried before, and gave mixed results, but
technology trends make it a much better deal now, and getting
better all the time.

Fred Douglis built it for Sprite about 6 years ago, and got
speedups for some programs and slowdowns for others. He had
two problems: a slow CPU and a poor adaptation mechanism.
He was using a DECStation 5000/200, which was about an order
of magnitude slower than current machines. Every three years,
CPU speed roughly doubles relative to disk latency. (Disk
latencies improve 20% per year, while CPUs improve 60% per
year. Main memory bandwidth improves 40% a year, and it
won't be a problem for compressed VM for at least a few
years---at which point the compression algorithms will be
so fast that they'll be memory bandwidth limited. By that
time, compressed VM will be a big, big win even if it hits
a wall and doesn't get any better.)
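
Just to spell that arithmetic out, a throwaway C snippet (the 20%
and 60% rates are the ones above; the rest is only illustration):

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      double cpu_per_year  = 1.60;   /* CPUs improve ~60% per year */
      double disk_per_year = 1.20;   /* disk latencies improve ~20% per year */

      /* how much the CPU/disk gap grows over three years */
      double gap = pow(cpu_per_year / disk_per_year, 3.0);

      printf("CPU gains on disk latency by %.1fx every three years\n", gap);
      return 0;                      /* prints about 2.4x, i.e. roughly 2x */
  }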

Russinovich and Cogswell also built a compressed VM system,
for Windows 95. Their system was not adaptive at all---it
had a fixed split between RAM and disk. For the best split,
they found that it was barely breaking even for the WinBench
application workload. This was on a 486 DX2/66---a very
slow machine by today's standards.

BTW, the Apple Newton has been using compressed VM for years
and years, using a compression algorithm I designed and gave
to Walter Smith. It works very well. For a diskless system
the issues are different, because you can't page at all.
(Walter was more worried about watts than speed---he wanted
compression to avoid eating up CPU and running down the
battery. He also wanted to avoid cycling the flash RAM disk
and wearing it out.) Windows CE uses compressed caching, too.

Given recent CPU speeds, compressed caching should be a major
performance win for disk-based workstations unless prefetching
works so well that the average time to fetch a page is less than
a millisecond. Even with such effective prefetching, it's
likely to be a win in many situations, as long as you
have decent adaptation to keep the costs low when you don't
need it.
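
To see where that millisecond figure comes from, compare a
compressed-cache hit with a disk fetch. The page size, codec speed,
and disk latency below are assumed round numbers, not measurements:

  #include <stdio.h>

  int main(void)
  {
      double page_bytes = 4096.0;   /* assumed 4KB pages */
      double codec_bps  = 20e6;     /* assumed ~20MB/sec (de)compression speed */
      double disk_ms    = 10.0;     /* assumed average disk fetch time, ms */

      double hit_ms = page_bytes / codec_bps * 1000.0;   /* compressed-cache hit */

      printf("compressed-cache hit: %.2f ms, disk fetch: %.1f ms (%.0fx cheaper)\n",
             hit_ms, disk_ms, disk_ms / hit_ms);
      return 0;
  }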

Consider the usual way people actually use PC's, with GUI-based
applications. You're doing coarse-grained task switching under
user control, on a timescale that's usually at least a second and
more often minutes. The paging system is not being used for
paging in the normal OS-textbook sense, but for process swapping---the
idle tasks' pages get swapped out and the re-activated task's pages
get swapped in. In this kind of situation, your programs don't
page except when you switch from program to program. (This is
clearly common for Linux users. Many users only use a little
bit of swap, which is used to handle the transients when things
temporarily won't fit in RAM because they're switching from one
working set to another. Then you hear that tearing sound from
the disk, doing a zillion fetches and write backs in a hurry,
and the disk bogs down.)

If your idle tasks' pages get compressed, and are only uncompressed
when you re-activate those tasks seconds or minutes later, then
compressing and uncompressing those pages can't cost much at all.
At 20MB/sec to uncompress them and compress something else, you can
restore a 10MB working set in 1/4 second. As long as the task switches
are at least a few seconds apart, then you can't lose more than a few
percent of your CPU. Even if compression doesn't let you keep the
idle tasks' working sets in memory, it can double the bandwidth
of your disk so that you can pull the stuff back into memory
2x faster. (With a little work, the compression and decompression
could be pipelined with the actual disk reads and writes.)
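
In code, that back-of-the-envelope looks like this. The 10MB and
20MB/sec figures are the ones above; the 2:1 compression ratio and
the five-second switch interval are assumptions for illustration:

  #include <stdio.h>

  int main(void)
  {
      double working_set_mb = 10.0;  /* idle task's working set, from above */
      double ratio          = 2.0;   /* assumed roughly 2:1 compression */
      double codec_mb_s     = 20.0;  /* (de)compression throughput, from above */
      double switch_gap_s   = 5.0;   /* assumed seconds between task switches */

      /* compressed bytes actually pushed through the codec on re-activation */
      double restore_s = (working_set_mb / ratio) / codec_mb_s;

      printf("restore the working set: %.2f sec\n", restore_s);   /* ~0.25 sec */
      printf("CPU overhead: %.0f%% of one CPU\n",
             100.0 * restore_s / switch_gap_s);                   /* ~5% */
      return 0;
  }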

For normal paging, as opposed to coarse-grained task switching,
whether compression is a win depends on the locality characteristics
of the program. It usually *is* a win (tens of percent of paging
time), and adaptation can make sure it doesn't generally cost much
when it's not a win.
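
The adaptation doesn't have to be fancy. Here's one crude sketch of
what such a policy might look like, not code from any existing
system, and every name and constant in it is made up: every N
faults, compare the disk time the compressed cache has saved
against the CPU time it has cost, and grow or shrink the compressed
region accordingly.

  /* Crude adaptive-policy sketch; all names and constants are hypothetical. */

  struct ccache_stats {
      unsigned long hits;      /* faults satisfied from the compressed cache */
      double codec_seconds;    /* CPU time spent compressing/decompressing */
  };

  #define ASSUMED_DISK_FETCH_SEC 0.010   /* assumed average disk fetch time */

  /* Returns how many pages to add to (or, if negative, remove from) the
   * compressed region's target size. Called every N faults. */
  static long adapt_ccache(const struct ccache_stats *s, long step_pages)
  {
      double saved = s->hits * ASSUMED_DISK_FETCH_SEC;  /* disk time avoided */
      double spent = s->codec_seconds;                  /* CPU time paid */

      if (saved > 2.0 * spent)
          return step_pages;    /* clearly paying off: grow the compressed cache */
      if (saved < spent)
          return -step_pages;   /* costing more than it saves: shrink it */
      return 0;                 /* roughly break-even: leave it alone */
  }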

The tradeoffs look even better for diskless systems. Consider paging
over an ethernet. There you're seriously bandwidth-limited, even
on a fast ethernet. Even a small compressed paging cache can let
you compress the pages so that you double the bandwidth. (We can
compress and uncompress faster than even a fast ethernet can
transmit, let alone a plain ethernet.) Equally important, if you're
paging off a server, you can increase the effectiveness of the
server-side cache. (Thanks to Pavel for pointing me to nbd.)
Using a reasonable fraction of the memory on the client side as
a compressed page cache has another benefit: it reduces the
frequency with which you page to the server, and it shrinks the
pages that do have to be transmitted to the server and cached
or paged out there. More machines can
page off the same server before you saturate the network, or
overfill the server's cache, or churn its disk with pageouts.
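
With rough numbers: a plain ethernet moves about 1.25MB/sec and a
fast ethernet about 12.5MB/sec, both slower than a ~20MB/sec codec,
so compression roughly doubles the effective paging bandwidth. A
minimal sketch of that arithmetic (the 2:1 ratio is an assumption):

  #include <stdio.h>

  int main(void)
  {
      double ratio       = 2.0;            /* assumed 2:1 compression */
      double wire_mbs[]  = { 1.25, 12.5 }; /* plain and fast ethernet, MB/sec */
      const char *name[] = { "plain ethernet", "fast ethernet" };
      int i;

      for (i = 0; i < 2; i++) {
          /* an ~20MB/sec codec outruns either wire, so the wire stays the
             bottleneck and half-size pages roughly double paging bandwidth */
          double effective = wire_mbs[i] * ratio;
          printf("%s: %.2f MB/sec of pages raw, ~%.0f MB/sec with compression\n",
                 name[i], wire_mbs[i], effective);
      }
      return 0;
  }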
