Reasons to merge suspend2.

From: Nigel Cunningham
Date: Tue Apr 24 2007 - 21:33:26 EST

Next message: Yinghai Lu: "RE: [PATCH] x86_64/acpi: make kernel to be compiled whenCONFIG_ACPI_NUMA is set and power management with acpi is not enabled"
Previous message: John Anthony Kazos Jr.: "Re: Kernel traces coming back with trash/clutter"
Next in thread: Al Boldi: "Re: Reasons to merge suspend2."

Hi all.

I've been working on this email on and off for a while, but since Pavel
raised the issue again, I thought I should make a concerted effort to
finish it...

In this email, I'm going to outline the problems with the current design
(uswsusp and swsusp) and the ways in which Suspend2 overcomes those
limitations, before going on to outline the additional advantages
Suspend2 has for users and address objections previously raised against
merging Suspend2.

A) Problems with the current design.
====================================

1) Ordering of operations.

The current [u]swsusp design doesn't do things in discrete, well ordered
stages. Storage for the image is not allocated until after the atomic
copy has been done. This means that the process can fail when we are a
significant portion of the way into suspending, at a point where the
user can reasonably expect it to run to completion. The
solution to this issue is simple: separate preparing to suspend from
actually writing the image. In the preparation step, ensure, so far as
you are able, that there will be sufficient memory and sufficient
storage to complete the process, and don't write anything or do any
atomic copying until after that has been done.

The only valid objection I can think of is that you can't know for
certain prior to doing the atomic copy how much memory & storage will be
needed for allocations by driver suspend methods. That can be addressed
by a simple extension of the driver model, wherein drivers could report
how many pages they will need. (If slab will be needed, the worst case
can be assumed). Rafael's notify patches (recently posted) also help in
that area.

Once processes are frozen, all significant memory usage can be accounted
for, because the process doing the suspending will be the only one
allocating memory.
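
The ordering argued for above can be sketched as a check that runs
before anything is copied or written. This is only an illustration in
userspace Python, not Suspend2 code: `DriverEstimate` and
`prepare_to_suspend` are made-up names, and the per-driver page counts
stand in for the hypothetical driver model extension mentioned above.

```python
from dataclasses import dataclass


@dataclass
class DriverEstimate:
    """Hypothetical report from a driver: pages its suspend method needs."""
    name: str
    pages_needed: int


def prepare_to_suspend(atomic_copy_pages, image_pages,
                       free_memory_pages, free_storage_pages,
                       driver_estimates):
    """Check resources up front; copy and write nothing.

    Returns (ok, reason). All quantities are counted in pages.
    """
    # Memory must cover the atomic copy plus whatever drivers say they need.
    driver_pages = sum(d.pages_needed for d in driver_estimates)
    memory_needed = atomic_copy_pages + driver_pages
    if free_memory_pages < memory_needed:
        return False, "insufficient memory"
    # Storage must be able to hold the whole image.
    if free_storage_pages < image_pages:
        return False, "insufficient storage"
    return True, "ok"
```

Only when the check returns ok would the atomic copy and the write go
ahead, so a failure is reported before the user has waited through most
of a cycle.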

2) Limit on image size.

The current implementation limits the size of an image to an absolute
maximum of half the amount of ram. This is certainly an improvement over
the old days where it sought to free everything it could, but it's still
not good enough. Current memory freeing code doesn't free the exact
amount requested; often far more than has been requested is freed. This
does not only result in a smaller image. It also means the system is
proportionately less responsive after resume, whenever those pages are
needed again. A full image is certainly not needed by
everyone. Those with huge amounts of memory, very fast storage devices
or particular memory usage patterns may, quite rightly, not want to
store the whole lot in an image. This doesn't mean, however, that those
who want or need (from their perspective) a full image of memory
shouldn't be able to have it. It just adds to the argument for making it
tunable (which swsusp has done too).

3) Lack of provision for tuning to individual needs.

Swsusp historically included very little provision whatsoever for the
user to tune their configuration. This has recently begun to change, and
I applaud that. But it needs to go further. Suspending to disk is not a
one-size-fits-all situation. People have different hardware
configurations, with the result being that some people benefit from
compression while others do better without it. Some people want
encryption in a particular configuration while others don't care about
encryption at all. Some people want to limit the image size, others
don't. Sometimes a user might want to reboot instead of powering down
(dual booting). All of this should be doable, without having to hack the
code or recompile the kernel, and should be as simple as possible.
Suspend2, via its /sys/power/suspend2 interface and hibernate-script
porcelain, makes this easy.

4) No support for multiple swap devices / non swap storage.

Until recently, [u]swsusp supported a single swap partition only.
Support for a swap file has been added, but [u]swsusp still supports
only one swap device at a time. For most people, this is adequate, but
this doesn't mean everyone should be forced to fit this mould.

[u]swsusp also lacks support for storage to non-swap. Particularly in
systems that rely on swap for normal activity, this can make [u]swsusp
less reliable. The amount of swap available varies according to
workload, so sometimes the user will be unable to suspend. To address
this raciness/competition against other swap usage, Suspend2 supports
writing to a generic file, either a partition or a file on an ordinary
partition.

B) Further advantages of Suspend2.
==================================

1) Improvements over swsusp.
----------------------------

a) Modular design.

Parts of Suspend2 implement support for storing an image in swap or in a
file, using cryptoapi for compression and/or encryption and talking to a
userspace user interface via a netlink socket. Suspend2 works just fine
without CONFIG_SWAP, CONFIG_NET and/or CONFIG_CRYPTOAPI, however,
because it uses a modular design wherein support for these subsystems is
abstracted (not to be confused with kernel modules). If you disable swap
support, for example, one file is simply not built. The number of
#ifdefs in Suspend2 is thus minimal.

In addition, the modular design made modifications such as switching
from internal compression and encryption support to cryptoapi simple and
painless. All of the required modifications were found in compression.c,
encryption.c and Kconfig in kernel/power. The old and new
implementations could even co-exist if so desired. I recently dropped
encryption support (after deciding the existing support in block dev
drivers was more than adequate). This took five minutes tops - remove
the .c and modify the Makefile and Kconfig.

The modular design also helps with implementing the user interface. Each
module gets its own subdirectory in /sys/power/suspend2, so the top
level directory is not cluttered and it's easier to find what you're
after. Switching from /proc/suspend2 to /sys/power/suspend2 required
modifications to just two main routines (one for reading and one for
writing entries).

b) Compression support.

Swsusp has no support for compressing an image. Suspend2 has optional
cryptoapi based support for compressing the image, and includes a patch
to add an LZF based compressor to cryptoapi. When this support is used,
the speed of reading (and to a lesser extent writing) the image is
generally about doubled.

c) Optional image size limit.

Suspend2 also implements an optional, user specified soft limit on the
image size. If set to a positive value, it is interpreted as a number of
megabytes and Suspend2 attempts to free memory to keep the image size
within this limit, but won't abort the cycle if this limit isn't met. If
set to -1, Suspend2 will refuse to free any memory, and will abort if
other criteria for suspending aren't satisfied. If set to -2, it will
drop filesystem caches (equivalent to echo 1 > /proc/sys/vm/drop_caches)
prior to suspending, but will not otherwise eat memory unless necessary.
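
The three cases above can be summarised in a small decision function.
This is a sketch of the described semantics only; the function name, the
returned field names and the decision to reject values the text does not
describe (such as 0) are assumptions made for the example.

```python
def interpret_image_size_limit(limit, estimated_image_mb):
    """Map the image size limit setting to the behaviour described above.

    Returns a dict with:
      drop_caches     - drop filesystem caches before suspending
      free_policy     - "to_limit", "never" or "only_if_necessary"
      try_to_free_mb  - how much memory to try to free (soft target)
    """
    if limit > 0:
        # Soft limit in megabytes: try to get under it, but never abort
        # the cycle just because the target was missed.
        return {
            "drop_caches": False,
            "free_policy": "to_limit",
            "try_to_free_mb": max(0, estimated_image_mb - limit),
        }
    if limit == -1:
        # Refuse to free any memory at all.
        return {"drop_caches": False, "free_policy": "never",
                "try_to_free_mb": 0}
    if limit == -2:
        # Drop caches first, then free memory only if unavoidable.
        return {"drop_caches": True, "free_policy": "only_if_necessary",
                "try_to_free_mb": 0}
    # Other values are not described in the text, so the sketch rejects them.
    raise ValueError("unsupported image size limit: %r" % (limit,))
```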

d) Cryptoapi based compression.

Suspend2 uses cryptoapi for compression. Swsusp includes no built in
support for compression.

2) Improvements over uswsusp.
-----------------------------

a) Simpler to set up.

The heart of Suspend2 is implemented in the kernel so, unlike uswsusp,
there is no need for the user to download and install userspace
libraries, build a userspace app and figure out how to create and update
an initrd or initramfs. In most situations, it just works. (The
exception is LVM and suchlike, where both implementations require
userspace apps to set up access to the logical volumes (or encrypted
volumes) before they can be used for resuming).

b) No unnecessary copying of data.

uswsusp copies the image to userspace and back again. It may compress
the data in userspace. But none of this is necessary. There is a
perfectly good compression and encryption library in the form of
cryptoapi already in the kernel. Suspend2 uses this. uswsusp could too.

c) API changes far less critical.

Modifications to the API between kernel and userspace can cause big
headaches for uswsusp (see, eg, the recent issue with running a 32 bit
suspend program on a 64 bit kernel, recently raised by Johannes Berg on
the linux-pm mailing list).

In Suspend2's case, userspace programs only handle the user interface.
If an API mismatch does occur, the issue will not void the user's
ability to suspend or resume.

3) Completely New Functionality/Improvements.
---------------------------------------------

a) Filewriter.

Using swap to store the image is inherently racy. To be able to suspend,
we need enough free memory and enough free storage. But getting enough
free memory might involve swapping out some memory, which reduces the
amount of available storage, which might require more free memory.

It is true that most of the time this race isn't an issue. Nevertheless,
that's the nature of races.

Suspend2 implements support for files as a means of avoiding this issue.
Thus, it is much more reliable in low memory situations than swsusp or
uswsusp.

b) Multiple swap devices.

Suspend2 supports writing an image to multiple swap devices, whereas
uswsusp and swsusp only write to one device.

c) Full image of memory.

Suspend2 implements support for writing a full image of memory. You thus
get a more responsive system post-resume; just as responsive as if you'd
never suspended. This support can be disabled via a sysfs entry
(no_pageset2).

d) Keep image mode.

Suspend2 supports keeping the image after resuming. This is used in
kiosk systems where nothing is written to the filesystem or changes are
written to a separate filesystem that is mounted after resume and
unmounted before suspending or powering off.

e) Ability to cancel a cycle.

Suspend2 allows the user to cancel a cycle (and this ability can be
disabled). This means you don't have to wait for the system to finish
suspending, then resume it to get your system back. If done prior to the
atomic copy, you have it back instantly. If afterwards, a small portion
of the image is read first.

f) Scripting support.

Suspend2 allows scripts to check whether an image exists
(cat /sys/power/suspend2/have_image), remove one (echo 0 > have_image),
and set the location of the image header (echo /dev/hda1 > resume2). One
user utilises this support to provide an initrd/ramfs based menu of
previously suspended live-cd images. This could also be used in a lab
environment with homogeneous computer specifications to allow resuming
to a login screen, then resuming the image of a user's previous session
once they have logged in.
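
A script along these lines might wrap the three operations as follows.
The entry names (have_image, resume2) come from the commands above;
everything else is an assumption for illustration. In particular, the
text does not say what have_image prints, so the sketch assumes the
first line is "1" when an image exists, and the `base` parameter exists
only so the helpers can be tried against an ordinary directory.

```python
from pathlib import Path

# Location used in the commands above.
SUSPEND2_DIR = Path("/sys/power/suspend2")


def have_image(base=SUSPEND2_DIR):
    """Return True if an image is reported to exist.

    ASSUMPTION: the first line of have_image is "1" when an image exists.
    """
    lines = (Path(base) / "have_image").read_text().splitlines()
    return bool(lines) and lines[0].strip() == "1"


def remove_image(base=SUSPEND2_DIR):
    """Equivalent of: echo 0 > have_image"""
    (Path(base) / "have_image").write_text("0\n")


def set_image_header_location(device, base=SUSPEND2_DIR):
    """Equivalent of: echo /dev/hda1 > resume2"""
    (Path(base) / "resume2").write_text(device + "\n")
```

An initrd menu like the one described could call have_image() for each
candidate location after pointing resume2 at it.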

g) Userspace user interface.

Suspend2 provides userspace based user interface programs that
communicate with the core code via a netlink socket. This allows the
user to have all the eyecandy they want (although it might slow
suspending!), without the code needing to run in kernelspace or
compromise the integrity of the image.

h) Early messages.

Suspend2 provides user-friendly handling of error conditions early in
the boot process. Sanity checks on the image are done before loading it,
and if it looks like the user has (for example) accidentally booted the
wrong kernel, Suspend2 will warn them and allow them to reboot into the
right kernel, or invalidate the image and carry on booting. This has a
25 second timeout and sensible default, so the kernel will not hang
forever.

i) Powerdown methods.

Suspend2 supports a greater variety of methods of powering down once the
image has been written. It can enter ACPI states S3, S4 or S5, use a
non-ACPI power off or resume an alternate image.

S3 was recently picked up by uswsusp, but isn't supported by swsusp. It
allows the user to suspend to ram instead of powering down after writing
the image. If the battery runs out, we resume as if they'd fully powered
off. If it doesn't, we act like the cycle was cancelled at the last
moment, reloading a small portion of the image (pages that were
overwritten by the atomic copy) before giving control back to the user.

The support for resuming an alternate image is primarily useful for a
lab/multi-distro environment. It has the same limitations regarding
mounted filesystems that normally apply, but otherwise provides a way to
switch between images quickly and easily. (One image could be a log-in
screen/image selection menu, and the others individual users' or
distros' sessions).

j) Transparent swsusp replacement.

Suspend2 also implements optional replacement of swsusp. When enabled,
echo disk > /sys/power/state will activate Suspend2, resume= will
override resume2= and noresume will also function as noresume2. Finally,
activating a swsusp resume will also cause Suspend2 to check whether to
resume (we don't know until we check whether the replacing of swsusp was
enabled when we suspended or not). A compile time option allows the user
to enable or disable this functionality by default.
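
The precedence rules above can be illustrated with a small command line
parser. This is a userspace sketch of the described behaviour, not the
kernel's parameter handling; the function name and the `replace_swsusp`
flag are invented for the example.

```python
def effective_resume_settings(cmdline, replace_swsusp):
    """Return (resume_target, resume_disabled) for a kernel command line.

    replace_swsusp: whether transparent swsusp replacement is enabled.
    """
    params = {}
    flags = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep:
            params[key] = value
        else:
            flags.add(key)

    target = params.get("resume2")
    disabled = "noresume2" in flags
    if replace_swsusp:
        # resume= overrides resume2=, and noresume acts as noresume2.
        if "resume" in params:
            target = params["resume"]
        disabled = disabled or "noresume" in flags
    return target, disabled
```

For example, with replacement enabled, a command line containing both
resume2=swap:/dev/hda1 and resume=/dev/hda3 yields /dev/hda3 as the
target; with it disabled, resume= is ignored and swap:/dev/hda1 is used.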

k) Expected compression ratio.

Suspend2 allows the user to set an expected compression ratio. This
allows the user to store a larger image than might otherwise be
possible, particularly in situations where available storage is less
than the amount of memory in use. Let's imagine, for example, that the
user has 1GB of RAM and a 600MB swap partition or file. Without an
expected compression ratio, Suspend2 would always store at most 600MB in
the image. With an expected compression ratio of 50% (common for LZF),
Suspend2 will not free memory even if there's the full gigabyte of
memory in use, because it will assume that the compressed image will fit
in 500MB.
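
The arithmetic in this example can be written out as below. The function
names are made up for the illustration, and the ratio is taken to mean
"compressed size as a percentage of the original", which is how the
example above uses it (1GB at 50% is assumed to fit in 500MB).

```python
def max_image_mb(storage_mb, expected_compression_percent=100):
    """Largest uncompressed image assumed to fit in storage_mb.

    100 means no compression is expected.
    """
    if not 0 < expected_compression_percent <= 100:
        raise ValueError("expected_compression_percent must be in 1..100")
    return storage_mb * 100 // expected_compression_percent


def mb_to_free(memory_in_use_mb, storage_mb, expected_compression_percent=100):
    """Memory that would have to be freed for the image to fit."""
    limit = max_image_mb(storage_mb, expected_compression_percent)
    return max(0, memory_in_use_mb - limit)
```

With 1000MB in use and 600MB of storage, mb_to_free(1000, 600) is 400,
while mb_to_free(1000, 600, 50) is 0, because 600MB of storage is
assumed to hold up to 1200MB of uncompressed data.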

l) Simpler swap file support.

Suspend2 makes using a swap file much simpler. The user simply needs to
swapon the file, then cat /sys/power/suspend2/swap/headerlocations:

# cat /sys/power/suspend2/swap/headerlocations
For swap partitions, simply use the format: resume2=swap:/dev/hda1.
For swapfile `/blot/swapfile`, use resume2=swap:/dev/hda6:0xf4000.
#
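
The two resume2= forms shown in that output can be split into their
parts as sketched below. The field names (storage type, device, offset)
are labels chosen for the example, and the text does not state what unit
the hexadecimal value is in, so it is simply parsed as a number.

```python
def parse_resume2(value):
    """Split a resume2= value into (storage_type, device, offset).

    Handles the two forms shown above:
      swap:/dev/hda1            -> ("swap", "/dev/hda1", None)
      swap:/dev/hda6:0xf4000    -> ("swap", "/dev/hda6", 0xf4000)
    """
    parts = value.split(":")
    if len(parts) == 2:
        storage_type, device = parts
        offset = None
    elif len(parts) == 3:
        storage_type, device, offset_text = parts
        # The location is given in hexadecimal, e.g. 0xf4000.
        offset = int(offset_text, 16)
    else:
        raise ValueError("unrecognised resume2 value: %r" % (value,))
    return storage_type, device, offset
```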

m) Multithreaded i/o.

With the recent move to doing cpu hotplugging just prior to the atomic
copy, rather than right at the start of the cycle, the possibility has
been opened up of using multiple cores to do the image de/compression.
Suspend2 now includes this. The performance improvement has been
particularly seen during compression, where the speed on a dual core P4
came up to the same as seen in reading the image (ie approximately
double that achieved without compression). This support is disabled by
default at the moment, until upstream work on the interactions between
cpu hotplugging and freezing is completed.
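
The general idea of spreading page compression over several workers can
be shown in userspace. This is not how Suspend2 does it: zlib stands in
for LZF (which the Python standard library lacks), a thread pool stands
in for per-CPU kernel threads, and the names and the 4096 byte page size
are assumptions of the sketch.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096  # assumed page size in bytes


def split_pages(data, page_size=PAGE_SIZE):
    """Cut a buffer into page-sized chunks (the last may be shorter)."""
    return [data[i:i + page_size] for i in range(0, len(data), page_size)]


def compress_pages(pages, workers=2):
    """Compress each page independently, using several workers.

    pool.map returns results in input order, so the image can be
    written out sequentially afterwards.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, pages))


def decompress_pages(compressed, workers=2):
    """Reverse of compress_pages, also spread over several workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.decompress, compressed))
```

Because every page is compressed on its own, no worker depends on
another's output, which is what makes it possible to use more than one
core for the work.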

4) Support.
-----------

Suspend2 has very active support in mailing lists, a web site, bugzilla
and wiki. Nigel is not going to refuse to deal with people because their
kernel is tainted or isn't the latest release.

C) Objections to merging Suspend2.
==================================

1) Size of the patch.

These objections seem to have been dealt with in this morning's
discussions already. The only thing I would add is that the Suspend2
patch size is somewhat inflated by documentation. The 16000 lines quoted
includes 1100 lines of Changelog and another 1100 of documents
describing how it works and how to use it.

2) "It should be done in parts"

Since we have a modular design, some parts, such as compression and
support for writing to ordinary files, can clearly be handled separately.

A comparison of the core code with that in swsusp would, however, show
that Suspend2 is far more than just a bolting on of additional features
to swsusp. Substantial changes in the basic method of operation have
been made (see esp. A1 above), which would make merging it piecemeal a
far larger and more complicated task than it needs to be.

While swsusp could, therefore, be mutated into suspend2 over time, I
believe it is far more straightforward and simple to just merge
suspend2, let the two coexist for a while and then drop swsusp when
people are satisfied that suspend2 is an adequate replacement.

A tangential (but important) issue is that I simply don't have the time
to do the incremental modifications to swsusp.

3) It's not needed.

It is true that swsusp is perfectly adequate for some people. This
doesn't, however, mean that it meets the needs of all people.

To put it bluntly, if Suspend2 wasn't needed, I wouldn't be working on
it. I have more than enough in the way of other things that I'd rather
be doing, but as a user, I want more than swsusp or uswsusp deliver, so
I continue to work on Suspend2.

4) [u]swsusp will/could implement it in the future.

At the last review, Pavel replied to many of the points about Suspend2
features that swsusp lacks by saying 'uswsusp can do this'. But the
facts are that uswsusp is very slow to get these new features - the
previous revision of this paragraph had (and I believe it was accurate)
"has no new features over swsusp at the moment". Furthermore, it would
probably not be unreasonable to argue that if Suspend2 didn't have these
features, uswsusp would never have gotten them.

Hope this helps,

Nigel
