Re: [PATCH v4] mm: Enable suspend-only swap spaces

From: David Hildenbrand
Date: Tue Jul 27 2021 - 08:21:36 EST


On 27.07.21 11:48, David Hildenbrand wrote:
On 27.07.21 02:12, Evan Green wrote:
Add a new SWAP_FLAG_HIBERNATE_ONLY that adds a swap region but refuses
to allow generic swapping to it. This region can still be wired up for
use in suspend-to-disk activities, but will never have regular pages
swapped to it. This flag will be passed in by utilities like swapon(8),
usage would probably look something like: swapon -o hibernate /dev/sda2.

Currently it's not possible to enable hibernation without also enabling
generic swap for a given area. One semi-workaround for this is to delay
the call to swapon() until just before attempting to hibernate, and then
call swapoff() just after hibernate completes. This is somewhat kludgy,
and also doesn't really work to keep swap out of the hibernate region.
When hibernate begins, it starts by allocating a large chunk of memory
for itself. This often ends up forcing a lot of data out into swap. By
this time the hibernate region is eligible for generic swap, so swap
ends up leaking into the hibernate region even with the workaround.

There are a few reasons why usermode might want to be able to
exclusively steer swap and hibernate. One reason relates to SSD wearing.
Hibernate's endurance and speed requirements are different from swap.
It may for instance be advantageous to keep hibernate in primary
storage, but put swap in an SLC namespace. These namespaces are faster
and have better endurance, but cost 3-4x in terms of capacity.
Exclusively steering hibernate and swap enables system designers to
accurately partition their storage without either wearing out their
primary storage, or overprovisioning their fast swap area.

Another reason to allow exclusive steering has to do with security.
The requirements for designing systems with resilience against
offline attacks are different between swap and hibernate. Swap
effectively requires a dictionary of hashes, as pages can be added and
removed arbitrarily, whereas hibernate only needs a single hash for the
entire image. If you've set up block-level integrity for swap and
image-level integrity for hibernate, then allowing swap blocks to
possibly leak out to the hibernate region is problematic, since it
creates swap pages not protected by any integrity.

Swap regions with SWAP_FLAG_HIBERNATE_ONLY set will not appear in
/proc/meminfo under SwapTotal and SwapFree, since they are not usable as
general swap. These regions do still appear in /proc/swaps.

Right, and they also don't account towards the memory overcommit
calculations.

Thanks for extending the patch description!

[...]

+ if (swap_flags & SWAP_FLAG_HIBERNATE_ONLY) {
+ if (IS_ENABLED(CONFIG_HIBERNATION)) {
+ if (swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS)
+ return -EINVAL;
+
+ } else {
+ return -EINVAL;
+ }
+ }

We could do short

if ((swap_flags & SWAP_FLAG_HIBERNATE_ONLY) &&
(!IS_ENABLED(CONFIG_HIBERNATION) ||
(swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS)))
return -EINVAL;

or

if (swap_flags & SWAP_FLAG_HIBERNATE_ONLY))
if (!IS_ENABLED(CONFIG_HIBERNATION) ||
(swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS))
return -EINVAL;

+
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -3335,16 +3366,20 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
if (swap_flags & SWAP_FLAG_PREFER)
prio =
(swap_flags & SWAP_FLAG_PRIO_MASK) >> SWAP_FLAG_PRIO_SHIFT;
+
+ if (swap_flags & SWAP_FLAG_HIBERNATE_ONLY)
+ p->flags |= SWP_HIBERNATE_ONLY;
enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map);
- pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s\n",
+ pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s%s\n",
p->pages<<(PAGE_SHIFT-10), name->name, p->prio,
nr_extents, (unsigned long long)span<<(PAGE_SHIFT-10),
(p->flags & SWP_SOLIDSTATE) ? "SS" : "",
(p->flags & SWP_DISCARDABLE) ? "D" : "",
(p->flags & SWP_AREA_DISCARD) ? "s" : "",
(p->flags & SWP_PAGE_DISCARD) ? "c" : "",
- (frontswap_map) ? "FS" : "");
+ (frontswap_map) ? "FS" : "",
+ (p->flags & SWP_HIBERNATE_ONLY) ? "H" : "");
mutex_unlock(&swapon_mutex);
atomic_inc(&proc_poll_event);


Looks like the cleanest alternative to me, as long as we don't want to
invent new interfaces.

Acked-by: David Hildenbrand <david@xxxxxxxxxx>


Pavel just mentioned uswsusp, and I wonder if it would be a possible alternative to this patch.

--
Thanks,

David / dhildenb