Re: [PATCH v2] PM: hibernate: keep existing uswsusp swap pin if re-selection fails
From: Rafael J. Wysocki
Date: Tue May 26 2026 - 13:52:20 EST
On Tue, Apr 14, 2026 at 6:49 PM DaeMyung Kang <charsyam@xxxxxxxxx> wrote:
>
> Commit 5b2b0c6e4577 ("mm/swap, PM: hibernate: fix swapoff race in
> uswsusp by pinning swap device") introduced SWP_HIBERNATION so that
> the swap area selected through /dev/snapshot remains protected against
> swapoff() for the lifetime of the uswsusp session.
>
> When user space issues SNAPSHOT_SET_SWAP_AREA again,
> snapshot_set_swap_area() currently drops the old pin before attempting
> to pin the new swap area. If the new selection fails, the ioctl
> returns an error and user space is expected to abort the session.
> However, preserving the existing pin in that case makes the kernel
> side more robust against a failed re-selection, while keeping the
> existing userspace-visible behavior unchanged.
>
> Implement this with the existing swap helpers:
>
> - look up the requested swap area first
> - treat re-selecting the already pinned area as a no-op
> - pin the new area before unpinning the old one
> - leave the existing pin in place if the new pin attempt fails
>
> This keeps the hibernation session protected against swapoff() until
> /dev/snapshot is closed, even after a failed attempt to switch to a
> different swap area.
>
> Suggested-by: Youngjun Park <youngjun.park@xxxxxxx>
> Signed-off-by: DaeMyung Kang <charsyam@xxxxxxxxx>
> ---
> Notes (not part of the commit, stripped by git am):
>
> Changes in v2:
> - Drop Fixes: and Cc: stable; reframe as a hardening improvement
> rather than a regression fix, per Youngjun's feedback that the
> current behavior is intentional and there is no concrete
> user-observable harm.
> - Drop the new repin_hibernation_swap_type() helper. Rework
> snapshot_set_swap_area() in place using the existing find / pin /
> unpin helpers as Youngjun suggested; the change now touches only
> kernel/power/user.c and adds no new API.
> - Update the subject and commit log accordingly.
> - Add Suggested-by: trailer.
>
> v1: https://lore.kernel.org/lkml/20260414143200.1267932-1-charsyam@xxxxxxxxx/
>
> Baseline
> --------
> This patch is generated against linux-next at commit 5b2b0c6e4577
> ("mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap
> device"). Mainline does not yet carry that commit, and neither the
> helpers used here (find/pin/unpin_hibernation_swap_type) nor the code
> site this patch modifies exist there. The base-commit trailer at the
> bottom of the mbox records the exact commit.
>
> Testing
> -------
> The behavior change can be exercised entirely through the
> /dev/snapshot ioctl path; no actual hibernation cycle is required.
> A targeted assertion test is below; run it as root in a throwaway VM
> with two active swap block devices and one non-swap block device
> (three arguments).
>
> Run inside a VM on linux-next at 5b2b0c6e4577 with this patch applied:
>
> step1: pinned active swap /dev/vda
> step2: swapoff blocked with EBUSY while pin is held
> step3: repinned active swap to /dev/vdb
> step4: swapoff(/dev/vda) succeeded after repinning away
> step5: repinned swap is blocked with EBUSY
> step6: bogus SNAPSHOT_SET_SWAP_AREA failed as expected: No such device
> step7: swapoff(/dev/vdb) is still blocked with EBUSY
> result: pin preserved across failed re-set (hardened behavior)
> step8: swapoff succeeded after closing /dev/snapshot
>
> Without the patch, step7 instead reports
> swapoff(/dev/vdb) succeeded after failed re-set
> because the old pin had been released before the failed pin attempt.
>
> What the assertion test covers:
> - SWP_HIBERNATION is enforced against swapoff (step2, step5);
> - the success path moves the pin from one active swap to another
> (step3, step4, step5);
> - a failed re-selection preserves the existing pin (step6, step7);
> - the pin lifetime ends on /dev/snapshot close (step8).
>
> What it does not cover:
> - the snapshot_open(O_RDONLY) initial resume-device pin path;
> - the full suspend-to-disk image create/restore flow;
> - concurrent swapoff racing against SNAPSHOT_SET_SWAP_AREA;
> - the type == data->swap idempotent branch (not externally
> observable since it intentionally skips the bit toggle).
>
> A normal sysfs-based suspend-to-disk cycle continues to work; the
> find_hibernation_swap_type() / pin / unpin paths themselves are
> unchanged. Build tested with allmodconfig and run-tested with
> CONFIG_PROVE_LOCKING=y and CONFIG_KASAN=y. The VM was booted with
> oops=panic panic=-1 so any WARN/Oops/BUG would have halted the run;
> the full test completed cleanly with no kernel log diagnostics.
>
> Reproducer (C source, for reference only -- not added to the tree):
>
> // SPDX-License-Identifier: GPL-2.0
> /*
> * Reproduce / verify the SNAPSHOT_SET_SWAP_AREA pin-lifetime behavior.
> *
> * Run only inside a throwaway VM. The test manipulates swap state and
> * leaves the target swap area disabled on success.
> *
> * Usage:
> * ./uswsusp_swapoff_repro <active-swap-1> <active-swap-2> <bogus-blk>
> *
> * Exit codes:
> * 0 = expected (hardened) behavior: pin preserved across failed re-set
> * 1 = old behavior: pin dropped on failed re-set
> * 2 = setup error / inconclusive
> */
>
> #define _GNU_SOURCE
> #include <errno.h>
> #include <fcntl.h>
> #include <linux/types.h>
> #include <linux/suspend_ioctls.h>
> #include <stdbool.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/ioctl.h>
> #include <sys/stat.h>
> #include <sys/swap.h>
> #include <sys/sysmacros.h>
> #include <unistd.h>
>
> static int encode_dev(dev_t dev)
> {
> unsigned int major_num = major(dev);
> unsigned int minor_num = minor(dev);
>
> /* Match new_encode_dev() / new_decode_dev() in the kernel. */
> return (major_num & 0xfff) << 8 |
> (minor_num & 0xff) |
> ((minor_num & ~0xff) << 12);
> }
>
> static int get_block_dev(const char *path, dev_t *dev)
> {
> struct stat st;
>
> if (stat(path, &st) < 0) {
> fprintf(stderr, "stat(%s): %s\n", path, strerror(errno));
> return -errno;
> }
> if (!S_ISBLK(st.st_mode)) {
> fprintf(stderr, "%s is not a block device\n", path);
> return -EINVAL;
> }
> *dev = st.st_rdev;
> return 0;
> }
>
> static int snapshot_set_swap_area(int fd, dev_t dev, long long offset)
> {
> struct resume_swap_area area = {
> .offset = offset,
> .dev = encode_dev(dev),
> };
>
> if (ioctl(fd, SNAPSHOT_SET_SWAP_AREA, &area) < 0)
> return -errno;
> return 0;
> }
>
> int main(int argc, char **argv)
> {
> const char *p1, *p2, *pb;
> dev_t d1, d2, db;
> int fd, ret;
> bool buggy = false;
>
> if (argc != 4) {
> fprintf(stderr,
> "usage: %s <swap1> <swap2> <bogus>\n", argv[0]);
> return 2;
> }
> if (geteuid() != 0) {
> fprintf(stderr, "must run as root\n");
> return 2;
> }
> p1 = argv[1]; p2 = argv[2]; pb = argv[3];
>
> if (get_block_dev(p1, &d1) < 0 ||
> get_block_dev(p2, &d2) < 0 ||
> get_block_dev(pb, &db) < 0)
> return 2;
>
> fd = open("/dev/snapshot", O_WRONLY);
> if (fd < 0) {
> fprintf(stderr, "open(/dev/snapshot): %s\n", strerror(errno));
> return 2;
> }
>
> ret = snapshot_set_swap_area(fd, d1, 0);
> if (ret < 0) { fprintf(stderr, "step1: %s\n", strerror(-ret)); goto setup_err; }
> printf("step1: pinned active swap %s\n", p1);
>
> if (swapoff(p1) == 0) {
> fprintf(stderr, "step2: swapoff unexpectedly succeeded\n");
> close(fd); return 1;
> }
> if (errno != EBUSY) {
> fprintf(stderr, "step2: expected EBUSY, got %s\n", strerror(errno));
> goto setup_err;
> }
> printf("step2: swapoff blocked with EBUSY while pin is held\n");
>
> ret = snapshot_set_swap_area(fd, d2, 0);
> if (ret < 0) { fprintf(stderr, "step3: %s\n", strerror(-ret)); goto setup_err; }
> printf("step3: repinned active swap to %s\n", p2);
>
> if (swapoff(p1) < 0) {
> fprintf(stderr, "step4: swapoff(%s): %s\n", p1, strerror(errno));
> goto setup_err;
> }
> printf("step4: swapoff(%s) succeeded after repinning away\n", p1);
>
> if (swapoff(p2) == 0) {
> fprintf(stderr, "step5: swapoff unexpectedly succeeded\n");
> close(fd); return 1;
> }
> if (errno != EBUSY) {
> fprintf(stderr, "step5: expected EBUSY, got %s\n", strerror(errno));
> goto setup_err;
> }
> printf("step5: repinned swap is blocked with EBUSY\n");
>
> ret = snapshot_set_swap_area(fd, db, 0);
> if (!ret) {
> fprintf(stderr, "step6: bogus unexpectedly succeeded\n");
> goto setup_err;
> }
> printf("step6: bogus SNAPSHOT_SET_SWAP_AREA failed as expected: %s\n",
> strerror(-ret));
>
> if (swapoff(p2) == 0) {
> printf("step7: swapoff(%s) succeeded after failed re-set\n", p2);
> printf("result: pin was dropped on failure (old behavior)\n");
> buggy = true;
> } else if (errno == EBUSY) {
> printf("step7: swapoff(%s) is still blocked with EBUSY\n", p2);
> printf("result: pin preserved across failed re-set (hardened behavior)\n");
> } else {
> fprintf(stderr, "step7: unexpected: %s\n", strerror(errno));
> goto setup_err;
> }
>
> close(fd);
> if (!buggy) {
> if (swapoff(p2) < 0) {
> fprintf(stderr, "step8: swapoff(%s): %s\n", p2, strerror(errno));
> return 2;
> }
> printf("step8: swapoff succeeded after closing /dev/snapshot\n");
> }
> printf("note: re-enable with `swapon %s` and `swapon %s`\n", p1, p2);
> return buggy ? 1 : 0;
>
> setup_err:
> close(fd);
> return 2;
> }
>
> kernel/power/user.c | 35 ++++++++++++++++++++++++++---------
> 1 file changed, 26 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/power/user.c b/kernel/power/user.c
> index 4406f5644a56..e1ab85db2e95 100644
> --- a/kernel/power/user.c
> +++ b/kernel/power/user.c
> @@ -218,6 +218,7 @@ static int snapshot_set_swap_area(struct snapshot_data *data,
> {
> sector_t offset;
> dev_t swdev;
> + int type, swap;
>
> if (swsusp_swap_in_use())
> return -EPERM;
> @@ -239,18 +240,34 @@ static int snapshot_set_swap_area(struct snapshot_data *data,
> }
>
> /*
> - * Unpin the swap device if a swap area was already
> - * set by SNAPSHOT_SET_SWAP_AREA.
> + * User space encodes device types as two-byte values, so we need to
> + * recode them.
> */
> - unpin_hibernation_swap_type(data->swap);
> + type = find_hibernation_swap_type(swdev, offset);
> + if (type < 0)
> + return swdev ? -ENODEV : -EINVAL;
>
> - /*
> - * User space encodes device types as two-byte values,
> - * so we need to recode them
> - */
> - data->swap = pin_hibernation_swap_type(swdev, offset);
> - if (data->swap < 0)
> + if (type == data->swap) {
> + /*
> + * Re-selecting the already pinned swap area is a no-op.
> + * Keep the existing pin and just refresh the cached device id.
> + */
> + data->dev = swdev;
> + return 0;
> + }
> +
> + swap = pin_hibernation_swap_type(swdev, offset);
> + if (swap < 0) {
> + /*
> + * Preserve the existing pin on failure. This can happen if the
> + * target swap area disappears before pinning, or via the
> + * defensive -EBUSY path in pin_hibernation_swap_type().
> + */
> return swdev ? -ENODEV : -EINVAL;
> + }
> +
> + unpin_hibernation_swap_type(data->swap);
> + data->swap = swap;
> data->dev = swdev;
> return 0;
> }
>
This doesn't apply for me on top of the current mainline. Maybe a
requisite change is still missing.
Please feel free to resend it when the mainline is ready for it.