Re: [PATCH v9 6/6] selftests/mm: add hwpoison-panic destructive test
From: Miaohe Lin
Date: Fri Jun 26 2026 - 03:13:15 EST
On 2026/6/9 18:57, Breno Leitao wrote:
> Add a destructive selftest that verifies
> vm.panic_on_unrecoverable_memory_failure actually panics when a
> hwpoison error hits a kernel-owned page.
>
> Three "kinds" of kernel-owned page can be targeted, selectable via
> the script's first positional argument (default: rodata):
>
> rodata - a PG_reserved page in the kernel rodata range, sourced
> from the "Kernel rodata" sub-resource of "System RAM" in
> /proc/iomem. That entry is reported on every major
> architecture and guarantees the chosen PFN is backed by
> struct page (an online System RAM range, not a firmware
> hole), is PG_reserved, and is read-only -- so even if
> the panic fails to fire for some reason, the resulting
> PG_hwpoison marker on rodata does not corrupt writable
> kernel state.
>
> slab - a slab page found by walking /proc/kpageflags for the
> first PFN with KPF_SLAB set (and KPF_HWPOISON / KPF_NOPAGE
> / KPF_COMPOUND_TAIL clear). Exercises the get_any_page()
> path on a non PG_reserved kernel-owned page and so
> catches regressions where get_any_page() collapses
> kernel-owned pages into a transient -EIO instead of
> -ENOTRECOVERABLE.
>
> pgtable - same as slab, but the PFN is selected via KPF_PGTABLE.
>
> PageLargeKmalloc, the fourth page type matched by
> HWPoisonKernelOwned(), is intentionally not covered: it is a
> PAGE_TYPE_OPS flag with no /proc/kpageflags bit, so selecting such
> a PFN from userspace is not feasible. The slab and pgtable
> variants already exercise the same get_any_page() positive-check
> branch.
>
> The script enables the sysctl and writes the selected physical
> address to /sys/devices/system/memory/hard_offline_page. A
> successful run crashes the kernel with
>
> Memory failure: <pfn>: unrecoverable page
>
> A return from the inject means the panic did not fire and the test
> fails. Test outcome is therefore observed externally (serial
> console, kdump) rather than from the script's own exit code.
>
> The script is intentionally NOT wired into run_vmtests.sh: every
> successful run panics the kernel, which is incompatible with the
> sequential "run each category in the same VM" model that
> run_vmtests.sh assumes. It is also not registered as a TEST_PROGS /
> ksft_* wrapper so a default kselftest run does not opt itself into
> a panic. The script is meant to be executed manually inside a
> disposable VM (e.g. virtme-ng), one variant per VM boot, and
> requires RUN_DESTRUCTIVE=1 in the environment as a safety net.
>
> Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
Looks good to me with two comments below.
> ---
> tools/testing/selftests/mm/Makefile | 4 +
> tools/testing/selftests/mm/hwpoison-panic.sh | 208 +++++++++++++++++++++++++++
> 2 files changed, 212 insertions(+)
>
> diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
> index e6df968f0971..ed321ae709da 100644
> --- a/tools/testing/selftests/mm/Makefile
> +++ b/tools/testing/selftests/mm/Makefile
> @@ -174,6 +174,10 @@ TEST_PROGS += ksft_userfaultfd.sh
> TEST_PROGS += ksft_vma_merge.sh
> TEST_PROGS += ksft_vmalloc.sh
>
> +# Destructive: every successful run panics the kernel. Installed and
> +# kept executable, but not run from a default kselftest invocation.
> +TEST_PROGS_EXTENDED += hwpoison-panic.sh
> +
> TEST_FILES := test_vmalloc.sh
> TEST_FILES += test_hmm.sh
> TEST_FILES += va_high_addr_switch.sh
> diff --git a/tools/testing/selftests/mm/hwpoison-panic.sh b/tools/testing/selftests/mm/hwpoison-panic.sh
> new file mode 100755
> index 000000000000..fe58e7638a8b
> --- /dev/null
> +++ b/tools/testing/selftests/mm/hwpoison-panic.sh
> @@ -0,0 +1,208 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Verify vm.panic_on_unrecoverable_memory_failure by injecting a hwpoison
> +# error on a kernel-owned page and confirming the kernel panics.
> +#
> +# Three "kinds" of kernel-owned page can be targeted, selectable via the
> +# first positional argument (default: rodata):
> +#
> +# rodata - a PG_reserved page in the kernel rodata range
> +# (sourced from /proc/iomem "Kernel rodata"). Exercises
> +# memory_failure() -> get_any_page() on a PageReserved page.
> +#
> +# slab - a slab page found via /proc/kpageflags (KPF_SLAB).
> +# Exercises memory_failure() -> get_any_page() on a non
> +# PG_reserved kernel-owned page. This path is what catches
> +# regressions where get_any_page() collapses kernel-owned
> +# pages into a transient -EIO instead of -ENOTRECOVERABLE.
> +#
> +# pgtable - a page-table page found via /proc/kpageflags (KPF_PGTABLE).
> +# Same path as slab, different page type.
> +#
> +# This test is DESTRUCTIVE: a successful run crashes the kernel. It is
> +# meant to be executed inside a disposable VM (e.g. virtme-ng) with a
> +# serial console captured by the harness. It is skipped unless the
> +# caller opts in via RUN_DESTRUCTIVE=1.
> +#
> +# Test passes externally: the kernel must panic with
> +# "Memory failure: <pfn>: unrecoverable page"
> +# A return from the inject means the panic did not fire and the test
> +# fails.
> +#
> +# Author: Breno Leitao <leitao@xxxxxxxxxx>
> +
> +set -u
> +
> +ksft_skip=4
> +sysctl_path=/proc/sys/vm/panic_on_unrecoverable_memory_failure
> +inject_path=/sys/devices/system/memory/hard_offline_page
> +kpageflags_path=/proc/kpageflags
> +
> +# /proc/kpageflags bit positions (see include/uapi/linux/kernel-page-flags.h)
> +KPF_SLAB=7
> +KPF_COMPOUND_TAIL=16
> +KPF_HWPOISON=19
> +KPF_NOPAGE=20
> +KPF_PGTABLE=26
> +
> +kind=${1:-rodata}
> +
> +ksft_print() { echo "# $*"; }
> +ksft_exit_skip() { ksft_print "$*"; exit "$ksft_skip"; }
> +ksft_exit_fail() { echo "not ok 1 $*"; exit 1; }
> +
> +if [ "$(id -u)" -ne 0 ]; then
> + ksft_exit_skip "must run as root"
> +fi
> +
> +if [ ! -w "$sysctl_path" ]; then
> + ksft_exit_skip "$sysctl_path not present (kernel without the sysctl?)"
> +fi
> +
> +if [ ! -w "$inject_path" ]; then
> + ksft_exit_skip "$inject_path not present (no MEMORY_HOTPLUG?)"
> +fi
> +
> +if [ "${RUN_DESTRUCTIVE:-0}" != "1" ]; then
> + ksft_exit_skip "destructive test; re-run with RUN_DESTRUCTIVE=1 inside a disposable VM"
> +fi
> +
> +# Pick a PFN inside the kernel image rodata region of /proc/iomem.
> +# This is preferred over a top-level "Reserved" entry because top-level
> +# Reserved ranges are often firmware holes that have no backing struct
> +# page; pfn_to_online_page() returns NULL on those and memory_failure()
> +# bails out with -ENXIO before reaching the panic path.
> +#
> +# "Kernel rodata" is reported as a sub-resource of "System RAM" on every
> +# major architecture, which guarantees:
> +# - the PFN is backed by struct page (within an online memory range);
> +# - PG_reserved is set on the page (kernel image area);
> +# - the memory is read-only, so setting PG_hwpoison on it does not
> +# corrupt writable kernel state if the panic somehow does not fire.
> +#
> +# /proc/iomem entries look like (indented for sub-resources):
> +# " 02500000-02ffffff : Kernel rodata"
> +pick_rodata_phys_addr() {
> + awk -v pagesize="$(getconf PAGE_SIZE)" '
> + # Convert a hex string to a number without relying on the gawk-only
> + # strtonum(). mawk lacks it and would otherwise spuriously skip
> + # this test on distros that ship mawk as /usr/bin/awk.
> + function hex2num(s, n, i, c, v) {
> + n = 0
> + for (i = 1; i <= length(s); i++) {
> + c = tolower(substr(s, i, 1))
> + v = index("0123456789abcdef", c) - 1
> + if (v < 0)
> + return -1
> + n = n * 16 + v
> + }
> + return n
> + }
> + /: Kernel rodata[[:space:]]*$/ {
> + sub(/^[[:space:]]+/, "")
> + n = split($0, a, /[- ]/)
> + start = hex2num(a[1])
> + end = hex2num(a[2])
> + if (end <= start)
> + next
> + # Page-align upward and emit the first byte of that page.
> + pfn = int((start + pagesize - 1) / pagesize)
> + printf "0x%x\n", pfn * pagesize
> + exit 0
> + }
> + ' /proc/iomem
> +}
> +
> +# Walk /proc/kpageflags and return the phys addr of the first PFN that
> +# has bit $1 set, with KPF_HWPOISON, KPF_NOPAGE and KPF_COMPOUND_TAIL
> +# all clear (so we attack a real, non-tail, not-already-poisoned page).
> +#
> +# We skip the first 16 MiB of PFNs to step past low-memory special
> +# ranges (BIOS/EFI/ACPI/etc.) that often are PG_reserved and would not
> +# exhibit the slab/pgtable type we are looking for.
> +pick_kpageflags_phys_addr() {
> + local want_bit=$1
> + local pagesize skip_pfn
> +
> + [ -r "$kpageflags_path" ] || return
> +
> + pagesize=$(getconf PAGE_SIZE)
> + skip_pfn=$(((16 * 1024 * 1024) / pagesize))
> +
> + od -An -tx8 -v -w8 -j "$((skip_pfn * 8))" "$kpageflags_path" 2>/dev/null | \
> + awk -v want_bit="$want_bit" \
> + -v hwp_bit="$KPF_HWPOISON" \
> + -v nopage_bit="$KPF_NOPAGE" \
> + -v tail_bit="$KPF_COMPOUND_TAIL" \
> + -v base_pfn="$skip_pfn" \
> + -v pagesize="$pagesize" '
> + # Test whether bit "b" is set in the 16-hex-digit value "hex".
> + # Done with substring + per-digit lookup so we never rely on awk
> + # bitwise operators (mawk lacks them), 64-bit FP precision or the
> + # gawk-only strtonum().
> + function bit_set(hex, b, di, bi, c, v) {
> + di = int(b / 4)
> + bi = b - di * 4
> + c = substr(hex, length(hex) - di, 1)
> + v = index("0123456789abcdef", tolower(c)) - 1
> + if (bi == 0) return (v % 2) == 1
> + if (bi == 1) return int(v / 2) % 2 == 1
> + if (bi == 2) return int(v / 4) % 2 == 1
> + return int(v / 8) % 2 == 1
> + }
> + {
> + gsub(/^[[:space:]]+/, "")
> + h = $1
> + if (bit_set(h, want_bit) &&
> + !bit_set(h, hwp_bit) &&
> + !bit_set(h, nopage_bit) &&
> + !bit_set(h, tail_bit)) {
> + pfn = base_pfn + NR - 1
> + printf "0x%x\n", pfn * pagesize
> + exit 0
> + }
> + }
> + '
> +}
> +
> +case "$kind" in
> +rodata)
> + phys_addr=$(pick_rodata_phys_addr)
> + missing_msg='no "Kernel rodata" entry in /proc/iomem'
> + ;;
> +slab)
> + phys_addr=$(pick_kpageflags_phys_addr "$KPF_SLAB")
> + missing_msg="no usable slab PFN found in $kpageflags_path"
> + ;;
> +pgtable)
> + phys_addr=$(pick_kpageflags_phys_addr "$KPF_PGTABLE")
> + missing_msg="no usable page-table PFN found in $kpageflags_path"
> + ;;
> +*)
> + ksft_exit_fail "unknown kind '$kind' (expected: rodata|slab|pgtable)"
> + ;;
> +esac
> +
> +if [ -z "$phys_addr" ]; then
> + ksft_exit_skip "$missing_msg"
> +fi
> +
> +ksft_print "enabling $sysctl_path"
> +prior=$(cat "$sysctl_path")
> +echo 1 > "$sysctl_path" || ksft_exit_fail "failed to enable sysctl"
> +
> +ksft_print "injecting hwpoison at phys 0x$(printf '%x' "$phys_addr") (kind=$kind)"
> +ksft_print "expecting kernel panic: 'Memory failure: <pfn>: unrecoverable page'"
> +
> +# If this returns, the kernel did not panic → test failed. Restore the
> +# sysctl before reporting so the system is left as we found it.
> +if echo "$phys_addr" > "$inject_path"; then
> + echo "$prior" > "$sysctl_path"
> + ksft_exit_fail "inject returned without panic; sysctl ineffective"
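If the inject returns without a panic, should we recheck the page type? There
is a window between picking phys_addr and injecting the hwpoison, so the page
may no longer be of the selected kind by the time the write happens.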
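To illustrate the recheck, here is a rough, untested sketch. recheck_kpf_bit is
a hypothetical helper name, not something in the patch; it assumes the same
8-bytes-per-PFN /proc/kpageflags layout the script already relies on, and it
uses bash arithmetic, so it is not as portable as the awk helper above.

```shell
# Hypothetical helper, not part of the patch: re-read the 8-byte
# kpageflags entry for a single PFN and test one bit in it.
#   $1 = pfn (decimal or 0x-prefixed hex)
#   $2 = bit number (e.g. "$KPF_SLAB")
#   $3 = flags file (defaults to /proc/kpageflags)
# Returns 0 if the bit is set, 1 if it is clear, 2 if the entry
# could not be read.
recheck_kpf_bit() {
	local pfn=$(($1)) bit=$2 file=${3:-/proc/kpageflags}
	local word

	# od prints the entry as one host-endian 64-bit hex word.
	word=$(od -An -tx8 -v -j "$((pfn * 8))" -N 8 "$file" 2>/dev/null |
		tr -d '[:space:]')
	[ -n "$word" ] || return 2

	[ $(( (0x$word >> bit) & 1 )) -eq 1 ]
}
```

In the failure branch this could be called with the PFN derived from
phys_addr. If the bit is no longer set, the run might be reported as a skip
(the page changed type under us) rather than as a sysctl failure.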
> +fi
> +
> +# Write failed (e.g. -EINVAL on offlining a non-online region): also a
> +# failure for this test, since we expected the panic path.
> +echo "$prior" > "$sysctl_path"
> +ksft_exit_fail "inject failed before reaching the panic path"
Should we unpoison the pfn in case of failure?
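For example, something like the untested sketch below. unpoison_phys_addr is a
hypothetical helper name, and it assumes the hwpoison-inject debugfs interface
(CONFIG_HWPOISON_INJECT with debugfs mounted) is available in the test kernel,
which the script does not currently check for.

```shell
# Hypothetical helper, not part of the patch. Assumes the debugfs
# unpoison-pfn file from the hwpoison-inject module is present.
#   $1 = physical address (decimal or 0x-prefixed hex)
#   $2 = page size in bytes
#   $3 = unpoison file (defaults to the debugfs path)
# Returns 2 if the unpoison file is not writable.
unpoison_phys_addr() {
	local pfn=$(( $1 / $2 ))
	local file=${3:-/sys/kernel/debug/hwpoison/unpoison-pfn}

	[ -w "$file" ] || return 2
	printf '0x%x\n' "$pfn" > "$file"
}
```

When the debugfs file is missing, the script could just print a note that the
page was left poisoned instead of failing any harder.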
Thanks.