Re: [PATCH] mm/vmalloc: widen guard region to defeat ENTER-based stack pivot

From: Xiang Mei

Date: Fri Jun 26 2026 - 13:47:35 EST

On Fri, Jun 26, 2026 at 10:34:44AM -0700, Xiang Mei wrote:
> With CONFIG_VMAP_STACK, kernel stacks are allocated in the vmalloc area,
> which an unprivileged user can surround with attacker-controlled data by
> spraying vmap allocations adjacent to a target stack (for example via
> XDP_UMEM_REG, though other vmalloc spray paths work too). Today each
> guarded vmalloc allocation is followed by a single unmapped guard page.
>
> A single guard page is not enough to contain the x86_64 ENTER
> instruction used as a one-instruction stack pivot. ENTER imm16, imm8
> builds a stack frame and lowers RSP by:
>
> imm16 + 8 * (L + 1), L = imm8 & 0x1f
>
> imm16 is an unsigned 16-bit operand (ENTER never raises RSP), and L is
> in [0, 31], so the maximum displacement of a single ENTER is:
>
> 0xffff + 8 * 0x20 = 0x100ff bytes
>
> That is more than enough to step off the current stack, across the
> one-page guard, and into the adjacent sprayed pages. When those pages
> contain a return sled feeding a ROP chain, reaching any ENTER gadget
> (opcode 0xc8, abundant as both intended and unintended gadgets) turns a
> control-flow hijack into full ROP execution without any register control
> at the hijack site, making it a one-gadget-style primitive that
> significantly eases exploitation. The pivot happens after the control
> transfer, so it is not constrained by CFI (kCFI/FineIBT).
>
> Widen the guard region from one page to VMAP_GUARD_PAGES (0x11 pages,
> 0x11000 bytes), which is the smallest whole-page span exceeding the
> 0x100ff-byte maximum single-ENTER pivot. A pivot off the top of the
> stack now lands in the unmapped guard and faults, instead of in mapped,
> attacker-controlled memory. RANDOMIZE_KSTACK_OFFSET only perturbs RSP by
> a sub-page amount, so it does not change the required width.
>
> Introduce a VMAP_GUARD_PAGES knob that defaults to a single page (no
> change for current architectures) and can be overridden per arch via
> asm/vmalloc.h, and set it to 0x11 on x86_64. This is deliberately scoped
> to x86_64: the 0x100ff bound is a property of the ENTER opcode, and ENTER
> is also a one-byte opcode (0xc8) that appears as abundant unintended
> gadgets. Other architectures (e.g. arm64) have no equivalent
> single-instruction, immediate-controlled pivot reachable as an unaligned
> unintended gadget, so they keep the one-page guard and pay no cost.
>
> The override is gated on CONFIG_X86_64 rather than applying to all of x86:
> VMAP_STACK is selected only on x86_64, so 32-bit kernel stacks are not in
> the vmalloc area and the technique does not apply there. 32-bit x86 also
> has a far smaller vmalloc window, where widening every guarded area by 16
> pages would needlessly pressure the address space.
>
> The guard pages are never populated, so there is no extra physical
> memory and no additional page-table population beyond the larger virtual
> span; the cost is virtual address space and vmap_area bookkeeping, which
> is negligible against the 64-bit vmalloc window. get_vm_area_size() is
> adjusted by the same VMAP_GUARD_SIZE so the usable size reported to
> callers is unchanged.
>
> On x86 this widens the guard for all guarded vmap areas, not only thread
> stacks. ret2enter targets the stack specifically, so a narrower
> alternative is to apply the wider guard only on the thread-stack
> allocation path via a dedicated VM_ flag; we kept the change in the
> common path as defense in depth for any vmalloc-adjacent pivot target,
> but are happy to scope it to stacks if maintainers prefer.
>
> Signed-off-by: Xiang Mei <xmei5@xxxxxxx>
> Signed-off-by: Jennifer Miller <jmill@xxxxxxx>
> ---
> arch/x86/include/asm/vmalloc.h | 21 +++++++++++++++++++++
> include/linux/vmalloc.h | 16 ++++++++++++++--
> mm/vmalloc.c | 2 +-
> 3 files changed, 36 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/vmalloc.h b/arch/x86/include/asm/vmalloc.h
> index 49ce331f3ac6..2c341f398227 100644
> --- a/arch/x86/include/asm/vmalloc.h
> +++ b/arch/x86/include/asm/vmalloc.h
> @@ -5,6 +5,27 @@
> #include <asm/page.h>
> #include <asm/pgtable_areas.h>
>
> +/*
> + * The x86 ENTER instruction can be used as a one-instruction stack pivot:
> + * ENTER imm16, imm8 lowers RSP by imm16 + 8 * (L + 1), L = imm8 & 0x1f.
> + * imm16 is an unsigned 16-bit operand (ENTER never raises RSP) and L is in
> + * [0, 31], so a single ENTER can lower RSP by at most
> + * 0xffff + 8 * 0x20 = 0x100ff bytes. With CONFIG_VMAP_STACK the kernel
> + * stack lives in the vmalloc area, where an unprivileged user can spray
> + * adjacent allocations; a single-page guard is too small to contain such a
> + * pivot. Use 0x11 guard pages (0x11000 bytes), the smallest whole-page
> + * span exceeding 0x100ff, so the pivot faults in the guard instead of
> + * landing in attacker-controlled memory.
> + *
> + * Restrict this to 64-bit: VMAP_STACK is selected only on x86_64, so 32-bit
> + * kernel stacks are not in the vmalloc area and the technique does not apply.
> + * 32-bit also has a far smaller vmalloc window, where a 16-page-per-area
> + * widening would needlessly pressure the address space.
> + */
> +#ifdef CONFIG_X86_64
> +#define VMAP_GUARD_PAGES 0x11
> +#endif
> +
> #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>
> #ifdef CONFIG_X86_64
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 3b02c0c6b371..b8546e519deb 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -49,6 +49,18 @@ struct iov_iter; /* in uio.h */
> #define IOREMAP_MAX_ORDER (7 + PAGE_SHIFT) /* 128 pages */
> #endif
>
> +/*
> + * Number of unmapped guard pages appended to each guarded vmalloc
> + * allocation. The default is a single page; an architecture may override
> + * VMAP_GUARD_PAGES (via asm/vmalloc.h) when a wider guard is needed to
> + * contain a worst-case single-instruction stack pivot into an adjacent,
> + * attacker-controlled vmap allocation (see arch/x86 for the ENTER case).
> + */
> +#ifndef VMAP_GUARD_PAGES
> +#define VMAP_GUARD_PAGES 1
> +#endif
> +#define VMAP_GUARD_SIZE (VMAP_GUARD_PAGES * PAGE_SIZE)
> +
> struct vm_struct {
> union {
> struct vm_struct *next; /* Early registration of vm_areas. */
> @@ -236,8 +248,8 @@ int vmap_pages_range(unsigned long addr, unsigned long end, pgprot_t prot,
> static inline size_t get_vm_area_size(const struct vm_struct *area)
> {
> if (!(area->flags & VM_NO_GUARD))
> - /* return actual size without guard page */
> - return area->size - PAGE_SIZE;
> + /* return actual size without guard region */
> + return area->size - VMAP_GUARD_SIZE;
> else
> return area->size;
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index bb6ae08d18f5..8bb2b3ef40a8 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3217,7 +3217,7 @@ struct vm_struct *__get_vm_area_node(unsigned long size,
> return NULL;
>
> if (!(flags & VM_NO_GUARD))
> - size += PAGE_SIZE;
> + size += VMAP_GUARD_SIZE;
>
> area->flags = flags;
> area->caller = caller;
> --
> 2.43.0
>

Hi hardening team and maintainers,

This patch widens the vmalloc guard region on x86_64 to close a
class of stack-pivot primitive against CONFIG_VMAP_STACK kernels. We would
like the hardening and mm/x86 maintainers' view on whether this belongs in
the common guard path or should be scoped to thread stacks.

The description below is intentionally theoretical: it explains why a single
guard page is insufficient, without operational exploitation detail.

Background / threat model
=========================

With CONFIG_VMAP_STACK (the default on x86_64), kernel stacks live in the
vmalloc area, which allocates roughly linearly. The vmalloc area exposes
several allocation paths reachable from userspace, so in principle an
unprivileged user can place attacker-controlled vmalloc allocations next to
a target stack. At the moment of a control-flow hijack, the live stack can
therefore be adjacent to attacker-controlled mapped pages.

Today each guarded vmalloc allocation is followed by a single unmapped guard
page. The guard relies on the assumption that a corrupted RSP cannot jump
far enough, in one step, to clear the guard and land in the next mapped
allocation. On x86_64 that assumption does not hold.

The primitive
=============

The x86_64 ENTER instruction (ENTER imm16, imm8) builds a stack frame in a
single instruction and lowers RSP by:

imm16 + 8 * (L + 1), L = imm8 & 0x1f

imm16 is an unsigned 16-bit operand (ENTER never raises RSP) and L is in
[0, 31], so a single ENTER lowers RSP by up to 0xffff + 8 * 0x20 = 0x100ff
bytes, which is just over sixteen 4 KiB pages. A single instruction can
therefore move RSP off the current stack, across the one-page guard, and
into an adjacent vmalloc allocation.
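
For concreteness, the displacement formula can be written as a small
userspace C helper. This is only a sketch of the arithmetic stated above;
the function name enter_rsp_drop() is made up for illustration and is not
part of the patch.

```c
#include <stdint.h>

/*
 * Bytes by which a single "ENTER imm16, imm8" lowers RSP in 64-bit mode,
 * following the formula in this mail: imm16 + 8 * (L + 1), L = imm8 & 0x1f.
 * Illustrative helper only; the name is hypothetical.
 */
static uint32_t enter_rsp_drop(uint16_t imm16, uint8_t imm8)
{
	uint32_t level = imm8 & 0x1f;	/* nesting level L, always 0..31 */

	return (uint32_t)imm16 + 8u * (level + 1u);
}
```

With imm16 = 0xffff and imm8 = 0x1f this evaluates to 0x100ff, the bound
used throughout this mail; bits above the low five of imm8 are masked off
and do not raise it further.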

Theoretically this matters because:

- The displacement is attacker-chosen (via the immediates) up to 0x100ff,
so the pivot can clear any guard narrower than that in one step.
- ENTER is reachable as a gadget, so a pivot of this size is available
without depending on register state at the hijack site.
- The pivot happens after the control transfer, so it is not constrained
by forward-edge CFI (kCFI / FineIBT).

Taken together these make it a one-gadget-style pivot that generalizes
across control-flow hijacks: because no register control is required at the
hijack site, any control-flow hijack able to reach a single ENTER gadget can
move RSP into an adjacent vmalloc allocation. The net effect is that a single
guard page is not a reliable boundary between a pivoted RSP and the next
mapped allocation on x86_64.
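
The boundary argument can be modelled with plain offsets. The sketch below
assumes an idealized layout in which the adjacent mapped area and its
trailing guard sit directly below the stack; the helper name and the layout
are illustrative assumptions, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout, low to high addresses (illustration only):
 *   [ adjacent mapped area ][ its trailing guard ][ stack ]
 * rsp_off is the distance of RSP above the lowest stack address.
 * Returns true if lowering RSP by 'drop' bytes carries it past the whole
 * guard and into the adjacent mapped area.
 */
static bool sketch_pivot_clears_guard(uint64_t rsp_off, uint64_t drop,
				      uint64_t guard_bytes)
{
	/* The drop must exceed the stack left below RSP plus the guard. */
	return drop > rsp_off + guard_bytes;
}
```

In this model a 0x100ff-byte drop clears a 0x1000-byte guard even when RSP
sits at the very bottom of the stack, while no drop of at most 0x100ff
bytes clears a 0x11000-byte guard.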

Because this is a generic primitive that applies across control-flow
hijacks, rather than something that eases one specific bug, we think the
guard itself should be widened so the boundary holds regardless of the
originating hijack.

We have a working proof-of-concept and are happy to share it privately with
maintainers; we are keeping the offensive details off the public list.

The fix
=======

Widen the guard region from one page to the smallest whole-page span that
exceeds the worst-case single-ENTER displacement: 0x11 pages, i.e. 0x11000
bytes, which is greater than 0x100ff. A pivot off the top of the stack then
lands in unmapped guard memory and faults, instead of in mapped,
attacker-controlled pages.
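
The page count follows directly from the bound. As a sketch, assuming
4 KiB pages (SKETCH_PAGE_SIZE and sketch_guard_pages_for() are names
invented here, not symbols from the patch):

```c
#define SKETCH_PAGE_SIZE 0x1000UL	/* assumption: 4 KiB pages */

/*
 * Smallest number of whole pages whose total span is strictly greater
 * than max_drop bytes. Illustrative helper only.
 */
static unsigned long sketch_guard_pages_for(unsigned long max_drop)
{
	return max_drop / SKETCH_PAGE_SIZE + 1;
}
```

For max_drop = 0x100ff this gives 0x11 pages; 0x10 pages (0x10000 bytes)
would fall 0xff bytes short.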

This is scoped to x86_64 via a VMAP_GUARD_PAGES knob that defaults to one
page (no change for any other architecture) and is overridden only under
CONFIG_X86_64. The 0x100ff bound is a property of the ENTER instruction,
whose one-byte opcode (0xc8) also turns up as an unintended gadget; other
architectures have no equivalent single-instruction, immediate-controlled
pivot reachable as an unaligned gadget, so they keep the one-page guard and
pay no cost. Within x86, VMAP_STACK is selected only on x86_64, so 32-bit
x86 is excluded as well.

Cost: the guard pages are never populated, so there is no extra physical
memory and no extra page-table population beyond the larger virtual span --
only virtual address space and vmap_area bookkeeping, negligible against the
64-bit vmalloc window. get_vm_area_size() is adjusted by the same amount so
the usable size reported to callers is unchanged.
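
The size bookkeeping can be checked with a userspace model. The sketch
below mirrors the two hunks in the patch (add the guard when reserving,
subtract it when reporting), but the struct and function names are
stand-ins invented for illustration and 4 KiB pages are assumed.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 0x1000UL	/* assumption: 4 KiB pages */
#define SKETCH_GUARD_PAGES 0x11UL	/* x86_64 value from the patch */
#define SKETCH_GUARD_SIZE (SKETCH_GUARD_PAGES * SKETCH_PAGE_SIZE)

/* Simplified stand-in for struct vm_struct; not the kernel type. */
struct sketch_area {
	size_t size;	/* reserved span, including any guard */
	bool no_guard;	/* models VM_NO_GUARD */
};

/* Models the size adjustment made when an area is reserved. */
static struct sketch_area sketch_reserve(size_t requested, bool no_guard)
{
	struct sketch_area area = { .size = requested, .no_guard = no_guard };

	if (!no_guard)
		area.size += SKETCH_GUARD_SIZE;
	return area;
}

/* Models get_vm_area_size(): usable size without the guard region. */
static size_t sketch_usable_size(const struct sketch_area *area)
{
	if (!area->no_guard)
		return area->size - SKETCH_GUARD_SIZE;
	return area->size;
}
```

Because the same constant is added and subtracted, a 0x4000-byte request
reserves 0x15000 bytes of address space yet still reports 0x4000 usable
bytes.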

Open question for maintainers
=============================

On x86_64 this widens the guard for all guarded vmap areas, not only thread
stacks. The ENTER-based pivot (ret2enter) targets the stack specifically,
so a narrower alternative
is to apply the wider guard only on the thread-stack allocation path via a
dedicated VM_ flag. We kept it in the common path as defense in depth for any
vmalloc-adjacent pivot target, but are happy to scope it to stacks if you
prefer.
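
To make the narrower alternative concrete, here is a userspace sketch of
per-area guard selection. SKETCH_VM_WIDE_GUARD is a hypothetical flag
invented for this example (no such VM_ flag exists in the patch), the bit
values are arbitrary, and 4 KiB pages are assumed.

```c
#define SKETCH_PAGE_SIZE 0x1000UL	/* assumption: 4 KiB pages */

/* Hypothetical flag bits for illustration; not real kernel values. */
#define SKETCH_VM_NO_GUARD 0x1UL
#define SKETCH_VM_WIDE_GUARD 0x2UL	/* would be set only for stacks */

/* Guard size in bytes that a guarded area would get under this scheme. */
static unsigned long sketch_guard_size(unsigned long flags)
{
	if (flags & SKETCH_VM_NO_GUARD)
		return 0;
	if (flags & SKETCH_VM_WIDE_GUARD)
		return 0x11UL * SKETCH_PAGE_SIZE;
	return SKETCH_PAGE_SIZE;
}
```

Under such a scheme only the thread-stack allocation path would pass the
wide-guard flag, and every other guarded area would keep the one-page
guard.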

We would like to hear your feedback and suggestions.

Thanks,
Xiang