Re: [PATCH] mm/vmalloc: widen guard region to defeat ENTER-based stack pivot

From: Xiang Mei

Date: Fri Jun 26 2026 - 13:52:40 EST

On Fri, Jun 26, 2026 at 10:46 AM Xiang Mei <xmei5@xxxxxxx> wrote:
>
> On Fri, Jun 26, 2026 at 10:34:44AM -0700, Xiang Mei wrote:
> > With CONFIG_VMAP_STACK, kernel stacks are allocated in the vmalloc area,
> > which an unprivileged user can surround with attacker-controlled data by
> > spraying vmap allocations adjacent to a target stack (for example via
> > XDP_UMEM_REG, though other vmalloc spray paths work too). Today each
> > guarded vmalloc allocation is followed by a single unmapped guard page.
> >
> > A single guard page is not enough to contain the x86_64 ENTER
> > instruction used as a one-instruction stack pivot. ENTER imm16, imm8
> > builds a stack frame and lowers RSP by:
> >
> > imm16 + 8 * (L + 1), L = imm8 & 0x1f
> >
> > imm16 is an unsigned 16-bit operand (ENTER never raises RSP), and L is
> > in [0, 31], so the maximum displacement of a single ENTER is:
> >
> > 0xffff + 8 * 0x20 = 0x100ff bytes
> >
> > That is more than enough to step off the current stack, across the
> > one-page guard, and into the adjacent sprayed pages. When those pages
> > contain a return sled feeding a ROP chain, reaching any ENTER gadget
> > (opcode 0xc8, abundant as both intended and unintended gadgets) turns a
> > control-flow hijack into full ROP execution without any register control
> > at the hijack site, making it a one-gadget-style primitive that
> > significantly eases exploitation. The pivot happens after the control
> > transfer, so it is not constrained by CFI (kCFI/FineIBT).
> >
> > Widen the guard region from one page to VMAP_GUARD_PAGES (0x11 pages,
> > 0x11000 bytes), which is the smallest whole-page span exceeding the
> > 0x100ff-byte maximum single-ENTER pivot. A pivot off the top of the
> > stack now lands in the unmapped guard and faults, instead of in mapped,
> > attacker-controlled memory. RANDOMIZE_KSTACK_OFFSET only perturbs RSP by
> > a sub-page amount, so it does not change the required width.
> >
> > Introduce a VMAP_GUARD_PAGES knob that defaults to a single page (no
> > change for current architectures) and can be overridden per arch via
> > asm/vmalloc.h, and set it to 0x11 on x86_64. This is deliberately scoped
> > to x86_64: the 0x100ff bound is a property of the ENTER opcode, and ENTER
> > is also a one-byte opcode (0xc8) that appears as abundant unintended
> > gadgets. Other architectures (e.g. arm64) have no equivalent
> > single-instruction, immediate-controlled pivot reachable as an unaligned
> > unintended gadget, so they keep the one-page guard and pay no cost.
> >
> > The override is gated on CONFIG_X86_64 rather than applying to all of x86:
> > VMAP_STACK is selected only on x86_64, so 32-bit kernel stacks are not in
> > the vmalloc area and the technique does not apply there. 32-bit x86 also
> > has a far smaller vmalloc window, where widening every guarded area by 16
> > pages would needlessly pressure the address space.
> >
> > The guard pages are never populated, so there is no extra physical
> > memory and no additional page-table population beyond the larger virtual
> > span; the cost is virtual address space and vmap_area bookkeeping, which
> > is negligible against the 64-bit vmalloc window. get_vm_area_size() is
> > adjusted by the same VMAP_GUARD_SIZE so the usable size reported to
> > callers is unchanged.
> >
> > On x86_64 this widens the guard for all guarded vmap areas, not only thread
> > stacks. ret2enter targets the stack specifically, so a narrower
> > alternative is to apply the wider guard only on the thread-stack
> > allocation path via a dedicated VM_ flag; we kept the change in the
> > common path as defense in depth for any vmalloc-adjacent pivot target,
> > but are happy to scope it to stacks if maintainers prefer.
> >
> > Signed-off-by: Xiang Mei <xmei5@xxxxxxx>
> > Signed-off-by: Jennifer Miller <jmill@xxxxxxx>
> > ---
> > arch/x86/include/asm/vmalloc.h | 21 +++++++++++++++++++++
> > include/linux/vmalloc.h | 16 ++++++++++++++--
> > mm/vmalloc.c | 2 +-
> > 3 files changed, 36 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/vmalloc.h b/arch/x86/include/asm/vmalloc.h
> > index 49ce331f3ac6..2c341f398227 100644
> > --- a/arch/x86/include/asm/vmalloc.h
> > +++ b/arch/x86/include/asm/vmalloc.h
> > @@ -5,6 +5,27 @@
> > #include <asm/page.h>
> > #include <asm/pgtable_areas.h>
> >
> > +/*
> > + * The x86 ENTER instruction can be used as a one-instruction stack pivot:
> > + * ENTER imm16, imm8 lowers RSP by imm16 + 8 * (L + 1), L = imm8 & 0x1f.
> > + * imm16 is an unsigned 16-bit operand (ENTER never raises RSP) and L is in
> > + * [0, 31], so a single ENTER can lower RSP by at most
> > + * 0xffff + 8 * 0x20 = 0x100ff bytes. With CONFIG_VMAP_STACK the kernel
> > + * stack lives in the vmalloc area, where an unprivileged user can spray
> > + * adjacent allocations; a single-page guard is too small to contain such a
> > + * pivot. Use 0x11 guard pages (0x11000 bytes), the smallest whole-page
> > + * span exceeding 0x100ff, so the pivot faults in the guard instead of
> > + * landing in attacker-controlled memory.
> > + *
> > + * Restrict this to 64-bit: VMAP_STACK is selected only on x86_64, so 32-bit
> > + * kernel stacks are not in the vmalloc area and the technique does not apply.
> > + * 32-bit also has a far smaller vmalloc window, where a 16-page-per-area
> > + * widening would needlessly pressure the address space.
> > + */
> > +#ifdef CONFIG_X86_64
> > +#define VMAP_GUARD_PAGES 0x11
> > +#endif
> > +
> > #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> >
> > #ifdef CONFIG_X86_64
> > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > index 3b02c0c6b371..b8546e519deb 100644
> > --- a/include/linux/vmalloc.h
> > +++ b/include/linux/vmalloc.h
> > @@ -49,6 +49,18 @@ struct iov_iter; /* in uio.h */
> > #define IOREMAP_MAX_ORDER (7 + PAGE_SHIFT) /* 128 pages */
> > #endif
> >
> > +/*
> > + * Number of unmapped guard pages appended to each guarded vmalloc
> > + * allocation. The default is a single page; an architecture may override
> > + * VMAP_GUARD_PAGES (via asm/vmalloc.h) when a wider guard is needed to
> > + * contain a worst-case single-instruction stack pivot into an adjacent,
> > + * attacker-controlled vmap allocation (see arch/x86 for the ENTER case).
> > + */
> > +#ifndef VMAP_GUARD_PAGES
> > +#define VMAP_GUARD_PAGES 1
> > +#endif
> > +#define VMAP_GUARD_SIZE (VMAP_GUARD_PAGES * PAGE_SIZE)
> > +
> >  struct vm_struct {
> >  	union {
> >  		struct vm_struct *next;	/* Early registration of vm_areas. */
> > @@ -236,8 +248,8 @@ int vmap_pages_range(unsigned long addr, unsigned long end, pgprot_t prot,
> >  static inline size_t get_vm_area_size(const struct vm_struct *area)
> >  {
> >  	if (!(area->flags & VM_NO_GUARD))
> > -		/* return actual size without guard page */
> > -		return area->size - PAGE_SIZE;
> > +		/* return actual size without guard region */
> > +		return area->size - VMAP_GUARD_SIZE;
> >  	else
> >  		return area->size;
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index bb6ae08d18f5..8bb2b3ef40a8 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3217,7 +3217,7 @@ struct vm_struct *__get_vm_area_node(unsigned long size,
> >  		return NULL;
> >
> >  	if (!(flags & VM_NO_GUARD))
> > -		size += PAGE_SIZE;
> > +		size += VMAP_GUARD_SIZE;
> >
> >  	area->flags = flags;
> >  	area->caller = caller;
> > --
> > 2.43.0
> >
>
>
> Hi hardening team and maintainers,
>
> This patch widens the vmalloc guard region on x86_64 to close a
> class of stack-pivot primitive against CONFIG_VMAP_STACK kernels. We would
> like the hardening and mm/x86 maintainers' view on whether this belongs in
> the common guard path or should be scoped to thread stacks.
>
> The description below is intentionally theoretical: it explains why a single
> guard page is insufficient, without operational exploitation detail.
>
> Background / threat model
> =========================
>
> With CONFIG_VMAP_STACK (the default on x86_64), kernel thread stacks live in
> the vmalloc area, which allocates roughly linearly. The vmalloc area exposes
> several allocation paths reachable from userspace, so in principle an
> unprivileged user can place attacker-controlled vmalloc allocations near a
> target stack. At the moment of a control-flow hijack, the live stack can
> therefore be adjacent to attacker-controlled mapped pages.
>
> Today each guarded vmalloc allocation is followed by a single unmapped guard
> page. The guard relies on the assumption that a corrupted RSP cannot jump
> far enough, in one step, to clear the guard and land in the next mapped
> allocation. On x86_64 that assumption does not hold.
>
> The primitive
> =============
>
> The x86_64 ENTER instruction (ENTER imm16, imm8) builds a stack frame in a
> single instruction and lowers RSP by:
>
> imm16 + 8 * (L + 1), L = imm8 & 0x1f
>
> imm16 is an unsigned 16-bit operand and ENTER never raises RSP, so a single
> ENTER lowers RSP by up to 0xffff + 8 * 0x20 = 0x100ff bytes, which is more than a
> page. A single instruction can therefore move RSP off the current stack,
> across the one-page guard, and into an adjacent vmalloc allocation.
>
> Theoretically this matters because:
>
> - The displacement is attacker-chosen (via the immediates) up to 0x100ff,
> so the pivot can clear any guard narrower than that in one step.
> - ENTER is reachable as a gadget, so a pivot of this size is available
> without depending on register state at the hijack site.
> - The pivot happens after the control transfer, so it is not constrained
> by forward-edge CFI (kCFI / FineIBT).
Please ignore the CFI bullet above; it is not relevant, since the threat model
already assumes a control-flow hijack (CFH) primitive. Sorry for the confusion.
>
> Taken together these make it a one-gadget-style pivot that generalizes
> across control-flow hijacks: because no register control is required at the
> hijack site, any control-flow hijack able to reach a single ENTER gadget can
> move RSP into an adjacent vmalloc allocation. The net effect is that a single
> guard page is not a reliable boundary between a pivoted RSP and the next
> mapped allocation on x86_64.
>
> Because this provides a generic primitive that applies across control-flow
> hijacks, rather than easing one specific bug, we think the guard itself
> should be widened so the boundary holds regardless of the originating hijack.
>
> We have a working proof-of-concept and are happy to share it privately with
> maintainers; we are keeping the offensive details off the public list.
>
> The fix
> =======
>
> Widen the guard region from one page to the smallest whole-page span that
> exceeds the worst-case single-ENTER displacement: 0x11 pages (0x11000 bytes,
> which is greater than 0x100ff). A pivot off the top of the stack then lands
> in unmapped guard memory and faults, instead of in mapped,
> attacker-controlled pages.
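
The page count can be checked with a small userspace model. This is a sketch
under stated assumptions: 4 KiB pages, and an unmapped span directly below the
stack that is the trailing guard of the preceding guarded allocation. All
names here (MODEL_PAGE_SIZE, guard_pages_for, pivot_contained) are
hypothetical and do not exist in the kernel.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE	0x1000UL	/* assumption: 4 KiB pages */
#define ENTER_MAX_DROP	0x100ffUL	/* 0xffff + 8 * 0x20, from the text */

/* Smallest page count whose span strictly exceeds max_drop bytes. */
static inline unsigned long guard_pages_for(unsigned long max_drop)
{
	return max_drop / MODEL_PAGE_SIZE + 1;
}

/*
 * Model check: rsp is somewhere in the stack (rsp >= stack_low) and the
 * guard occupies [stack_low - guard_pages * page, stack_low). Returns 1
 * if lowering rsp by drop bytes stays inside the stack or the guard,
 * 0 if it clears the guard and reaches whatever is mapped below it.
 */
static inline int pivot_contained(unsigned long rsp, unsigned long stack_low,
				  unsigned long guard_pages, unsigned long drop)
{
	unsigned long guard_low = stack_low - guard_pages * MODEL_PAGE_SIZE;

	return rsp - drop >= guard_low;
}
```

In this model, a worst-case drop taken from the lowest stack address clears a
one-page guard but stays inside a 0x11-page guard, which matches the bound
argued above.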
>
> This is scoped to x86_64 via a VMAP_GUARD_PAGES knob that defaults to one
> page (no change for any other architecture) and is overridden only under
> CONFIG_X86_64. The 0x100ff bound is a property of the ENTER instruction,
> whose one-byte opcode (0xc8) also appears as an unintended gadget; other
> architectures have no equivalent single-instruction, immediate-controlled
> pivot reachable as an unaligned
> gadget, so they keep the one-page guard and pay no cost. VMAP_STACK is
> selected only on x86_64, so 32-bit x86 is excluded as well.
>
> Cost: the guard pages are never populated, so there is no extra physical
> memory and no extra page-table population beyond the larger virtual span --
> only virtual address space and vmap_area bookkeeping, negligible against the
> 64-bit vmalloc window. get_vm_area_size() is adjusted by the same amount so
> the usable size reported to callers is unchanged.
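
The size accounting can be illustrated with a userspace model of the two
hunks. This is only a sketch: the MODEL_ names, the struct, and the flag value
are placeholders chosen for the example and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE		0x1000UL	/* assumption: 4 KiB pages */
#define MODEL_VMAP_GUARD_PAGES	0x11UL		/* x86_64 value from the patch */
#define MODEL_VMAP_GUARD_SIZE	(MODEL_VMAP_GUARD_PAGES * MODEL_PAGE_SIZE)
#define MODEL_VM_NO_GUARD	0x1UL		/* placeholder bit, not the kernel's */

struct model_vm_area {
	size_t size;		/* includes the guard unless NO_GUARD is set */
	unsigned long flags;
};

/* Mirrors the size adjustment in the __get_vm_area_node() hunk. */
static inline struct model_vm_area model_reserve(size_t size,
						 unsigned long flags)
{
	struct model_vm_area area;

	if (!(flags & MODEL_VM_NO_GUARD))
		size += MODEL_VMAP_GUARD_SIZE;

	area.size = size;
	area.flags = flags;
	return area;
}

/* Mirrors get_vm_area_size(): usable size without the guard region. */
static inline size_t model_area_size(const struct model_vm_area *area)
{
	if (!(area->flags & MODEL_VM_NO_GUARD))
		return area->size - MODEL_VMAP_GUARD_SIZE;
	return area->size;
}
```

Since both sides use the same guard size, the value a caller reads back equals
the size it requested, with or without the no-guard flag.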
>
> Open question for maintainers
> =============================
>
> On x86_64 this widens the guard for all guarded vmap areas, not only thread
> stacks. ret2enter targets the stack specifically, so a narrower alternative
> is to apply the wider guard only on the thread-stack allocation path via a
> dedicated VM_ flag. We kept it in the common path as defense in depth for any
> vmalloc-adjacent pivot target, but are happy to scope it to stacks if you
> prefer.
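
For discussion, the stack-only alternative could look roughly like the
following userspace model. MODEL_VM_WIDE_GUARD is a hypothetical flag invented
for this sketch; the patch does not define it, and the bit values and helper
name are placeholders.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE		0x1000UL	/* assumption: 4 KiB pages */
#define MODEL_VM_NO_GUARD	0x1UL		/* placeholder bit value */
#define MODEL_VM_WIDE_GUARD	0x2UL		/* hypothetical, not in the patch */
#define MODEL_WIDE_GUARD_PAGES	0x11UL		/* covers the 0x100ff ENTER bound */

/*
 * Guard size in bytes for an area with the given flags: none when the
 * area opts out, the wide guard when the (hypothetical) stack flag is
 * set, and the existing single page otherwise.
 */
static inline unsigned long model_guard_size(unsigned long flags)
{
	if (flags & MODEL_VM_NO_GUARD)
		return 0;
	if (flags & MODEL_VM_WIDE_GUARD)
		return MODEL_WIDE_GUARD_PAGES * MODEL_PAGE_SIZE;
	return MODEL_PAGE_SIZE;
}
```

In this variant only the thread-stack allocation path would pass the wide-guard
flag, so every other guarded area keeps its one-page guard.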
>
> We would like to hear your feedback and suggestions.
>
> Thanks,
> Xiang
>