Re: [PATCH v2 05/11] hugetlb: Convert the vmf->pgoff to PAGE_SIZE granularity

From: XIAO WU

Date: Tue Jun 23 2026 - 06:55:52 EST


Hi Jane,

Thanks for this series. The conversion to PAGE_SIZE-granularity indexing
is a nice cleanup.

I came across a Sashiko AI review of this patch series, which flagged
several issues, one of which I was able to confirm triggers a real kernel
crash:

https://sashiko.dev/#/patchset/20260617172534.1740152-1-jane.chu@xxxxxxxxxx

> +++ b/mm/hugetlb.c
> @@ -5952,8 +5955,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>          .address = address & huge_page_mask(h),
>          .real_address = address,
>          .flags = flags,
> -        .pgoff = vma_hugecache_offset(h, vma,
> -                address & huge_page_mask(h)),
> +        .pgoff = linear_page_index(vma, address),

This change sets vmf.pgoff to linear_page_index(vma, address), but
`address` here is the raw unaligned fault address, not the huge-page-aligned
address.  Previously, vma_hugecache_offset() used `address & huge_page_mask(h)`
which produced a huge-page-aligned index.

When a page fault occurs at a non-huge-page-aligned address within a hugetlb
mapping (e.g., vm_start + 0x1000 for a 2MB page), the resulting pgoff is not
a multiple of pages_per_huge_page (512 for 2MB).  This unaligned index
propagates through:

  hugetlb_fault() → hugetlb_no_page() → hugetlb_add_to_page_cache()
  → __filemap_add_folio()

where this assertion fires (mm/filemap.c:862):

  VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);

With CONFIG_DEBUG_VM=y, this becomes a BUG() and panics the kernel.

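I was able to reproduce this in a QEMU VM.  The fix should be trivial:
pass the huge-page-aligned address, i.e. `address & huge_page_mask(h)`,
to linear_page_index() so that the resulting pgoff stays a multiple of
pages_per_huge_page(h).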

=== Reproduction ===

Kernel: 7.1.0-rc5-g7ba451f8a24f #1 SMP PREEMPT_DYNAMIC x86_64
Config: CONFIG_HUGETLBFS=y, CONFIG_DEBUG_VM=y, CONFIG_KASAN=y

Trigger: mmap a hugetlbfs file, then access an address at offset 0x1000
(one 4K page) into the mapping, which is unaligned relative to the 2MB
huge page boundary.

=== Full PoC ===

Compile with: gcc -o poc poc.c -static

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <fcntl.h>
#include <errno.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000
#endif
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif

/*
 * Bug: hugetlb_fault() sets vmf.pgoff = linear_page_index(vma, address)
 * using the raw unaligned fault address.  This unaligned pgoff reaches
 * __filemap_add_folio() which VM_BUG_ON_FOLIO's on it.
 */

static long get_hugepage_size(void)
{
    FILE *f;
    char line[256];
    long size = 2 * 1024 * 1024;

    f = fopen("/proc/meminfo", "r");
    if (!f)
        return size;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "Hugepagesize: %ld kB", &size) == 1)
            size *= 1024;
    }
    fclose(f);
    return size;
}

int main(void)
{
    void *addr;
    size_t hpage_size;
    const char *hugetlbfs_path = "/mnt/huge/testfile";
    int fd;
    int ret;

    hpage_size = get_hugepage_size();
    printf("[+] Huge page size: %zu bytes\n", hpage_size);

    /* Mount hugetlbfs */
    mkdir("/mnt/huge", 0755);
    ret = syscall(__NR_mount, "hugetlbfs", "/mnt/huge", "hugetlbfs", 0, NULL);
    if (ret < 0 && errno != EBUSY && errno != ENOENT)
        perror("mount hugetlbfs");

    /* Reserve 1 huge page */
    {
        FILE *f = fopen("/proc/sys/vm/nr_hugepages", "w");
        if (f) { fprintf(f, "1"); fclose(f); }
    }

    /* Create hugetlbfs file and mmap it */
    fd = open(hugetlbfs_path, O_CREAT | O_RDWR, 0644);
    if (fd < 0) {
        perror("open hugetlbfs");
        printf("[!] Trying anonymous MAP_HUGETLB\n");
        addr = mmap(NULL, hpage_size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (addr == MAP_FAILED) {
            perror("mmap MAP_HUGETLB");
            return 1;
        }
    } else {
        if (ftruncate(fd, hpage_size) < 0)
            perror("ftruncate");
        addr = mmap(NULL, hpage_size, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, 0);
        close(fd);
        if (addr == MAP_FAILED) {
            perror("mmap hugetlbfs file");
            return 1;
        }
    }
    printf("[+] Mapping at %p\n", addr);

    /*
     * Trigger: access address at offset 0x1000 into the huge page.
     * vm_start is huge-page-aligned, but vm_start + 0x1000 is not.
     * hugetlb_fault() sets vmf.pgoff = linear_page_index(vma, address)
     * with the unaligned address, producing an unaligned pgoff.
     */
    printf("[+] Triggering fault at unaligned offset (%p + 0x1000)...\n", addr);
    fflush(stdout);
    volatile char *trigger = (volatile char *)addr + 0x1000;
    *trigger = 0x41;

    printf("[+] Survived: value = 0x%02x\n", *trigger);
    return 0;
}

=== Crash Log ===

Linux syzkaller 7.1.0-rc5-g7ba451f8a24f #1 SMP PREEMPT_DYNAMIC x86_64

[  527.288433][ T9873] page dumped because: VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1))
[  527.300642][ T9873] kernel BUG at mm/filemap.c:862!
[  527.301090][ T9873] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
[  527.301640][ T9873] CPU: 0 UID: 0 PID: 9873 Comm: poc Not tainted
[  527.303803][ T9873] RIP: 0010:__filemap_add_folio+0xf39/0x1200
[  527.311913][ T9873] Call Trace:
[  527.312345][ T9873]  <TASK>
[  527.312676][ T9873]  hugetlb_add_to_page_cache+0xe3/0x240
[  527.313414][ T9873]  hugetlb_no_page+0x1301/0x21b0
[  527.314402][ T9873]  hugetlb_fault+0x531/0x1570
[  527.315259][ T9873]  handle_mm_fault+0x970/0xaf0
[  527.316565][ T9873]  do_user_addr_fault+0x60b/0x14c0
[  527.317434][ T9873]  asm_exc_page_fault+0x26/0x30
[  527.318733][ T9873] RIP: 0033:0x401fa2
[  527.326921][ T9873]  </TASK>
[  527.327245][ T9873] RIP: 0010:__filemap_add_folio+0xf39/0x1200
[  527.335300][ T9873] Kernel panic - not syncing: Fatal exception

The Sashiko review also flagged a few other pre-existing issues in
this series that I haven't verified yet:

1. [Critical] remove_inode_hugepages() in patch 9: passing folio->index
   (now a base-page index) to hugetlb_unmap_file_folio(), which multiplies
   it by pages_per_huge_page(h), scales the offset twice and causes the
   interval tree search to miss VMAs (potential UAF).

2. [High] hugetlbfs_zero_partial_page() in patch 7: Usama already
   pointed out the start >> PAGE_SHIFT question — `start` is a byte
   offset but filemap_lock_folio() expects a page index.

3. [Critical] filemap_get_pages() in patch 4: the `if (is_hugetlbfs)
   goto done` path returns 0 with an empty batch, which could cause
   filemap_read() to loop forever when reading a hole in a hugetlbfs
   file.

Thanks,
Xiao