Re: [PATCH v2 05/11] hugetlb: Convert the vmf->pgoff to PAGE_SIZE granularity
From: XIAO WU
Date: Tue Jun 23 2026 - 06:55:52 EST
Hi Jane,
Thanks for this series. The conversion to PAGE_SIZE-granularity indexing is
a nice cleanup.
I came across a Sashiko AI review of this patch series, which flagged
several issues, one of which I was able to confirm triggers a real kernel
crash:
https://sashiko.dev/#/patchset/20260617172534.1740152-1-jane.chu@xxxxxxxxxx
> +++ b/mm/hugetlb.c
> @@ -5952,8 +5955,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  		.address = address & huge_page_mask(h),
>  		.real_address = address,
>  		.flags = flags,
> -		.pgoff = vma_hugecache_offset(h, vma,
> -				address & huge_page_mask(h)),
> +		.pgoff = linear_page_index(vma, address),
This change sets vmf.pgoff to linear_page_index(vma, address), but
`address` here is the raw unaligned fault address, not the huge-page-aligned
address. Previously, vma_hugecache_offset() used `address & huge_page_mask(h)`
which produced a huge-page-aligned index.
When a page fault occurs at a non-huge-page-aligned address within a hugetlb
mapping (e.g., vm_start + 0x1000 for a 2MB page), the resulting pgoff is not
a multiple of pages_per_huge_page (512 for 2MB). This unaligned index
propagates through:
hugetlb_fault() → hugetlb_no_page() → hugetlb_add_to_page_cache()
→ __filemap_add_folio()
where this assertion fires (mm/filemap.c:862):
VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);
With CONFIG_DEBUG_VM=y, this becomes a BUG(), which oopses and, with
panic_on_oops set as in the log below, panics the kernel.
I was able to reproduce this in a QEMU VM. The fix should be trivial:
pass the huge-page-aligned address, address & huge_page_mask(h), to
linear_page_index().
=== Reproduction ===
Kernel: 7.1.0-rc5-g7ba451f8a24f #1 SMP PREEMPT_DYNAMIC x86_64
Config: CONFIG_HUGETLBFS=y, CONFIG_DEBUG_VM=y, CONFIG_KASAN=y
Trigger: mmap a hugetlbfs file, then access an address at offset 0x1000
(one 4K page) into the mapping, which is unaligned relative to the 2MB
huge page boundary.
=== Full PoC ===
Compile with: gcc -o poc poc.c -static
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <fcntl.h>
#include <errno.h>
#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000
#endif
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
/*
 * Bug: hugetlb_fault() sets vmf.pgoff = linear_page_index(vma, address)
 * using the raw unaligned fault address. This unaligned pgoff reaches
 * __filemap_add_folio() which VM_BUG_ON_FOLIO's on it.
 */
static long get_hugepage_size(void)
{
	FILE *f;
	char line[256];
	long size = 2 * 1024 * 1024;

	f = fopen("/proc/meminfo", "r");
	if (!f)
		return size;
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "Hugepagesize: %ld kB", &size) == 1)
			size *= 1024;
	}
	fclose(f);
	return size;
}
int main(void)
{
	void *addr;
	size_t hpage_size;
	const char *hugetlbfs_path = "/mnt/huge/testfile";
	int fd;
	int ret;

	hpage_size = get_hugepage_size();
	printf("[+] Huge page size: %zu bytes\n", hpage_size);

	/* Mount hugetlbfs */
	mkdir("/mnt/huge", 0755);
	ret = syscall(__NR_mount, "hugetlbfs", "/mnt/huge", "hugetlbfs", 0, NULL);
	if (ret < 0 && errno != EBUSY && errno != ENOENT)
		perror("mount hugetlbfs");

	/* Reserve 1 huge page */
	{
		FILE *f = fopen("/proc/sys/vm/nr_hugepages", "w");
		if (f) { fprintf(f, "1"); fclose(f); }
	}

	/* Create hugetlbfs file and mmap it */
	fd = open(hugetlbfs_path, O_CREAT | O_RDWR, 0644);
	if (fd < 0) {
		perror("open hugetlbfs");
		printf("[!] Trying anonymous MAP_HUGETLB\n");
		addr = mmap(NULL, hpage_size, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
		if (addr == MAP_FAILED) {
			perror("mmap MAP_HUGETLB");
			return 1;
		}
	} else {
		ftruncate(fd, hpage_size);
		addr = mmap(NULL, hpage_size, PROT_READ | PROT_WRITE,
			    MAP_SHARED, fd, 0);
		close(fd);
		if (addr == MAP_FAILED) {
			perror("mmap hugetlbfs file");
			return 1;
		}
	}
	printf("[+] Mapping at %p\n", addr);

	/*
	 * Trigger: access address at offset 0x1000 into the huge page.
	 * vm_start is huge-page-aligned, but vm_start + 0x1000 is not.
	 * hugetlb_fault() sets vmf.pgoff = linear_page_index(vma, address)
	 * with the unaligned address, producing an unaligned pgoff.
	 */
	printf("[+] Triggering fault at unaligned offset (%p + 0x1000)...\n", addr);
	fflush(stdout);
	volatile char *trigger = (volatile char *)addr + 0x1000;
	*trigger = 0x41;
	printf("[+] Survived: value = 0x%02x\n", *trigger);
	return 0;
}
=== Crash Log ===
Linux syzkaller 7.1.0-rc5-g7ba451f8a24f #1 SMP PREEMPT_DYNAMIC x86_64
[ 527.288433][ T9873] page dumped because: VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1))
[ 527.300642][ T9873] kernel BUG at mm/filemap.c:862!
[ 527.301090][ T9873] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
[ 527.301640][ T9873] CPU: 0 UID: 0 PID: 9873 Comm: poc Not tainted
[ 527.303803][ T9873] RIP: 0010:__filemap_add_folio+0xf39/0x1200
[ 527.311913][ T9873] Call Trace:
[ 527.312345][ T9873] <TASK>
[ 527.312676][ T9873] hugetlb_add_to_page_cache+0xe3/0x240
[ 527.313414][ T9873] hugetlb_no_page+0x1301/0x21b0
[ 527.314402][ T9873] hugetlb_fault+0x531/0x1570
[ 527.315259][ T9873] handle_mm_fault+0x970/0xaf0
[ 527.316565][ T9873] do_user_addr_fault+0x60b/0x14c0
[ 527.317434][ T9873] asm_exc_page_fault+0x26/0x30
[ 527.318733][ T9873] RIP: 0033:0x401fa2
[ 527.326921][ T9873] </TASK>
[ 527.327245][ T9873] RIP: 0010:__filemap_add_folio+0xf39/0x1200
[ 527.335300][ T9873] Kernel panic - not syncing: Fatal exception
The Sashiko review also flagged a few other pre-existing issues in
this series that I haven't verified yet:
1. [Critical] remove_inode_hugepages() in patch 9: it passes folio->index
(now a base-page index) to hugetlb_unmap_file_folio(), which multiplies it
by pages_per_huge_page(h). The offset is then scaled twice, so the
interval tree search can miss VMAs (potential UAF).
2. [High] hugetlbfs_zero_partial_page() in patch 7: Usama already
raised the start >> PAGE_SHIFT question. `start` is a byte offset, but
filemap_lock_folio() expects a page index.
3. [Critical] filemap_get_pages() in patch 4: the `if (is_hugetlbfs)
goto done` path returns 0 with an empty batch, which could cause
filemap_read() to loop forever when reading a hole in a hugetlbfs
file.
Thanks,
Xiao