Re: [PATCH] Drivers: hv: mshv_vtl: Fix vmemmap_shift exceeding MAX_FOLIO_ORDER
From: Naman Jain
Date: Fri Apr 03 2026 - 03:25:48 EST
On 4/3/2026 9:05 AM, Michael Kelley wrote:
From: Naman Jain <namjain@xxxxxxxxxxxxxxxxxxx> Sent: Tuesday, March 31, 2026 10:40 PM
When registering VTL0 memory via MSHV_ADD_VTL0_MEMORY, the kernel
computes pgmap->vmemmap_shift as the number of trailing zeros in the
OR of start_pfn and last_pfn, intending to use the largest compound
page order both endpoints are aligned to.
However, this value is not clamped to MAX_FOLIO_ORDER, so a
sufficiently aligned range (e.g. physical range 0x800000000000-
0x800080000000, corresponding to start_pfn=0x800000000 with 35
trailing zeros) can produce a shift larger than what
memremap_pages() accepts, triggering a WARN and returning -EINVAL:
WARNING: ... memremap_pages+0x512/0x650
requested folio size unsupported
The MAX_FOLIO_ORDER check was added by
commit 646b67d57589 ("mm/memremap: reject unreasonable folio/compound
page sizes in memremap_pages()").
When CONFIG_HAVE_GIGANTIC_FOLIOS=y, CONFIG_SPARSEMEM_VMEMMAP=y, and
CONFIG_HUGETLB_PAGE is not set, MAX_FOLIO_ORDER resolves to
(PUD_SHIFT - PAGE_SHIFT) = 18. Any range whose PFN alignment exceeds
order 18 hits this path.
I'm not clear on what point you are making with this specific
configuration that results in MAX_FOLIO_ORDER being 18. Is it just
an example? Is 18 the largest expected value for MAX_FOLIO_ORDER?
And note that PUD_SHIFT and PAGE_SHIFT might have different values
on arm64 with a page size other than 4K.
Yes, this was just an example; it is not general enough, so I will remove it.
MAX_FOLIO_ORDER can go beyond 18.
Fix this by clamping vmemmap_shift to MAX_FOLIO_ORDER so we always
request the largest order the kernel supports, rather than an
out-of-range value.
Also fix the error path to propagate the actual error code from
devm_memremap_pages() instead of hard-coding -EFAULT, which was
masking the real -EINVAL return.
Fixes: 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver")
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Naman Jain <namjain@xxxxxxxxxxxxxxxxxxx>
---
drivers/hv/mshv_vtl_main.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/hv/mshv_vtl_main.c b/drivers/hv/mshv_vtl_main.c
index 5856975f32e12..255fed3a740c1 100644
--- a/drivers/hv/mshv_vtl_main.c
+++ b/drivers/hv/mshv_vtl_main.c
@@ -405,8 +405,12 @@ static int mshv_vtl_ioctl_add_vtl0_mem(struct mshv_vtl *vtl, void __user *arg)
/*
* Determine the highest page order that can be used for the given memory range.
* This works best when the range is aligned; i.e. both the start and the length.
+ * Clamp to MAX_FOLIO_ORDER to avoid a WARN in memremap_pages() when the range
+ * alignment exceeds the maximum supported folio order for this kernel config.
*/
- pgmap->vmemmap_shift = count_trailing_zeros(vtl0_mem.start_pfn | vtl0_mem.last_pfn);
+ pgmap->vmemmap_shift = min_t(unsigned long,
+ count_trailing_zeros(vtl0_mem.start_pfn | vtl0_mem.last_pfn),
+ MAX_FOLIO_ORDER);
Is it necessary to use min_t() here, or would min() work? Neither count_trailing_zeros()
nor MAX_FOLIO_ORDER is ever negative, so it seems like just min() would work with
no potential for doing a bogus comparison or assignment.
min() could work, yes. I just felt min_t() is safer for comparing these two values of different types:
count_trailing_zeros() returns 'int',
MAX_FOLIO_ORDER is a macro computed with bit operations,
and the destination (vmemmap_shift) is 'unsigned long'.
include/linux/mmzone.h:#define MAX_FOLIO_ORDER MAX_PAGE_ORDER
include/linux/mmzone.h:#define MAX_FOLIO_ORDER PFN_SECTION_SHIFT
include/linux/mmzone.h:#define MAX_FOLIO_ORDER (ilog2(SZ_16G) - PAGE_SHIFT)
include/linux/mmzone.h:#define MAX_FOLIO_ORDER (ilog2(SZ_1G) - PAGE_SHIFT)
include/linux/mmzone.h:#define MAX_FOLIO_ORDER (PUD_SHIFT - PAGE_SHIFT)
I am fine with anything you suggest here.
The shift is calculated using the originally passed in start_pfn and last_pfn, while the
"range" struct in pgmap has an "end" value that is one page less. So is the idea to
go ahead and create the mapping with folios of a size that includes that last page,
and then just waste the last page of the last folio?
No, no memory is wasted. The vmemmap_shift determines the folio order, and memmap_init_zone_device() walks the range [start_pfn, last_pfn) in steps of (1 << vmemmap_shift) pages, creating one folio per step. The OR operation ensures both boundaries are multiples of (1 << vmemmap_shift), so the range divides evenly into folios with no partial folio at the end.
The intention is to find the highest folio order possible here, and if that exceeds MAX_FOLIO_ORDER, clamp vmemmap_shift to it.
dev_dbg(vtl->module_dev,
"Add VTL0 memory: start: 0x%llx, end_pfn: 0x%llx, page order: %lu\n",
vtl0_mem.start_pfn, vtl0_mem.last_pfn, pgmap->vmemmap_shift);
@@ -415,7 +419,7 @@ static int mshv_vtl_ioctl_add_vtl0_mem(struct mshv_vtl *vtl, void __user *arg)
if (IS_ERR(addr)) {
dev_err(vtl->module_dev, "devm_memremap_pages error: %ld\n", PTR_ERR(addr));
kfree(pgmap);
- return -EFAULT;
+ return PTR_ERR(addr);
}
/* Don't free pgmap, since it has to stick around until the memory
base-commit: 36ece9697e89016181e5ae87510e40fb31d86f2b
--
2.43.0
Thanks for reviewing.
Regards,
Naman