Re: [REGRESSION] x86/hugetlb: AMD F15h VA alignment offset breaks MAP_HUGETLB alignment
From: Oscar Salvador (SUSE)
Date: Thu May 28 2026 - 08:50:49 EST
On Thu, May 28, 2026 at 07:45:24AM +0200, Oscar Salvador (SUSE) wrote:
> On Wed, May 27, 2026 at 02:04:10PM -0700, Dave Hansen wrote:
> > On 5/27/26 11:28, Oscar Salvador (SUSE) wrote:
> > > if (filp) {
> > > info.align_mask = get_align_mask(filp);
> > > - info.align_offset += get_align_bits();
> > > + /*
> > > + * Hugepages must remain hugepage-aligned, so skip adding an offset
> > > + * in case we enabled 'align_va_addr'.
> > > + */
> > > + if (!is_file_hugepages(filp))
> > > + info.align_offset += get_align_bits();
> > > }
> >
> > That's a good hack to show the scope of the problem.
>
> Haha, do not worry, I myself have 0 interestin spreading hugetlb-specific
> code around (on the contrary), but I wanted to proof the point.
>
> >
> > But I'd really rather this be dealt with in the arch-independent code,
> > not by adding hugetlb hacks to arch code. It isn't even clear to me what
> > exactly goes wrong when you set a tiny ->align_offset and have a larger
> > ->align_mask. Shouldn't the tiny offset just get masked off?
> >
> > gap += (info->align_offset - gap) & info->align_mask;
Ok, I finally got to it.
So, let us assume we ask for a 2MB hugetlb page.
~huge_page_mask = 0x1fffff
huge_page_mask_align = PAGE_MASK & ~huge_page_mask(hstate_file(file)) = 0x1ff000
unmapped_area_topdown()
info->length = 0x200000 (2MB)
info->align_mask = 0x1ff000
/* Adjust search length to account for worst case alignment overhead */
total_gap_lenght_requested = info->length + info->align_mask = 0x3ff000
We find a gap: 0x7f28cfb10000 - 0x7f28cfd10000 (2MB) and assuming align_offset is = 0:
gap -= (gap - info->align_offset) & info->align_mask
0x7f28cfb10000 -= 0x7f28cfb10000 & 0x1ff000 = 7f28cfa00000 (2MB aligned)
IIUC, we mask what we got with align_mask to know how much we need to substract in
order to be properly aligned (and since we already accounted for extra length before,
we are sure we do not overstep anything below).
Now, I have to acknowledge that I had to look at the code several times, because
it was not clear to me why we were not just masking off 2MB, but I guess if we do we
lose whatever align_offset gives us (if smaller than align_mask).
I mean, code was already like that before my refactoring, just that back
then we did "align_offset = 0" unilaterally for hugetlb mappings.
The only thing bugging is, should not the same happen for THP-file-backed mappings?
--
Oscar Salvador
SUSE Labs