Re: [PATCH RFC v3 1/2] mm: Add personality flag to limit address to 47 bits

From: Charlie Jenkins
Date: Tue Sep 10 2024 - 19:29:38 EST


On Tue, Sep 10, 2024 at 09:13:33AM +0000, Arnd Bergmann wrote:
> On Mon, Sep 9, 2024, at 23:22, Charlie Jenkins wrote:
> > On Fri, Sep 06, 2024 at 10:52:34AM +0100, Lorenzo Stoakes wrote:
> >> On Fri, Sep 06, 2024 at 09:14:08AM GMT, Arnd Bergmann wrote:
> >> The intent is to optionally be able to run a process that keeps higher bits
> >> free for tagging and to be sure no memory mapping in the process will
> >> clobber these (correct me if I'm wrong Charlie! :)
> >>
> >> So you really wouldn't want this if you are using tagged pointers, you'd
> >> want to be sure literally nothing touches the higher bits.
>
> My understanding was that the purpose of the existing design
> is to allow applications to ask for a high address without having
> to resort to the complexity of MAP_FIXED.
>
> In particular, I'm sure there is precedent for applications that
> want both tagged pointers (for most mappings) and untagged pointers
> (for large mappings). With a per-mm_struct or per-task_struct
> setting you can't do that.
>
> > Various architectures handle the hint address differently, but it
> > appears that the only case across any architecture where an address
> > above 47 bits will be returned is if the application had a hint address
> > with a value greater than 47 bits and was using the MAP_FIXED flag.
> > MAP_FIXED bypasses all other checks so I was assuming that it would be
> > logical for MAP_FIXED to bypass this as well. If MAP_FIXED is not set,
> > then the intent is for no hint address to cause a value greater than 47
> > bits to be returned.
>
> I don't think the MAP_FIXED case is that interesting here because
> it has to work in both fixed and non-fixed mappings.
>
> >> This would be more consistent vs. other arches.
> >
> > Yes riscv is an outlier here. The reason I am pushing for something like
> > a flag to restrict the address space rather than setting it to be the
> > default is it seems like if applications are relying on upper bits to be
> > free, then they should be explicitly asking the kernel to keep them free
> > rather than assuming them to be free.
>
> Let's see what the other architectures do and then come up with
> a way that fixes the pointer tagging case first on those that are
> broken. We can see if there needs to be an extra flag after that.
> Here is what I found:
>
> - x86_64 uses DEFAULT_MAP_WINDOW of BIT(47), uses a 57 bit
> address space when an addr hint is passed.
> - arm64 uses DEFAULT_MAP_WINDOW of BIT(47) or BIT(48), returns
> higher 52-bit addresses when either a hint is passed or
> CONFIG_EXPERT and CONFIG_ARM64_FORCE_52BIT is set (this
> is a debugging option)
> - ppc64 uses a DEFAULT_MAP_WINDOW of BIT(47) or BIT(48),
> returns 52 bit address when an addr hint is passed
> - riscv uses a DEFAULT_MAP_WINDOW of BIT(47) but only uses
> it for allocating the stack below, ignoring it for normal
> mappings
> - s390 has no DEFAULT_MAP_WINDOW but tried to allocate in
> the current number of pgtable levels and only upgrades to
> the next level (31, 42, 53, 64 bits) if a hint is passed or
> the current level is exhausted.
> - loongarch64 has no DEFAULT_MAP_WINDOW, and a default VA
> space of 47 bits (16K pages, 3 levels), but can support
> a 55 bit space (64K pages, 3 levels).
> - sparc has no DEFAULT_MAP_WINDOW and up to 52 bit VA space.
> It may allocate both positive and negative addresses in
> there. (?)
> - mips64, parisc64 and alpha have no DEFAULT_MAP_WINDOW and
> at most 48, 41 or 39 address bits, respectively.
>
> I would suggest these changes:
>
> - make riscv enforce DEFAULT_MAP_WINDOW like x86_64, arm64
> and ppc64, leave it at 47
>
> - add DEFAULT_MAP_WINDOW on loongarch64 (47/48 bits
> based on page size), sparc (48 bits) and s390 (unsure if
> 42, 53, 47 or 48 bits)
>
> - leave the rest unchanged.
>
> Arnd

Changing all architectures to have a standardized DEFAULT_MAP_WINDOW
mostly solves the problem. However, I am concerned that it is fragile
for applications to rely on a default like this. Having the personality
bit flag is supposed to provide an intuitive ABI for users that
guarantees that they will not accidentally request for memory outside of
the boundary that they specified.

Also you bring up that the DEFAULT_MAP_WINDOW would not be able to be
standardized across architectures, so we still have the problem that
this default behavior will be different across architectures which I am
trying to solve.

- Charlie