Re: [PATCH RFC v3 1/2] mm: Add personality flag to limit address to 47 bits
From: Charlie Jenkins
Date: Mon Sep 09 2024 - 19:22:19 EST
On Fri, Sep 06, 2024 at 10:52:34AM +0100, Lorenzo Stoakes wrote:
> (Sorry having issues with my IPv6 setup that duplicated the original email...
>
> On Fri, Sep 06, 2024 at 09:14:08AM GMT, Arnd Bergmann wrote:
> > On Fri, Sep 6, 2024, at 08:14, Lorenzo Stoakes wrote:
> > > On Fri, Sep 06, 2024 at 07:17:44AM GMT, Arnd Bergmann wrote:
> > >> On Thu, Sep 5, 2024, at 21:15, Charlie Jenkins wrote:
> > >> > Create a personality flag ADDR_LIMIT_47BIT to support applications
> > >> > that wish to transition from running in environments that support at
> > >> > most 47-bit VAs to environments that support larger VAs. This
> > >> > personality can be set to cause all allocations to be below the 47-bit
> > >> > boundary. Using MAP_FIXED with mmap() will bypass this restriction.
> > >> >
> > >> > Signed-off-by: Charlie Jenkins <charlie@xxxxxxxxxxxx>
> > >>
> > >> I think having an architecture-independent mechanism to limit the size
> > >> of the 64-bit address space is useful in general, and we've discussed
> > >> the same thing for arm64 in the past, though we have not actually
> > >> reached an agreement on the ABI previously.
> > >
> > > The thread on the original proposals attests to this being rather a fraught
> > > topic, and I think the weight of opinion was more so in favour of opt-in
> > > rather than opt-out.
> >
> > You mean opt-in to using the larger addresses like we do on arm64 and
> > powerpc, while "opt-out" means a limit as Charlie suggested?
>
> I guess I'm not using brilliant terminology here haha!
>
> To clarify - the weight of opinion was for a situation where the address
> space is limited, except if you set a hint above that (you could call that
> opt-out or opt-in depending which way you look at it, so yeah ok very
> unclear sorry!).
>
> It was against the MAP_ flag and also I think a _flexible_ per-process
> limit is also questionable as you might end up setting a limit which breaks
> something else, and this starts getting messy quick.
>
> To be clear, the ADDR_LIMIT_47BIT suggestion is absolutely a compromise and
> practical suggestion.
>
> >
> > >> > @@ -22,6 +22,7 @@ enum {
> > >> > WHOLE_SECONDS = 0x2000000,
> > >> > STICKY_TIMEOUTS = 0x4000000,
> > >> > ADDR_LIMIT_3GB = 0x8000000,
> > >> > + ADDR_LIMIT_47BIT = 0x10000000,
> > >> > };
> > >>
> > >> I'm a bit worried about having this done specifically in the
> > >> personality flag bits, as they are rather limited. We obviously
> > >> don't want to add many more such flags when there could be
> > >> a way to just set the default limit.
> > >
> > > Since I'm the one who suggested it, I feel I should offer some kind of
> > > vague defence here :)
> > >
> > > We shouldn't let perfect be the enemy of the good. This is a relatively
> > > straightforward means of achieving the aim (assuming your concern about
> > > arch_get_mmap_end() below isn't a blocker) which has the least impact on
> > > existing code.
> > >
> > > Of course we can end up in absurdities where we start doing
> > > ADDR_LIMIT_xxBIT... but again - it's simple, shouldn't represent an
> > > egregious maintenance burden and is entirely opt-in so has things going for
> > > it.
> >
> > I'm more confused now, I think most importantly we should try to
> > handle this consistently across all architectures. The proposed
> > implementation seems to completely block addresses above BIT(47)
> > even for applications that opt in by calling mmap(BIT(47), ...),
> > which seems to break the existing applications.
>
> Hm, I thought the commit message suggested the hint overrides it still?
>
> The intent is to optionally be able to run a process that keeps higher bits
> free for tagging and to be sure no memory mapping in the process will
> clobber these (correct me if I'm wrong Charlie! :)
>
> So you really wouldn't want this if you are using tagged pointers, you'd
> want to be sure literally nothing touches the higher bits.
Various architectures handle the hint address differently, but it
appears that the only case across any architecture where an address
above 47 bits will be returned is if the application had a hint address
with a value greater than 47 bits and was using the MAP_FIXED flag.
MAP_FIXED bypasses all other checks so I was assuming that it would be
logical for MAP_FIXED to bypass this as well. If MAP_FIXED is not set,
then the intent is for no hint address to cause a value greater than 47
bits to be returned.
This does have the issue that if MAP_FIXED is used then an address can
be returned above 47-bits, but if an application does not want addresses
above 47-bits then they shouldn't ask for a fixed address above that
range.
>
> >
> > If we want this flag for RISC-V and also keep the behavior of
> > defaulting to >BIT(47) addresses for mmap(0, ...) how about
> > changing arch_get_mmap_end() to return the limit based on
> > ADDR_LIMIT_47BIT and then make this default to enabled on
> > arm64 and powerpc but disabled on riscv?
>
> But you wouldn't necessarily want all processes to be so restricted, I
> think this is what Charlie's trying to avoid :)
>
> On the ohter hand - I'm not sure there are many processes on any arch
> that'd want the higher mappings.
>
> So that'd push us again towards risc v just limiting to 48-bits and only
> mapping above this if a hint is provided like x86-64 does (and as you
> mentioned via irc - it seems risc v is an outlier in that
> DEFAULT_MAP_WINDOW == TASK_SIZE).
>
> This would be more consistent vs. other arches.
Yes riscv is an outlier here. The reason I am pushing for something like
a flag to restrict the address space rather than setting it to be the
default is it seems like if applications are relying on upper bits to be
free, then they should be explicitly asking the kernel to keep them free
rather than assuming them to be free.
>
> >
> > >> It's also unclear to me how we want this flag to interact with
> > >> the existing logic in arch_get_mmap_end(), which attempts to
> > >> limit the default mapping to a 47-bit address space already.
> > >
> > > How does ADDR_LIMIT_3GB presently interact with that?
> >
> > That is x86 specific and only relevant to compat tasks, limiting
> > them to 3 instead of 4 GB. There is also ADDR_LIMIT_32BIT, which
> > on arm32 is always set in practice to allow 32-bit addressing
> > as opposed to ARMv2 style 26-bit addressing (IIRC ARMv3 supported
> > both 26-bit and 32-bit addressing, while ARMv4 through ARMv7 are
> > 32-bit only.
>
> OK, I understand what it's for, I missed it was arch-specific bit, urgh.
>
> I'd say this limit should be min of the arch-specific limit vs. the 48-bit
> limit. If you have a 36-bit address space obviously it'd be rather unwise
> to try to provide 48 bit addresses..
In this patch I set the high limit to be the minimum of the provided
high limit and 47 bits so I think that should cover this case?
- Charlie
>
> >
> > Arnd