Re: [PATCH RFC v3 1/2] mm: Add personality flag to limit address to 47 bits

From: Lorenzo Stoakes
Date: Fri Sep 06 2024 - 05:56:14 EST


(Sorry, having issues with my IPv6 setup that duplicated the original email...)

On Fri, Sep 06, 2024 at 09:14:08AM GMT, Arnd Bergmann wrote:
> On Fri, Sep 6, 2024, at 08:14, Lorenzo Stoakes wrote:
> > On Fri, Sep 06, 2024 at 07:17:44AM GMT, Arnd Bergmann wrote:
> >> On Thu, Sep 5, 2024, at 21:15, Charlie Jenkins wrote:
> >> > Create a personality flag ADDR_LIMIT_47BIT to support applications
> >> > that wish to transition from running in environments that support at
> >> > most 47-bit VAs to environments that support larger VAs. This
> >> > personality can be set to cause all allocations to be below the 47-bit
> >> > boundary. Using MAP_FIXED with mmap() will bypass this restriction.
> >> >
> >> > Signed-off-by: Charlie Jenkins <charlie@xxxxxxxxxxxx>
> >>
> >> I think having an architecture-independent mechanism to limit the size
> >> of the 64-bit address space is useful in general, and we've discussed
> >> the same thing for arm64 in the past, though we have not actually
> >> reached an agreement on the ABI previously.
> >
> > The thread on the original proposals attests to this being rather a fraught
> > topic, and I think the weight of opinion was more so in favour of opt-in
> > rather than opt-out.
>
> You mean opt-in to using the larger addresses like we do on arm64 and
> powerpc, while "opt-out" means a limit as Charlie suggested?

I guess I'm not using brilliant terminology here haha!

To clarify - the weight of opinion was for a situation where the address
space is limited, except if you set a hint above that (you could call that
opt-out or opt-in depending which way you look at it, so yeah ok very
unclear sorry!).

It was against the MAP_ flag, and I think a _flexible_ per-process limit is
also questionable, as you might end up setting a limit which breaks
something else, and this starts getting messy quickly.

To be clear, the ADDR_LIMIT_47BIT suggestion is absolutely a compromise,
and a practical one.

>
> >> > @@ -22,6 +22,7 @@ enum {
> >> >  	WHOLE_SECONDS = 0x2000000,
> >> >  	STICKY_TIMEOUTS = 0x4000000,
> >> >  	ADDR_LIMIT_3GB = 0x8000000,
> >> > +	ADDR_LIMIT_47BIT = 0x10000000,
> >> >  };
> >>
> >> I'm a bit worried about having this done specifically in the
> >> personality flag bits, as they are rather limited. We obviously
> >> don't want to add many more such flags when there could be
> >> a way to just set the default limit.
> >
> > Since I'm the one who suggested it, I feel I should offer some kind of
> > vague defence here :)
> >
> > We shouldn't let perfect be the enemy of the good. This is a relatively
> > straightforward means of achieving the aim (assuming your concern about
> > arch_get_mmap_end() below isn't a blocker) which has the least impact on
> > existing code.
> >
> > Of course we can end up in absurdities where we start doing
> > ADDR_LIMIT_xxBIT... but again - it's simple, shouldn't represent an
> > egregious maintenance burden and is entirely opt-in so has things going for
> > it.
>
> I'm more confused now, I think most importantly we should try to
> handle this consistently across all architectures. The proposed
> implementation seems to completely block addresses above BIT(47)
> even for applications that opt in by calling mmap(BIT(47), ...),
> which seems to break the existing applications.

Hm, I thought the commit message suggested the hint overrides it still?

The intent is to optionally be able to run a process that keeps higher bits
free for tagging and to be sure no memory mapping in the process will
clobber these (correct me if I'm wrong Charlie! :)

So you really wouldn't want this if you are using tagged pointers, you'd
want to be sure literally nothing touches the higher bits.
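
To illustrate why that matters, here is a rough userspace sketch of pointer
tagging. It is not taken from the patch: the helper names (tag_ptr,
untag_ptr, ptr_tag, tag_is_lossless) and the choice of a 16-bit tag in bits
48..63 are assumptions made purely for illustration. The scheme only works
if every address handed back by the kernel sits below 1 << 47.

```c
#include <stdint.h>

/* Assumption: all user mappings live below 1 << 47, so bits 47..63 are 0. */
#define VA_BITS		47
#define ADDR_MASK	((UINT64_C(1) << VA_BITS) - 1)
#define TAG_SHIFT	48	/* hypothetical: tag occupies bits 48..63 */

/* Pack a 16-bit tag into the upper bits of an address. */
static inline uint64_t tag_ptr(uint64_t addr, uint16_t tag)
{
	return (addr & ADDR_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Strip the tag and recover the address bits. */
static inline uint64_t untag_ptr(uint64_t tagged)
{
	return tagged & ADDR_MASK;
}

/* Extract the tag. */
static inline uint16_t ptr_tag(uint64_t tagged)
{
	return (uint16_t)(tagged >> TAG_SHIFT);
}

/* Nonzero if the address survives a tag/untag round trip unchanged. */
static inline int tag_is_lossless(uint64_t addr, uint16_t tag)
{
	return untag_ptr(tag_ptr(addr, tag)) == addr;
}
```

With these helpers, an address at or above 1 << 47 does not survive the
round trip (tag_is_lossless() returns 0), which is exactly the clobbering a
process using such a scheme needs to rule out.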

>
> If we want this flag for RISC-V and also keep the behavior of
> defaulting to >BIT(47) addresses for mmap(0, ...) how about
> changing arch_get_mmap_end() to return the limit based on
> ADDR_LIMIT_47BIT and then make this default to enabled on
> arm64 and powerpc but disabled on riscv?

But you wouldn't necessarily want all processes to be so restricted, I
think this is what Charlie's trying to avoid :)

On the other hand - I'm not sure there are many processes on any arch
that'd want the higher mappings.

So that'd push us again towards RISC-V just limiting to 48 bits and only
mapping above this if a hint is provided, like x86-64 does (and as you
mentioned via IRC - it seems RISC-V is an outlier in that
DEFAULT_MAP_WINDOW == TASK_SIZE).

This would be more consistent vs. other arches.
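
As a simplified model of that hint-based behaviour (not the kernel's actual
code; model_mmap_end() is a hypothetical name and the window sizes are
plain parameters rather than real per-arch constants):

```c
#include <stdint.h>

/*
 * Hypothetical model: pick the upper bound for an mmap() search.
 * A hint above the default window opts in to the full task size,
 * anything else stays inside the default window.
 */
static inline uint64_t model_mmap_end(uint64_t hint,
				      uint64_t default_window,
				      uint64_t task_size)
{
	return hint > default_window ? task_size : default_window;
}
```

If default_window == task_size, as described above for the outlier case,
the hint makes no difference and the full range is always available.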

>
> >> It's also unclear to me how we want this flag to interact with
> >> the existing logic in arch_get_mmap_end(), which attempts to
> >> limit the default mapping to a 47-bit address space already.
> >
> > How does ADDR_LIMIT_3GB presently interact with that?
>
> That is x86 specific and only relevant to compat tasks, limiting
> them to 3 instead of 4 GB. There is also ADDR_LIMIT_32BIT, which
> on arm32 is always set in practice to allow 32-bit addressing
> as opposed to ARMv2 style 26-bit addressing (IIRC ARMv3 supported
> both 26-bit and 32-bit addressing, while ARMv4 through ARMv7 are
> 32-bit only).

OK, I understand what it's for, I missed that it was an arch-specific bit, urgh.

I'd say this limit should be the min of the arch-specific limit and the
48-bit limit. If you have a 36-bit address space, obviously it'd be rather
unwise to try to provide 48-bit addresses.
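
Something along these lines, as a sketch only (effective_limit() is a
hypothetical helper, not an existing kernel function, and the cap width is
passed in rather than hard-coded):

```c
#include <stdint.h>

/*
 * Hypothetical helper: clamp the arch-specific address limit to a cap
 * of (1 << cap_bits). cap_bits must be less than 64.
 */
static inline uint64_t effective_limit(uint64_t arch_limit,
				       unsigned int cap_bits)
{
	uint64_t cap = UINT64_C(1) << cap_bits;

	return arch_limit < cap ? arch_limit : cap;
}
```

So a 36-bit address space keeps its own 1 << 36 limit, while an arch with a
larger address space gets capped.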

>
> Arnd