Re: [PATCH RFC v3 1/2] mm: Add personality flag to limit address to 47 bits

From: Palmer Dabbelt
Date: Wed Oct 02 2024 - 10:32:10 EST


On Fri, 13 Sep 2024 14:04:06 PDT (-0700), Charlie Jenkins wrote:
> On Fri, Sep 13, 2024 at 08:41:34AM +0100, Lorenzo Stoakes wrote:
>> On Wed, Sep 11, 2024 at 11:18:12PM GMT, Charlie Jenkins wrote:
>> > On Wed, Sep 11, 2024 at 07:21:27PM +0100, Catalin Marinas wrote:
>> > > On Tue, Sep 10, 2024 at 05:45:07PM -0700, Charlie Jenkins wrote:
>> > > > On Tue, Sep 10, 2024 at 03:08:14PM -0400, Liam R. Howlett wrote:
>> > > > > * Catalin Marinas <catalin.marinas@xxxxxxx> [240906 07:44]:
>> > > > > > On Fri, Sep 06, 2024 at 09:55:42AM +0000, Arnd Bergmann wrote:
>> > > > > > > On Fri, Sep 6, 2024, at 09:14, Guo Ren wrote:
>> > > > > > > > On Fri, Sep 6, 2024 at 3:18â?¯PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
>> > > > > > > >> It's also unclear to me how we want this flag to interact with
>> > > > > > > >> the existing logic in arch_get_mmap_end(), which attempts to
>> > > > > > > >> limit the default mapping to a 47-bit address space already.
>> > > > > > > >
>> > > > > > > > To optimize RISC-V progress, I recommend:
>> > > > > > > >
>> > > > > > > > Step 1: Approve the patch.
>> > > > > > > > Step 2: Update Go and OpenJDK's RISC-V backend to utilize it.
>> > > > > > > > Step 3: Wait approximately several iterations for Go & OpenJDK
>> > > > > > > > Step 4: Remove the 47-bit constraint in arch_get_mmap_end()
>> > >
>> > > Point 4 is an ABI change. What guarantees that there isn't still
>> > > software out there that relies on the old behaviour?
>> >
>> > Yeah I don't think it would be desirable to remove the 47 bit
>> > constraint in architectures that already have it.
>> >
>> > >
>> > > > > > > I really want to first see a plausible explanation about why
>> > > > > > > RISC-V can't just implement this using a 47-bit DEFAULT_MAP_WINDOW
>> > > > > > > like all the other major architectures (x86, arm64, powerpc64),
>> > > > > >
>> > > > > > FWIW arm64 actually limits DEFAULT_MAP_WINDOW to 48-bit in the default
>> > > > > > configuration. We end up with a 47-bit with 16K pages but for a
>> > > > > > different reason that has to do with LPA2 support (I doubt we need this
>> > > > > > for the user mapping but we need to untangle some of the macros there;
>> > > > > > that's for a separate discussion).
>> > > > > >
>> > > > > > That said, we haven't encountered any user space problems with a 48-bit
>> > > > > > DEFAULT_MAP_WINDOW. So I also think RISC-V should follow a similar
>> > > > > > approach (47 or 48 bit default limit). Better to have some ABI
>> > > > > > consistency between architectures. One can still ask for addresses above
>> > > > > > this default limit via mmap().
>> > > > >
>> > > > > I think that is best as well.
>> > > > >
>> > > > > Can we please just do what x86 and arm64 does?
>> > > >
>> > > > I responded to Arnd in the other thread, but I am still not convinced
>> > > > that the solution that x86 and arm64 have selected is the best solution.
>> > > > The solution of defaulting to 47 bits does allow applications the
>> > > > ability to get addresses that are below 47 bits. However, due to
>> > > > differences across architectures it doesn't seem possible to have all
>> > > > architectures default to the same value. Additionally, this flag will be
>> > > > able to help users avoid potential bugs where a hint address is passed
>> > > > that causes upper bits of a VA to be used.
>> > >
>> > > The reason we added this limit on arm64 is that we noticed programs
>> > > using the top 8 bits of a 64-bit pointer for additional information.
>> > > IIRC, it wasn't even openJDK but some JavaScript JIT. We could have
>> > > taught those programs of a new flag but since we couldn't tell how many
>> > > are out there, it was the safest to default to a smaller limit and opt
>> > > in to the higher one. Such opt-in is via mmap() but if you prefer a
>> > > prctl() flag, that's fine by me as well (though I think this should be
>> > > opt-in to higher addresses rather than opt-out of the higher addresses).
>> >
>> > The mmap() flag was used in previous versions but was decided against
>> > because this feature is more useful if it is process-wide. A
>> > personality() flag was chosen instead of a prctl() flag because there
>> > existed other flags in personality() that were similar. I am tempted to
>> > use prctl() however because then we could have an additional arg to
>> > select the exact number of bits that should be reserved (rather than
>> > being fixed at 47 bits).
>>
>> I am very much not in favour of a prctl(), it would require us to add state
>> limiting the address space and the timing of it becomes critical. Then we
>> have the same issue we do with the other proposals as to - what happens if
>> this is too low?
>>
>> What is 'too low' varies by architecture, and for 32-bit architectures
>> could get quite... problematic.
>>
>> And again, wha is the RoI here - we introducing maintenance burden and edge
>> cases vs. the x86 solution in order to... accommodate things that need more
>> than 128 TiB of address space? A problem that does not appear to exist in
>> reality?
>>
>> I suggested the personality approach as the least impactful compromise way
>> of this series working, but I think after what Arnd has said (and please
>> forgive me if I've missed further discussion have been dipping in and out
>> of this!) - adapting risc v to the approach we take elsewhere seems the
>> most sensible solution to me.

There's one wrinkle here: RISC-V started out with 39-bit VAs by default,
and we've had at least one report of userspace breaking when moving to
48-bit addresses. That was just address sanitizer, so maybe nobody
cares, but we're still pretty early in the transition to 48-bit systems
(most of the HW is still 39-bit) so it's not clear if that's going to be
the only bug.

So we're sort of in our own world of backwards compatibility here.
39-bit vs 48-bit is just an arbitrary number, but "38 bits are enough
for userspace" doesn't seem as sane a "47 bits are enough for
userspace". Maybe the right answer here is to just say the 38-bit
userspace is broken and that it's a Linux-ism that 64-bit sytems have
47-bit user addresses by default.

>> This remains something we can revisit in future if this turns out to be
>> egregious.
>>
>
> I appreciate Arnd's comments, but I do not think that making 47-bit the
> default is the best solution for riscv. On riscv, support for 48-bit
> address spaces was merged in 5.17 and support for 57-bit address spaces
> was merged in 5.18 without changing the default addresses provided by
> mmap(). It could be argued that this was a mistake, however since at the
> time there didn't exist hardware with larger address spaces it wasn't an
> issue. The applications that existed at the time that relied on the
> smaller address spaces have not been able to move to larger address
> spaces. Making a 47-bit user-space address space default solves the
> problem, but that is not arch agnostic, and can't be since of the
> varying differences in page table sizes across architectures, which is
> the other part of the problem I am trying to solve.
>
>> >
>> > Opting-in to the higher address space is reasonable. However, it is not
>> > my preference, because the purpose of this flag is to ensure that
>> > allocations do not exceed 47-bits, so it is a clearer ABI to have the
>> > applications that want this guarantee to be the ones setting the flag,
>> > rather than the applications that want the higher bits setting the flag.
>>
>> Perfect is the enemy of the good :) and an idealised solution may not end
>> up being something everybody can agree on.
>
> Yes you are totally right! Although this is not my ideal solution, it
> sufficiently accomplishes the goal so I think it is reasonable to
> implement this as a personality flag.
>
>>
>> >
>> > - Charlie
>> >
>> > >
>> > > --
>> > > Catalin
>> >
>> >
>> >