Re: [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR
From: Andy Lutomirski
Date: Tue Jan 03 2017 - 13:33:36 EST
On Tue, Jan 3, 2017 at 5:18 AM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> On Monday, January 2, 2017 10:08:28 PM CET Andy Lutomirski wrote:
>>
>> > This seems to nicely address the same problem on arm64, which has
>> > run into the same issue due to the various page table formats
>> > that can currently be chosen at compile time.
>>
>> On further reflection, I think this has very little to do with paging
>> formats except insofar as paging formats make us notice the problem.
>> The issue is that user code wants to be able to assume an upper limit
>> on an address, and it gets an upper limit right now that depends on
>> architecture due to paging formats. But someone really might want to
>> write a *portable* 64-bit program that allocates memory with the high
>> 16 bits clear. So let's add such a mechanism directly.
>>
>> As a thought experiment, what if x86_64 simply never allocated "high"
>> (above 2^47-1) addresses unless a new mmap-with-explicit-limit syscall
>> were used? Old glibc would continue working. Old VMs would work.
>> New programs that want to use ginormous mappings would have to use the
>> new syscall. This would be totally stateless and would have no issues
>> with CRIU.
>
> I can see this working well for the 47-bit addressing default, but
> what about applications that actually rely on 39-bit addressing
> (I'd have to double-check, but I think this was the limit that
> people were most interested in for arm64)?
>
> 39 bits seems a little small to make that the default for everyone
> who doesn't pass the extra flag. Having to pass another flag to
> limit the addresses introduces other problems (e.g. mmap from
> library call that doesn't pass that flag).
That's a fair point. Maybe my straw man isn't so good.
>
>> If necessary, we could also have a prctl that changes a
>> "personality-like" limit that is in effect when the old mmap was used.
>> I say "personality-like" because it would reset under exactly the same
>> conditions that personality resets itself.
>
> For "personality-like", it would still have to interact
> with the existing PER_LINUX32 and PER_LINUX32_3GB flags that
> do the exact same thing, so actually using personality might
> be better.
>
> We still have a few bits in the personality arguments, and
> we could combine them with the existing ADDR_LIMIT_3GB
> and ADDR_LIMIT_32BIT flags that are mutually exclusive by
> definition, such as
>
> ADDR_LIMIT_32BIT = 0x0800000, /* existing */
> ADDR_LIMIT_3GB = 0x8000000, /* existing */
> ADDR_LIMIT_39BIT = 0x0010000, /* next free bit */
> ADDR_LIMIT_42BIT = 0x8010000,
> ADDR_LIMIT_47BIT = 0x0810000,
> ADDR_LIMIT_48BIT = 0x8810000,
>
> This would probably take only one or two personality bits for the
> limits that are interesting in practice.
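To make the encoding in the table above concrete, here is a minimal C sketch of how the three limit bits could be decoded into a byte limit. ADDR_LIMIT_32BIT and ADDR_LIMIT_3GB use their existing values from <linux/personality.h>. ADDR_LIMIT_39BIT_BIT (0x0010000, the "next free bit") and the helper decoded_addr_limit() are hypothetical names used only for this illustration, not an existing ABI. Treating the unassigned 32BIT|3GB pair as "no limit" is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Existing flags, values as in <linux/personality.h>. */
#define ADDR_LIMIT_32BIT     0x0800000u
#define ADDR_LIMIT_3GB       0x8000000u
/* Hypothetical: the proposed "next free bit"; not part of any kernel ABI. */
#define ADDR_LIMIT_39BIT_BIT 0x0010000u

#define ADDR_LIMIT_MASK (ADDR_LIMIT_32BIT | ADDR_LIMIT_3GB | ADDR_LIMIT_39BIT_BIT)

/* The new bit must not collide with the existing limit bits. */
static_assert((ADDR_LIMIT_39BIT_BIT & (ADDR_LIMIT_32BIT | ADDR_LIMIT_3GB)) == 0,
              "proposed bit overlaps an existing limit bit");

/*
 * Hypothetical decoder: returns the user address-space size in bytes implied
 * by the limit bits, or 0 when no limit bit is set (architecture default).
 * Bits outside ADDR_LIMIT_MASK are ignored. The 32BIT|3GB pair, which the
 * proposal leaves unassigned, also returns 0 (an assumption of this sketch).
 */
static uint64_t decoded_addr_limit(unsigned long persona)
{
    switch (persona & ADDR_LIMIT_MASK) {
    case ADDR_LIMIT_32BIT:
        return 1ULL << 32;
    case ADDR_LIMIT_3GB:
        return 3ULL << 30;
    case ADDR_LIMIT_39BIT_BIT:                                   /* 0x0010000 */
        return 1ULL << 39;
    case ADDR_LIMIT_3GB | ADDR_LIMIT_39BIT_BIT:                  /* 0x8010000 */
        return 1ULL << 42;
    case ADDR_LIMIT_32BIT | ADDR_LIMIT_39BIT_BIT:                /* 0x0810000 */
        return 1ULL << 47;
    case ADDR_LIMIT_32BIT | ADDR_LIMIT_3GB | ADDR_LIMIT_39BIT_BIT: /* 0x8810000 */
        return 1ULL << 48;
    default:
        return 0;
    }
}
```

Because the two existing flags are mutually exclusive, every combination that includes the one new bit is free to carry a new meaning, so a single extra bit buys four additional limits.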
Hmm. What if we approached this a bit differently? We could add a
single new personality bit ADDR_LIMIT_EXPLICIT. Setting this bit would
cause PER_LINUX32_3GB etc. to be cleared automatically. When
ADDR_LIMIT_EXPLICIT is in effect, prctl can set a 64-bit numeric
limit. If ADDR_LIMIT_EXPLICIT is cleared, the prctl value stops being
settable and reading it via prctl returns whatever is implied by the
other personality bits.
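The ADDR_LIMIT_EXPLICIT idea can be modeled as a small state machine. The sketch below is a user-space model, not kernel code: struct task_model, the model_* functions, the bit value 0x0010000 for ADDR_LIMIT_EXPLICIT, and the 47-bit default are all assumptions for illustration. It interprets "PER_LINUX32_3GB etc." as the ADDR_LIMIT_32BIT and ADDR_LIMIT_3GB bits, and it assumes that entering explicit mode seeds the numeric limit with whatever the previous bits implied, a detail the proposal does not specify.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Existing personality flags (values from <linux/personality.h>). */
#define ADDR_LIMIT_32BIT    0x0800000u
#define ADDR_LIMIT_3GB      0x8000000u
/* Hypothetical: bit value chosen only for this sketch. */
#define ADDR_LIMIT_EXPLICIT 0x0010000u
/* Assumed default when no limit bit is set (x86-64, 4-level paging). */
#define DEFAULT_ADDR_LIMIT  (1ULL << 47)

static_assert((ADDR_LIMIT_EXPLICIT & (ADDR_LIMIT_32BIT | ADDR_LIMIT_3GB)) == 0,
              "explicit bit overlaps a legacy limit bit");

struct task_model {
    unsigned long persona;
    uint64_t explicit_limit; /* meaningful only while ADDR_LIMIT_EXPLICIT is set */
};

/* prctl "get": the explicit value, or whatever the other bits imply. */
static uint64_t model_prctl_get_limit(const struct task_model *t)
{
    if (t->persona & ADDR_LIMIT_EXPLICIT)
        return t->explicit_limit;
    if (t->persona & ADDR_LIMIT_3GB)
        return 3ULL << 30;
    if (t->persona & ADDR_LIMIT_32BIT)
        return 1ULL << 32;
    return DEFAULT_ADDR_LIMIT;
}

/* personality(): setting ADDR_LIMIT_EXPLICIT clears the legacy limit bits. */
static void model_set_personality(struct task_model *t, unsigned long persona)
{
    if (persona & ADDR_LIMIT_EXPLICIT) {
        /* Assumption: entering explicit mode keeps the current effective limit. */
        if (!(t->persona & ADDR_LIMIT_EXPLICIT))
            t->explicit_limit = model_prctl_get_limit(t);
        persona &= ~(unsigned long)(ADDR_LIMIT_32BIT | ADDR_LIMIT_3GB);
    }
    t->persona = persona;
}

/* prctl "set": only allowed while ADDR_LIMIT_EXPLICIT is in effect. */
static int model_prctl_set_limit(struct task_model *t, uint64_t limit)
{
    if (!(t->persona & ADDR_LIMIT_EXPLICIT))
        return -EINVAL;
    t->explicit_limit = limit;
    return 0;
}
```

Seeding from the previous state means that merely setting the bit does not change the effective limit; only a subsequent prctl call does. Starting from the architecture default instead would be an equally plausible reading of the proposal.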
--Andy