Re: [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR

From: Andy Lutomirski
Date: Wed Jan 11 2017 - 14:21:06 EST


On Wed, Jan 11, 2017 at 10:49 AM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> On 01/11/2017 10:37 AM, Kirill A. Shutemov wrote:
>>> How about preventing the max addr from being changed to too high a
>>> value while MPX is on instead of overriding the set value? This would
>>> have the added benefit that it would prevent silent failures where you
>>> think you've enabled large addresses but MPX is also on and mmap
>>> refuses to return large addresses.
>> Setting rlimit high doesn't mean that you will necessarily get access to
>> the full address space, even without MPX in the picture. TASK_SIZE limits
>> the available address space too.
>
> OK, sure... If you want to take another mechanism into account with
> respect to MPX, we can do that. We'd just need to change every
> mechanism we want to support to ensure that it can't transition in ways
> that break MPX.
>
> What are you arguing here, though? That since we *might* be limited by
> something else, we should not care about controlling the rlimit?
>
>> I think it's consistent with other resources in rlimit: setting RLIMIT_RSS
>> to unlimited doesn't really mean you are not subject to other resource
>> management.
>
> The farther we get into this, the more and more I think using an rlimit
> is a horrible idea. Its semantics aren't a great match, and you seem to
> be resistant to making *this* rlimit differ from the others when there's
> a real need to do so. We're already being bitten by "legacy"
> rlimit. IOW, being consistent with *other* rlimit behavior buys us
> nothing, only complexity.

Taking a step back, I think it would be fantastic if we could find a
way to make this work without any inheritable settings at all.
Perhaps we could have a per-mm value that is initialized to 2^47-1 on
execve() and can be raised by ELF note or by prctl()? Getting it
right for 32-bit would require a bit of thought. The ELF note would
make a high stack possible and, without the ELF note, we'd get a low
stack but high mmap(). Then the messy bits can be glibc's problem and
a toolchain problem as it should be, given that the only reason we
need a limit at all is because of messy userspace code.
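
To make the shape of that idea concrete, here is a rough userspace model
of it in C. This is only a sketch under assumptions: struct mm_model,
mm_model_execve() and mm_model_raise() are made-up names for
illustration, not existing kernel interfaces, and the 2^57-1 ceiling is
simply taken from the figure used later in this mail. It models a value
that is reset on every execve() (so nothing is inherited) and can only
be raised afterwards, as a prctl()-style call might do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default ceiling after execve(): 2^47 - 1, as proposed in the mail. */
#define DEFAULT_VADDR_MAX ((UINT64_C(1) << 47) - 1)
/* Assumed hardware ceiling: 2^57 - 1 (illustrative, not from real code). */
#define HW_VADDR_MAX      ((UINT64_C(1) << 57) - 1)

/* Hypothetical stand-in for the per-mm state; not the kernel's mm_struct. */
struct mm_model {
	uint64_t vaddr_max;
};

/*
 * Model of execve(): the value is always re-initialized, so the previous
 * program's setting is never inherited. A (hypothetical) ELF note can
 * request the high ceiling up front, which is what would allow a high stack.
 */
static void mm_model_execve(struct mm_model *mm, bool elf_note_wants_high)
{
	mm->vaddr_max = elf_note_wants_high ? HW_VADDR_MAX : DEFAULT_VADDR_MAX;
}

/*
 * Model of a prctl()-style request: raise-only, bounded by the assumed
 * hardware ceiling. Returns 0 on success, -1 if the request is rejected.
 */
static int mm_model_raise(struct mm_model *mm, uint64_t new_max)
{
	if (new_max > HW_VADDR_MAX || new_max < mm->vaddr_max)
		return -1;
	mm->vaddr_max = new_max;
	return 0;
}
```

In this model a program without the note starts at 2^47-1 and may opt in
to more at runtime, which only affects later mappings, while the next
execve() drops back to the default no matter what was set before.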

Sure, the low stack prevents the *whole* address space from being used
in one big block for databases, but 2^57 - 2^47 ought to be good
enough.
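
For a sense of scale, the arithmetic behind that figure can be checked
with a few lines of C. The helper names below are hypothetical and exist
only for this illustration; the sketch just evaluates 2^57 - 2^47 and
expresses it in TiB.

```c
#include <stdint.h>

/* Bytes between 2^lo_bits and 2^hi_bits; caller keeps both below 64. */
static uint64_t vaddr_span_bytes(unsigned hi_bits, unsigned lo_bits)
{
	return (UINT64_C(1) << hi_bits) - (UINT64_C(1) << lo_bits);
}

/* Convert a byte count to whole TiB (1 TiB = 2^40 bytes). */
static uint64_t bytes_to_tib(uint64_t bytes)
{
	return bytes >> 40;
}
```

bytes_to_tib(vaddr_span_bytes(57, 47)) evaluates to 2^17 - 2^7 = 130944
TiB, that is, 1023 times the 128 TiB that sits below 2^47.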

I'm not 100% sure this is workable but, if it is, it makes everyone's
life easier. There's no need to muck around with setarch(1) or
similar hacks.