Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
From: Andy Lutomirski
Date: Thu Dec 03 2020 - 12:11:17 EST
> On Dec 3, 2020, at 4:06 AM, Topi Miettinen <toiwoton@xxxxxxxxx> wrote:
>
> On 3.12.2020 11.47, Florian Weimer wrote:
>> * Topi Miettinen:
>>> +3 Additionally enable full randomization of memory mappings created
>>> + with mmap(NULL, ...). With 2, the base of the VMA used for such
>>> + mappings is random, but the mappings are created in predictable
>>> + places within the VMA and in sequential order. With 3, new VMAs
>>> + are created to fully randomize the mappings.
>>> +
>>> + Also mremap(..., MREMAP_MAYMOVE) will move the mappings even if
>>> + not necessary, and the locations of the stack and vdso are also
>>> + randomized.
>>> +
>>> + On 32-bit systems this may cause problems due to increased VM
>>> + fragmentation if the address space gets crowded.
>> Isn't this a bit of an understatement? I think you'll have to restrict
>> this randomization to a subregion of the entire address space, otherwise
>> the reduction in maximum mapping size due to fragmentation will be a
>> problem on 64-bit architectures as well (which generally do not support
>> the full 64 bits for user-space addresses).
>
> Restricting randomization to a subregion would weaken ASLR and make this less useful. There are 47 or 56 bits, which translate to 128TB and 64PB of VM for user applications. Is it really possible today (or in the near future) to build a system that contains so much RAM that such fragmentation could realistically happen? Perhaps in a special case where lots of 1GB huge pages are necessary? Maybe in those cases you shouldn't use randomize_va_space=3. Or perhaps there could be a milder randomize_va_space=3, and randomize_va_space=4 for those who want maximum randomization.
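To see the difference between levels 2 and 3 described in the quoted patch text, a small userspace probe can map two anonymous pages with mmap(NULL, ...) and report how far apart they land. This is a minimal sketch assuming Linux and glibc; the helper names read_randomize_va_space() and mmap_gap_pages() are hypothetical, and level 3 exists only with this patch applied.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns the current kernel.randomize_va_space
 * value, or -1 if the sysctl file cannot be read. */
int read_randomize_va_space(void)
{
    FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");
    int v = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* Hypothetical helper: maps two one-page anonymous regions with
 * mmap(NULL, ...) and returns the signed distance between them in pages
 * (second minus first). Returns 0 if either mapping fails. */
long mmap_gap_pages(void)
{
    long page = sysconf(_SC_PAGESIZE);
    char *a, *b;
    long gap = 0;

    assert(page > 0);
    a = mmap(NULL, page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    b = mmap(NULL, page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (a != MAP_FAILED && b != MAP_FAILED)
        gap = ((intptr_t)b - (intptr_t)a) / page;

    /* Release whatever was mapped so repeated calls do not leak. */
    if (a != MAP_FAILED)
        munmap(a, page);
    if (b != MAP_FAILED)
        munmap(b, page);
    return gap;
}
```

On a stock kernel with randomize_va_space=2 the gap is typically small and repeatable (often the second page sits directly below the first), which is the "predictable places within the VMA and in sequential order" behavior; with the patch at level 3 it would be expected to vary widely from call to call.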
If you want a 4GB allocation to succeed, you can only divide the 128TB address space into 32k fragments. Or, a little more precisely, if you want a randomly selected 4GB region to be empty, any other allocation has a 1/32k chance of being in the way. (Rough numbers: I’m ignoring effects of the beginning and end of the address space, and I’m ignoring the size of a potential conflicting allocation.) This sounds good, except that a program could easily make a whole bunch of tiny allocations that get merged in current kernels but wouldn’t be merged with your scheme.
So maybe this is okay, but it’s not likely to be a good default.
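The rough estimate above can be written out as arithmetic. This sketch assumes the 128TB (2^47-byte) x86_64 user address space and the same simplifications: each small, unmerged allocation lands in a uniformly random 4GB slot, and address-space edges and allocation sizes are ignored. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Number of disjoint 2^region_bits-byte slots in a 2^va_bits-byte
 * address space, e.g. 4GB (2^32) regions in 128TB (2^47). */
uint64_t region_slots(unsigned va_bits, unsigned region_bits)
{
    assert(region_bits <= va_bits && va_bits < 64);
    return (uint64_t)1 << (va_bits - region_bits);
}

/* Rough probability that one randomly chosen slot is still empty after
 * n_allocs small allocations, each placed independently in a uniformly
 * random slot: (1 - 1/slots)^n_allocs. Computed by repeated
 * multiplication to avoid depending on libm. */
double region_free_probability(uint64_t slots, uint64_t n_allocs)
{
    double p_miss = 1.0 - 1.0 / (double)slots;
    double p = 1.0;

    for (uint64_t i = 0; i < n_allocs; i++)
        p *= p_miss;
    return p;
}
```

region_slots(47, 32) gives the 32k figure (32768). With 32768 scattered allocations, region_free_probability(32768, 32768) is about 0.37 (roughly e^-1), and at 100000 allocations it falls below 5%, which is the concern with programs making many tiny allocations that current kernels would merge.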
>
>>> + On all systems, it will reduce performance and increase memory
>>> + usage due to less efficient use of page tables and inability to
>>> + merge adjacent VMAs with compatible attributes. In the worst case,
>>> + up to 4 additional pages of page tables are created for
>>> + each mapping, so small mappings carry a considerable penalty.
>> The number 4 is architecture-specific, right?
>
> Yes, I only know x86_64, and even there it could have 5-level page tables. I'll fix this in the next version.
>
> -Topi
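The two costs in the quoted patch text can be probed and estimated separately. The sketch below counts VMAs via /proc/self/maps (one line per VMA) before and after a run of identical one-page anonymous mappings, and computes the worst-case page-table overhead using the patch's figure of 4 table pages per isolated mapping, which, as noted above, is specific to x86_64 and its number of paging levels. It assumes Linux and glibc; all helper names and the MAX_MAPS limit are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_MAPS 256 /* hypothetical cap so the pointer array fits on the stack */

/* Counts the calling process's VMAs: /proc/self/maps has one line per VMA.
 * Returns -1 if the file cannot be opened. */
long count_vmas(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    long n = 0;
    int c;

    if (!f)
        return -1;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            n++;
    fclose(f);
    return n;
}

/* Maps n one-page anonymous regions with identical protections, measures
 * how many VMAs the process gained, then unmaps them again.
 * Returns -1 on error. */
long vma_growth(int n)
{
    void *maps[MAX_MAPS];
    long page = sysconf(_SC_PAGESIZE);
    long before, after;
    int i, made = 0;

    assert(n > 0 && n <= MAX_MAPS && page > 0);
    before = count_vmas();
    for (i = 0; i < n; i++) {
        void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            break;
        maps[made++] = p;
    }
    after = count_vmas();
    for (i = 0; i < made; i++)
        munmap(maps[i], page);
    if (before < 0 || after < 0 || made != n)
        return -1;
    return after - before;
}

/* Worst-case extra page-table memory if every mapping is isolated and needs
 * its own chain of table pages. tables_per_mapping is architecture-specific;
 * the patch text uses 4 for x86_64. */
uint64_t worst_case_pt_bytes(uint64_t n_mappings, unsigned tables_per_mapping,
                             uint64_t page_size)
{
    return n_mappings * tables_per_mapping * page_size;
}
```

On a stock kernel vma_growth(64) usually returns a small number, because adjacent mappings with compatible attributes merge; under full randomization each mapping would be expected to remain a separate VMA. worst_case_pt_bytes(1000, 4, 4096) is 16384000 bytes, roughly 16MB of page tables backing about 4MB of mappings.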