Re: [PATCH v6 5/6] mm: secretmem: use PMD-size pages to amortize direct map fragmentation
From: David Hildenbrand
Date: Wed Sep 30 2020 - 11:25:39 EST
On 30.09.20 17:17, James Bottomley wrote:
> On Wed, 2020-09-30 at 16:45 +0200, David Hildenbrand wrote:
>> On 30.09.20 16:39, James Bottomley wrote:
>>> On Wed, 2020-09-30 at 13:27 +0300, Mike Rapoport wrote:
>>>> On Tue, Sep 29, 2020 at 05:15:52PM +0200, Peter Zijlstra wrote:
>>>>> On Tue, Sep 29, 2020 at 05:58:13PM +0300, Mike Rapoport wrote:
>>>>>> On Tue, Sep 29, 2020 at 04:12:16PM +0200, Peter Zijlstra
>>>>>> wrote:
>>>>>>> It will drop them down to 4k pages. Given enough inodes,
>>>>>>> and allocating only a single sekrit page per pmd, we'll
>>>>>>> shatter the directmap into 4k.
>>>>>>
>>>>>> Why? Secretmem allocates PMD-size page per inode and uses it
>>>>>> as a pool of 4K pages for that inode. This way it ensures
>>>>>> that __kernel_map_pages() is always called on PMD boundaries.
>>>>>
>>>>> Oh, you unmap the 2m page upfront? I read it like you did the
>>>>> unmap at the sekrit page alloc, not the pool alloc side of
>>>>> things.
>>>>>
>>>>> Then yes, but then you're wasting gobs of memory. Basically you
>>>>> can pin 2M per inode while only accounting a single page.
>>>>
>>>> Right, quite like THP :)
>>>>
>>>> I considered using a global pool of 2M pages for secretmem and
>>>> handing 4K pages to each inode from that global pool. But I've
>>>> decided to waste memory in favor of simplicity.
>>>
>>> I can also add that the user space consumer of this we wrote does
>>> its user pool allocation at a 2M granularity, so nothing is
>>> actually wasted.
>>
>> ... for that specific user space consumer. (or am I missing
>> something?)
>
> I'm not sure I understand what you mean? It's designed to be either
> the standard wrapper or an example of how to do the standard wrapper
> for the syscall. It uses the same allocator system glibc uses for
> malloc/free ... which pretty much everyone uses instead of calling
> sys_brk directly. If you look at the granularity glibc uses for
> sys_brk, it's not 4k either.
Okay thanks, "the user space consumer of this we wrote" didn't sound as
generic to me as "the standard wrapper".
--
Thanks,
David / dhildenb