Re: [PATCH v6 5/6] mm: secretmem: use PMD-size pages to amortize direct map fragmentation
From: James Bottomley
Date: Wed Sep 30 2020 - 11:18:07 EST

On Wed, 2020-09-30 at 16:45 +0200, David Hildenbrand wrote:
> On 30.09.20 16:39, James Bottomley wrote:
> > On Wed, 2020-09-30 at 13:27 +0300, Mike Rapoport wrote:
> > > On Tue, Sep 29, 2020 at 05:15:52PM +0200, Peter Zijlstra wrote:
> > > > On Tue, Sep 29, 2020 at 05:58:13PM +0300, Mike Rapoport wrote:
> > > > > On Tue, Sep 29, 2020 at 04:12:16PM +0200, Peter Zijlstra
> > > > > wrote:
> > > > > > It will drop them down to 4k pages. Given enough inodes,
> > > > > > and allocating only a single sekrit page per pmd, we'll
> > > > > > shatter the directmap into 4k.
> > > > >
> > > > > Why? Secretmem allocates PMD-size page per inode and uses it
> > > > > as a pool of 4K pages for that inode. This way it ensures
> > > > > that __kernel_map_pages() is always called on PMD boundaries.
> > > >
> > > > Oh, you unmap the 2m page upfront? I read it like you did the
> > > > unmap at the sekrit page alloc, not the pool alloc side of
> > > > things.
> > > >
> > > > Then yes, but then you're wasting gobs of memory. Basically you
> > > > can pin 2M per inode while only accounting a single page.
> > >
> > > Right, quite like THP :)
> > >
> > > I considered using a global pool of 2M pages for secretmem and
> > > handing 4K pages to each inode from that global pool. But I've
> > > decided to waste memory in favor of simplicity.
> >
> > I can also add that the user space consumer of this we wrote does
> > its user pool allocation at a 2M granularity, so nothing is
> > actually wasted.
>
> ... for that specific user space consumer. (or am I missing
> something?)

I'm not sure I understand what you mean? It's designed to be either
the standard wrapper or an example of how to do the standard wrapper
for the syscall. It uses the same allocator system glibc uses for
malloc/free ... which pretty much everyone uses instead of calling
sys_brk directly. If you look at the granularity glibc uses for
sys_brk, it's not 4k either.
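
To illustrate the granularity point, here is a minimal sketch of a
user-space pool that takes memory from the kernel one 2M chunk at a
time and carves small allocations out of it. This is not the actual
wrapper: the names (struct pool, pool_init, pool_alloc, pool_destroy)
are hypothetical, it is a simple bump allocator with no free list, and
it uses an anonymous mmap() as a stand-in where a real consumer would
presumably map the secretmem file descriptor instead.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define CHUNK_SIZE	(2UL << 20)	/* 2M, the PMD size on x86-64 */
#define POOL_ALIGN	16UL		/* alignment of each allocation */

/* One 2M chunk obtained from the kernel, handed out piecemeal. */
struct pool {
	unsigned char *base;	/* start of the chunk */
	size_t used;		/* bytes already handed out */
};

/* Map a single 2M chunk; returns 0 on success, -1 on failure. */
static int pool_init(struct pool *p)
{
	/* Assumption: anonymous memory stands in for a secretmem mapping. */
	void *m = mmap(NULL, CHUNK_SIZE, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (m == MAP_FAILED)
		return -1;
	p->base = m;
	p->used = 0;
	return 0;
}

/* Bump-allocate size bytes from the chunk; NULL when it does not fit. */
static void *pool_alloc(struct pool *p, size_t size)
{
	size_t need;
	void *ret;

	if (size == 0 || size > CHUNK_SIZE)
		return NULL;
	/* Round up so every returned pointer stays POOL_ALIGN aligned. */
	need = (size + POOL_ALIGN - 1) & ~(POOL_ALIGN - 1);
	if (need > CHUNK_SIZE - p->used)
		return NULL;
	ret = p->base + p->used;
	p->used += need;
	assert(p->used <= CHUNK_SIZE);
	return ret;
}

/* Return the whole chunk to the kernel in one go. */
static void pool_destroy(struct pool *p)
{
	munmap(p->base, CHUNK_SIZE);
	p->base = NULL;
	p->used = 0;
}
```

With a pool like this, the kernel only ever sees 2M requests no matter
how small the individual allocations are, which is the sense in which
a PMD-size backing page per mapping is not wasted.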

James