Re: [RFC][PATCH 2/3] dma-buf: system_heap: Add pagepool support to system heap

From: John Stultz
Date: Fri Jan 22 2021 - 20:29:37 EST

On Mon, Dec 21, 2020 at 2:09 PM Daniel Vetter <daniel@xxxxxxxx> wrote:
> On Fri, Dec 18, 2020 at 05:16:56PM -0800, John Stultz wrote:
> > On Fri, Dec 18, 2020 at 6:36 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
> > > On Thu, Dec 17, 2020 at 11:06:11PM +0000, John Stultz wrote:
> > > > Reuse/abuse the pagepool code from the network code to speed
> > > > up allocation performance.
> > > >
> > > > This is similar to the ION pagepool usage, but tries to
> > > > utilize generic code instead of a custom implementation.
> > >
> > > We also have one of these in ttm. I think we should have at most one of
> > > these for the gpu ecosystem overall, maybe as a helper that can be plugged
> > > into all the places.
> > >
> > > Or I'm kinda missing something, which could be since I only glanced at
> > > yours for a bit. But it's also called page pool for buffer allocations,
> > > and I don't think there's that many ways to implement that really :-)
> >
> > Yea, when I was looking around the ttm one didn't seem quite as
> > generic as the networking one, which more easily fit in here.
> Oops, I didn't look that closely and didn't realize you're reusing the one
> from net/core/.
> > The main benefit for the system heap is not so much the pool itself
> > (the normal page allocator is pretty good), as it being able to defer
> > the free and zero the pages in a background thread, so the pool is
> > effectively filled with pre-zeroed pages.
> >
> > But I'll take another look at the ttm implementation and see if it can
> > be re-used or the shared code refactored and pulled out somehow.
> I think moving the page_pool from net into lib and using it in ttm might
> also be an option. Lack of shrinker in the networking one might be a bit a
> problem.

Yea. I've been looking at this, to see how to abstract out a generic
pool implementation, but each pool implementation has been tweaked for
the specific use cases, so a general abstraction is a bit tough right

For example the ttm pool's handling allocations both from alloc_pages
and dma_alloc in a pool, where the net page pool only uses alloc_pages
(but can pre map via dma_map_attr).

And as you mentioned, the networking page pool is statically sized
where the ttm pool is dynamic and shrinker controlled.

Further, as the ttm pool is utilized for keeping pools of pages set
for specific cache types, it makes it difficult to abstract that out
as we have to be able to reset the caching (set_pages_wb()) when
shrinking, so that would also have to be pushed down into the pool
attributes as well.

So far, in my attempts to share an abstraction for both the net
page_pool and the ttm page pool, it seems to make the code complexity
worse on both sides - so while I'm interested in continuing to try to
find a way to share code here, I'm not sure it makes sense to hold up
this series (which is already re-using an existing implementation and
provide a performance bump in microbenchmarks) for the
grand-unified-page-pool. Efforts to refactor the ttm pool and net page
pool can continue on indepently, and I'd be happy to move the system
heap to whatever that ends up being.