Re: [PATCH rdma-next v5 0/4] Dynamically allocate SG table from the pages

From: Jason Gunthorpe
Date: Tue Oct 06 2020 - 07:46:34 EST


On Tue, Oct 06, 2020 at 12:41:22PM +0200, Daniel Vetter wrote:
> On Mon, Oct 05, 2020 at 08:56:50PM -0300, Jason Gunthorpe wrote:
> > On Sun, Oct 04, 2020 at 06:43:36PM +0300, Leon Romanovsky wrote:
> > > This series extends __sg_alloc_table_from_pages to allow chaining of
> > > new pages to already initialized SG table.
> > >
> > > This allows drivers to use the optimization of merging contiguous
> > > pages without needing to pre-allocate all the pages and hold them in
> > > a very large temporary buffer before calling SG table initialization.
> > >
> > > The second patch changes the InfiniBand driver to use the new API. It
> > > removes duplicated functionality from the code and benefits from the
> > > optimization of dynamically allocating the SG table from pages.
> > >
> > > On a system with 2MB huge pages, without this change the SG table
> > > would contain 512 times as many SG entries.
> > > For example, for a 100GB memory registration:
> > >
> > >          Number of entries    Size
> > > Before   26214400             600.0MB
> > > After    51200                1.2MB
> > >
> > > Thanks
> > >
> > > Maor Gottlieb (2):
> > > lib/scatterlist: Add support in dynamic allocation of SG table from
> > > pages
> > > RDMA/umem: Move to allocate SG table from pages
> > >
> > > Tvrtko Ursulin (2):
> > > tools/testing/scatterlist: Rejuvenate bit-rotten test
> > > tools/testing/scatterlist: Show errors in human readable form
> >
> > This looks OK, I'm going to send it into linux-next on the hmm tree
> > for a while to see if anything gets broken. If there are more
> > remarks/tags/etc., please continue.
>
> An idea that just crossed my mind: a pin_user_pages_sgt might be useful
> for both RDMA and DRM, since it would avoid the potentially huge interim
> struct page array for THP pages, or for anything else that could be
> coalesced down into a single SG entry.
>
> Not sure it's worth it, but would at least give a slightly neater
> interface I think.

We've talked about it. Christoph wants to see this area move to a biovec
interface instead of sgl, but it might still be worthwhile to have an
interim step, at least as an API consolidation.

Avoiding the page list would be complicated, as we'd somehow have to
share code with the page table iterator scheme.

Jason