Re: [PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

From: Matthew Wilcox
Date: Wed Apr 25 2018 - 12:30:26 EST


On Wed, Apr 25, 2018 at 09:20:55AM -0700, Eric Dumazet wrote:
> On 04/25/2018 09:04 AM, Matthew Wilcox wrote:
> > If you don't zap the page range, any of the CPUs in the system where
> > any thread in this task have ever run may have a TLB entry pointing to
> > this page ... if the page is being recycled into the page allocator,
> > then that page might end up as a slab page or page table or page cache
> > while the other CPU still have access to it.
>
> Yes, this makes sense.
>
> >
> > You could hang onto the page until you've built up a sufficiently large
> > batch, then bulk-invalidate all of the TLB entries, but we start to get
> > into weirdnesses on different CPU architectures.
> >
>
> zap_page_range() is already doing a bulk-invalidate,
> so maybe vm_replace_page() wont bring serious improvement if we end-up doing same dance.

Sorry, I was unclear. zap_page_range() bulk-invalidates all pages that
were torn down as part of this call. What I was trying to say was that
we could have a whole new API which put page after page into the same
address, and bumped the refcount on them to prevent them from actually
being freed. Once we get to a batch limit, we invalidate all of the
pages which were mapped at those addresses and can then free the pages
back to the allocator.

I don't think you can implement this scheme on s390 because it requires
the userspace address to still be mapped to that page on shootdown
(?) but I think we could implement it on x86.

Another possibility is if we had some way to insert the TLB entry into
the local CPU's page tables only, we wouldn't need to broadcast-invalidate
the TLB entry; we could just do it locally which is relatively quick.