Re: [RFC PATCH] net: Fix one page_pool page leak from skb_frag_unref

From: Jakub Kicinski
Date: Mon Apr 29 2024 - 11:00:32 EST


On Fri, 26 Apr 2024 21:24:09 -0700 Mina Almasry wrote:
> On Fri, Apr 26, 2024 at 4:09 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> >
> > On Thu, 25 Apr 2024 12:20:59 -0700 Mina Almasry wrote:
> > > - if (recycle && napi_pp_get_page(page))
> > > + if (napi_pp_get_page(page))
> >
> > Pretty sure you can't do that. The "recycle" here is a concurrency
> > guarantee. A guarantee someone is holding a pp ref on that page,
> > a ref which will not go away while napi_pp_get_page() is executing.
>
> I don't mean to argue, but I think the get_page()/put_page() pair we
> do in the page ref path is susceptible to the same issue. AFAIU it's
> not safe to get_page() if another CPU can be dropping the last ref,
> get_page_unless_zero() should be used instead.

Whoever gave us the pointer to operate on has a reference, so the page
can't disappear. get_page() is safe. The problem with pp is that we
don't know whether the caller has a pp ref or a page ref. IOW the pp
ref may not be owned by whoever called us.
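
To make the distinction concrete, here is a toy userspace model, not
kernel code. struct toy_page and the toy_* helpers are made up for
illustration, and plain ints stand in for the real atomic counters. It
only mirrors the shape of the "recycle && napi_pp_get_page(page)" check
quoted above: the flag stands for "the caller vouches that it owns a pp
ref", and without it we fall back to a plain page ref.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one page, two independent reference counts. */
struct toy_page {
	int page_refs;	/* stands in for the struct page refcount */
	int pp_refs;	/* stands in for the page_pool reference count */
};

/*
 * Always safe for our caller: whoever handed us the pointer holds a
 * page ref, so page_refs cannot hit zero while we run.
 */
static void toy_get_page(struct toy_page *p)
{
	assert(p->page_refs > 0);
	p->page_refs++;
}

/*
 * Only safe if the caller owns a pp ref. Owning a page ref says
 * nothing about pp_refs, which someone else may be dropping.
 */
static bool toy_pp_get_page(struct toy_page *p, bool caller_owns_pp_ref)
{
	if (!caller_owns_pp_ref)
		return false;
	assert(p->pp_refs > 0);
	p->pp_refs++;
	return true;
}

/* Take a pp ref when that is known to be safe, else a page ref. */
static void toy_frag_ref(struct toy_page *p, bool recycle)
{
	if (toy_pp_get_page(p, recycle))
		return;
	toy_get_page(p);
}
```

With recycle set, only pp_refs moves. Without it, only page_refs moves,
whatever pp_refs happens to be at that moment.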

I guess the situation may actually be worse, and we can only pp-ref a
page if both the "source" and "destination" skbs have pp_recycle = 1 :S
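
Spelled out as a predicate, again as a hypothetical sketch (struct
toy_skb and toy_can_take_pp_ref() do not exist in the tree). The reasons
in the comments are my reading of the rule, not something I have
verified against every caller:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the one sk_buff bit that matters here. */
struct toy_skb {
	bool pp_recycle;
};

/*
 * Assumed reasoning: the source flag is what tells us the source holds
 * a pp ref on the page, and the destination flag is what makes the
 * destination release through the pp path rather than put_page().
 * If either is missing, fall back to a plain page ref.
 */
static bool toy_can_take_pp_ref(const struct toy_skb *src,
				const struct toy_skb *dst)
{
	return src->pp_recycle && dst->pp_recycle;
}
```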

> Since get_page() is good in the page ref path without some guarantee,
> it's not obvious to me why we need this guarantee in the pp ref path,
> but I could be missing some subtlety. At any rate, if you prefer us
> going down the road of reverting commit 2cc3aeb5eccc ("skbuff: Fix a
> potential race while recycling page_pool packets"), I think that could
> also fix the issue.