Re: [igb] netconsole triggers warning in netpoll_poll_dev

From: Alexander Duyck
Date: Thu May 06 2021 - 20:38:34 EST


On Thu, May 6, 2021 at 4:32 PM Jesse Brandeburg
<jesse.brandeburg@xxxxxxxxx> wrote:
>
> Alexander Duyck wrote:
>
> > On Sun, Apr 25, 2021 at 11:47 PM Oleksandr Natalenko
> > <oleksandr@xxxxxxxxxxxxxx> wrote:
> > >
> > > Hello.
> > >
> > > On Fri, Apr 23, 2021 at 03:58:36PM -0700, Jakub Kicinski wrote:
> > > > On Fri, 23 Apr 2021 10:19:44 +0200 Oleksandr Natalenko wrote:
> > > > > On Wed, Apr 07, 2021 at 04:06:29PM -0700, Alexander Duyck wrote:
> > > > > > On Wed, Apr 7, 2021 at 11:07 AM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> > > > > > > Sure, that's simplest. I wasn't sure whether something is supposed to prevent
> > > > > > > this condition or whether it's okay to cover it up.
> > > > > >
> > > > > > I'm pretty sure it is okay to cover it up. In this case the "budget -
> > > > > > 1" is supposed to be the upper limit on what can be reported. I think
> > > > > > it was assuming an unsigned value anyway.
> > > > > >
> > > > > > Another alternative would be to default clean_complete to !!budget.
> > > > > > Then if budget is 0 clean_complete would always return false.
> > > > >
> > > > > So, among all the variants, which one should I try? Or was there a separate
> > > > > patch sent to address this?
> > > >
> > > > Alex's suggestion is probably best.
> > > >
> > > > I'm not aware of the fix being posted. Perhaps you could take over and
> > > > post the patch if Intel doesn't chime in?
> > >
> > > So, IIUC, Alex suggests this:
> > >
> > > ```
> > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> > > index a45cd2b416c8..7503d5bf168a 100644
> > > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > > @@ -7981,7 +7981,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
> > >  						     struct igb_q_vector,
> > >  						     napi);
> > >  	bool clean_complete = true;
> > > -	int work_done = 0;
> > > +	unsigned int work_done = 0;
> > >  
> > >  #ifdef CONFIG_IGB_DCA
> > >  	if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
> > > @@ -8008,7 +8008,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
> > >  	if (likely(napi_complete_done(napi, work_done)))
> > >  		igb_ring_irq_enable(q_vector);
> > >  
> > > -	return min(work_done, budget - 1);
> > > +	return min_t(unsigned int, work_done, budget - 1);
> > >  }
> > >
> > > /**
> > > ```
> > >
> > > Am I right?
> > >
> > > Thanks.
> >
> > Actually, a better way to go would probably be to just initialize
> > "clean_complete = !!budget". That way we don't end up messing with
> > the interrupt enables, which would probably be better behavior.
>
>
> Thanks guys for the suggestions here! Finally got some time for
> this, so here is the patch I'm going to queue shortly.
>
> From ffd24e90d688ee347ab051266bfc7fca00324a68 Mon Sep 17 00:00:00 2001
> From: Jesse Brandeburg <jesse.brandeburg@xxxxxxxxx>
> Date: Thu, 6 May 2021 14:41:11 -0700
> Subject: [PATCH net] igb: fix netpoll exit with traffic
> To: netdev,
> Oleksandr Natalenko <oleksandr@xxxxxxxxxxxxxx>
> Cc: Jakub Kicinski <kuba@xxxxxxxxxx>, LKML <linux-kernel@xxxxxxxxxxxxxxx>, "Brandeburg, Jesse" <jesse.brandeburg@xxxxxxxxx>, "Nguyen, Anthony L" <anthony.l.nguyen@xxxxxxxxx>, "David S. Miller" <davem@xxxxxxxxxxxxx>, intel-wired-lan <intel-wired-lan@xxxxxxxxxxxxxxxx>, Alexander Duyck <alexander.duyck@xxxxxxxxx>
>
> Oleksandr reported a bug where netpoll causes trace messages in
> the log on igb.
>
> [22038.710800] ------------[ cut here ]------------
> [22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
> [22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155 netpoll_poll_dev+0x18a/0x1a0
>
> After some discussion and debugging on the list, it was deemed that the
> right thing to do is to initialize the clean_complete variable to false
> when the "netpoll mode" of passing a zero budget is used.
>
> This logic should be sane and not risky because the only time budget
> should be zero on entry is netpoll. Change includes a small refactor
> of local variable assignments to clean up the look.
>
> Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead and improve performance")
> Reported-by: Oleksandr Natalenko <oleksandr@xxxxxxxxxxxxxx>
> Suggested-by: Alexander Duyck <alexander.duyck@xxxxxxxxx>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@xxxxxxxxx>
> ---
>
> Compile tested ONLY, but functionally it should be exactly the same for
> all cases except when budget is zero on entry, which will hopefully fix
> the bug.
> ---
> drivers/net/ethernet/intel/igb/igb_main.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 0cd37ad81b4e..b0a9bed14071 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -7991,12 +7991,16 @@ static void igb_ring_irq_enable(struct igb_q_vector *q_vector)
>   **/
>  static int igb_poll(struct napi_struct *napi, int budget)
>  {
> -	struct igb_q_vector *q_vector = container_of(napi,
> -						     struct igb_q_vector,
> -						     napi);
> -	bool clean_complete = true;
> +	struct igb_q_vector *q_vector;
> +	bool clean_complete;
>  	int work_done = 0;
>  
> +	/* if budget is zero, we have a special case for netconsole, so
> +	 * make sure to set clean_complete to false in that case.
> +	 */
> +	clean_complete = !!budget;
> +
> +	q_vector = container_of(napi, struct igb_q_vector, napi);
>  #ifdef CONFIG_IGB_DCA
>  	if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
>  		igb_update_dca(q_vector);

I'm not a big fan of moving the q_vector init as a part of this patch
since it just means more backport work.

That said the change itself should be harmless so I am good with it either way.

Reviewed-by: Alexander Duyck <alexanderduyck@xxxxxx>