Re: [syzbot] BUG: corrupted list in netif_napi_add

From: Saeed Mahameed
Date: Mon Oct 18 2021 - 19:31:53 EST


On Mon, 2021-10-18 at 19:12 +0300, Vlad Buslov wrote:
> On Mon 18 Oct 2021 at 18:42, Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> > On Mon, 18 Oct 2021 17:04:19 +0300 Vlad Buslov wrote:
> > > We got a use-after-free with very similar trace [0] during nightly
> > > regression. The issue happens when ip link up/down state is flipped
> > > several times in loop and doesn't reproduce for me manually. The fact
> > > that it didn't reproduce for me after running test ten times suggests
> > > that it is either very hard to reproduce or that it is a result of some
> > > interaction between several tests in our suite.
> > >
> > > [0]:
> > >
> > > [ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
> > > [ 3187.890694] ==================================================================
> > > [ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
> > > [ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618
> >
> > Hm, not sure how similar it is. This one looks like channel was freed
> > without deleting NAPI. Do you have list debug enabled?
>
> Yes, CONFIG_DEBUG_LIST is enabled.
>
Do you have core dumps?
Let's enable kernel.panic_on_oops with core dumps and look at it the next
time we see this. I really don't think mlx5 is leaking.
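
A minimal sketch of the sysctl side of this, assuming kdump (or another
crash-dump mechanism) is already set up separately on the test machine.
The file name is hypothetical, and the commented-out panic_on_warn line is
an assumption about what may be needed to stop at the KASAN report itself,
not something stated in this thread:

```
# /etc/sysctl.d/99-panic-on-oops.conf  (hypothetical file name)

# Panic on an oops instead of trying to continue, so that the
# crash-dump mechanism can capture a vmcore.
kernel.panic_on_oops = 1

# Assumption: KASAN reports and list-debug warnings are not necessarily
# oopses; uncomment to also panic on WARN-level reports.
# kernel.panic_on_warn = 1
```

The setting can be loaded with "sysctl --system" or applied for the current
boot only with "sysctl -w kernel.panic_on_oops=1".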