Re: [syzbot] KASAN: use-after-free Read in addr_handler (4)

From: Jason Gunthorpe
Date: Thu Sep 16 2021 - 09:05:04 EST


On Thu, Sep 16, 2021 at 09:43:19AM +0200, Dmitry Vyukov wrote:
> On Wed, 15 Sept 2021 at 21:36, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > On Wed, Sep 15, 2021 at 05:41:22AM -0700, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: 926de8c4326c Merge tag 'acpi-5.15-rc1-3' of git://git.kern..
> > > git tree: upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=11fd67ed300000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=37df9ef5660a8387
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=dc3dfba010d7671e05f5
> > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+dc3dfba010d7671e05f5@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > #syz dup: KASAN: use-after-free Write in addr_resolve (2)
> >
> > Frankly, I still can't figure out how this is happening.
> >
> > RDMA_USER_CM_CMD_RESOLVE_IP triggers a background work and
> > RDMA_USER_CM_CMD_DESTROY_ID triggers destruction of the memory the
> > work touches.
> >
> > rdma_addr_cancel() is supposed to ensure that the work isn't and won't
> > run.
> >
> > So to hit this we have to either not call rdma_addr_cancel() when it
> > is needed, or rdma_addr_cancel() has to be broken and continue to allow
> > the work.
> >
> > I could find nothing along either path, though rdma_addr_cancel()
> > relies on some complicated properties of the workqueues I'm not
> > entirely positive about.
>
> I stared at the code, but it's too complex to grasp it all entirely.
> There are definitely lots of tricky concurrent state transitions and
> potential for unexpected interleavings. My bet would be on some tricky
> hard-to-trigger thread interleaving.

From a uapi perspective the entire thing is serialized with a mutex.

> The only thing I can think of is adding more WARNINGs to the code to
> check more of these assumptions. But I don't know if there are any
> useful testable assumptions...

Do you have any idea why we can't get a reproduction out of syzkaller
here?

I am less comfortable reading syzkaller's debug output. Can you give
some idea of what it might be doing concurrently?

Jason