Re: [syzbot] WARNING: refcount bug in __linkwatch_run_queue

From: Jakub Kicinski
Date: Wed Nov 17 2021 - 09:15:59 EST

On Wed, 17 Nov 2021 09:19:07 +0100 Willy Tarreau wrote:
> Thanks for the report. I'm seeing that linkwatch_do_dev() is also
> called in linkwatch_forget_dev(), and am wondering if we're not
> seeing a sequence like this one:
>
>   linkwatch_forget_dev()
>     list_del_init()
>     linkwatch_do_dev()
>       netdev_state_change()
>         ... one of the notifiers
>           ... linkwatch_add_event() => adds to watch list
>       dev_put()
>   ...
>   __linkwatch_run_queue()
>     linkwatch_do_dev()
>       dev_put()
>         => bang!
>
> Well, in theory, no, since linkwatch_add_event() will call dev_hold()
> when adding to the list, so we ought to leave the first call with a
> refcount still covering the list's presence, and I don't see how it
> can reach zero before reaching dev_put() in linkwatch_do_dev() as this
> function is only called when the event was picked from the list.
> The only difference I'm seeing is that before the patch, a call to
> linkwatch_forget_dev() on a non-present device would call dev_put()
> without going through dev_activate(), dev_deactivate(), nor
> netdev_state_change(), but I'm not seeing how that could make a
> difference. linkwatch_forget_dev() is called from netdev_wait_allrefs()
> which will wait for the refcnt to be exactly 1, thus even if we queue
> an extra event we can't leave that function until the event has been
> processed.
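
For reference, the sequence quoted above can be replayed in a small
userspace model. This is only a sketch of the reasoning, not kernel code:
every toy_* name is made up for illustration, the refcount is a plain int,
and list membership is reduced to a flag. It assumes, as the quoted text
describes, that a reference is taken when a device is added to the watch
list and dropped once after its event is processed.

```c
/* Toy userspace model of the hold/put pairing discussed above.
 * NOT kernel code: all toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_dev {
	int refcnt;	/* stands in for the netdev refcount */
	bool on_list;	/* stands in for "is on the watch list" */
};

static void toy_hold(struct toy_dev *d)
{
	d->refcnt++;
}

static void toy_put(struct toy_dev *d)
{
	assert(d->refcnt > 0);	/* an underflow here would be the "bang" */
	d->refcnt--;
}

/* Queue an event: take a reference only when newly added to the list. */
static void toy_add_event(struct toy_dev *d)
{
	if (!d->on_list) {
		d->on_list = true;
		toy_hold(d);
	}
}

/* Process one event. A notifier may re-queue the device before the
 * reference that covered the processed event is dropped. */
static void toy_do_dev(struct toy_dev *d, bool notifier_requeues)
{
	if (notifier_requeues)
		toy_add_event(d);
	toy_put(d);
}

/* Models the forget path: unlink first, then process the event. */
static void toy_forget_dev(struct toy_dev *d, bool notifier_requeues)
{
	if (d->on_list) {
		d->on_list = false;
		toy_do_dev(d, notifier_requeues);
	}
}

/* Models the queue runner picking the device off the list. */
static void toy_run_queue(struct toy_dev *d)
{
	if (d->on_list) {
		d->on_list = false;
		toy_do_dev(d, false);
	}
}

/* Replays the quoted sequence and returns the final refcount. */
int toy_replay_sequence(void)
{
	struct toy_dev d = { .refcnt = 1, .on_list = false };

	toy_add_event(&d);		/* initial event queued */
	toy_forget_dev(&d, true);	/* notifier re-queues the device */
	toy_run_queue(&d);		/* re-queued event is processed */
	return d.refcnt;
}
```

In this model toy_replay_sequence() ends with the count back at its
starting value and the assert in toy_put() never fires, which matches the
"in theory, no" conclusion in the quoted text: every put is paired with
the hold taken at queue time.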

The ref leak could come from anywhere, tho. Like: