Re: [PATCH net -v2] [BUGFIX] bonding: use flush_delayed_work_sync in bond_close

From: Stephen Hemminger
Date: Wed Oct 19 2011 - 14:41:35 EST


On Wed, 19 Oct 2011 11:01:02 -0700
Jay Vosburgh <fubar@xxxxxxxxxx> wrote:

> Mitsuo Hayasaka <mitsuo.hayasaka.hu@xxxxxxxxxxx> wrote:
>
> >bond_close() calls cancel_delayed_work() to cancel delayed works.
> >It cannot, however, cancel works that are already queued in the workqueue.
> >bond_open() initializes work->data, and process_one_work() refers to
> >get_work_cwq(work)->wq->flags. get_work_cwq() returns NULL when
> >work->data has been initialized. Thus, a panic occurs.
> >
> >This patch uses flush_delayed_work_sync() instead of cancel_delayed_work()
> >in bond_close(). It cancels the delayed timer and waits for the work to
> >finish execution. So, it can avoid the null pointer dereference caused by
> >process_one_work() running in parallel with the initialization done in
> >bond_open().
>
> I'm setting up to test this. I have a dim recollection that we
> tried this some years ago, and there was a different deadlock that
> manifested through the flush path. Perhaps changes since then have
> removed that problem.
>
> -J

Won't this deadlock on RTNL? The problem is that:

CPU0                            CPU1
rtnl_lock
bond_close
                                delayed_work
                                  mii_work
                                    read_lock(bond->lock);
                                    read_unlock(bond->lock);
                                    rtnl_lock... waiting for CPU0
flush_delayed_work_sync
    waiting for delayed_work to finish...
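
For illustration only, here is a userspace sketch of one way to break that
cycle: the work handler takes the lock with a trylock and backs off (to be
re-armed later) when it fails, so the flusher never waits on a sleeping
worker. This is not what the patch does, and it is not kernel code. It uses
pthreads, and fake_rtnl, struct monitor, mii_work_fn() and
close_while_holding_rtnl() are all made-up names; in the kernel the
corresponding primitive would be rtnl_trylock().

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for RTNL; hypothetical, not the kernel's rtnl_lock(). */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

struct monitor {
	bool took_rtnl;	/* worker managed to take fake_rtnl */
	bool rearmed;	/* worker backed off and asked to be run again */
};

/* Hypothetical analogue of the mii_work handler running on CPU1. */
static void *mii_work_fn(void *arg)
{
	struct monitor *m = arg;

	if (pthread_mutex_trylock(&fake_rtnl) != 0) {
		/* Lock is held by the closer: back off instead of sleeping. */
		m->rearmed = true;
		return NULL;
	}

	m->took_rtnl = true;
	pthread_mutex_unlock(&fake_rtnl);
	return NULL;
}

/* Analogue of bond_close() on CPU0: holds the lock, then waits for the work. */
static int close_while_holding_rtnl(struct monitor *m)
{
	pthread_t worker;

	m->took_rtnl = false;
	m->rearmed = false;

	pthread_mutex_lock(&fake_rtnl);
	if (pthread_create(&worker, NULL, mii_work_fn, m) != 0) {
		pthread_mutex_unlock(&fake_rtnl);
		return -1;
	}

	/*
	 * Stand-in for flush_delayed_work_sync(): wait for the work to
	 * finish. With a blocking lock in the worker this would never
	 * return; with the trylock it does.
	 */
	pthread_join(worker, NULL);
	pthread_mutex_unlock(&fake_rtnl);
	return 0;
}
```

Calling close_while_holding_rtnl() on a struct monitor returns once the
worker has given up on the lock. Replacing the trylock with
pthread_mutex_lock() reproduces the hang in the diagram above.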

