Re: [PATCH wq/for-4.0-fixes v2] workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE

From: Tejun Heo
Date: Thu Mar 05 2015 - 06:47:03 EST


On Thu, Mar 05, 2015 at 04:36:38AM -0500, Tejun Heo wrote:
> On Thu, Mar 05, 2015 at 10:24:50AM +0100, Tomeu Vizoso wrote:
> ...
> > [ 317.251001] PC is at bit_waitqueue+0x38/0x6c
> ...
> > [ 317.420658] [<c028fe18>] (bit_waitqueue) from [<c0270d34>] (__cancel_work_timer+0x28/0x1b0)
> > [ 317.430598] [<c0270d34>] (__cancel_work_timer) from [<c0270ed8>] (cancel_work_sync+0x1c/0x20)
> > [ 317.440672] [<c0270ed8>] (cancel_work_sync) from [<bf0ed138>] (regulatory_exit+0x24/0x17c [cfg80211])
> > [ 317.451396] [<bf0ed138>] (regulatory_exit [cfg80211]) from [<bf125184>] (cfg80211_exit+0x38/0x4c [cfg80211])
> > [ 317.462726] [<bf125184>] (cfg80211_exit [cfg80211]) from [<c02c8b4c>] (SyS_delete_module+0x1b4/0x1f8)
> > [ 317.473411] [<c02c8b4c>] (SyS_delete_module) from [<c0210a00>] (ret_fast_syscall+0x0/0x34)
>
> Ah, I think that's from feeding static address to virt_to_page. :(
> Reverted the patch from the branch. Will think more about what to do.

So, it's from feeding a module's static address, which lives in
vmalloc space, to bit_waitqueue(). bit_waitqueue() then tries to find
the backing struct page with virt_to_page(), which doesn't work for
vmalloc addresses. I'm currently testing an alternative patch which
uses a single waitqueue w/ a custom wakeup function that filters on
the target work item. Will post the new version soon.
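
To illustrate the single-waitqueue idea, here is a rough user-space
sketch of it. This is not the patch and not the kernel waitqueue API:
struct waiter, struct waitqueue, add_waiter(), wake_if_same_work() and
wake_up_key() are all hypothetical names made up for the example. It
only assumes that every canceler sleeps on one shared queue and that
the waker passes the work item's address as a key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space model only; every name here is hypothetical. */
struct waiter;
typedef bool (*wake_fn)(struct waiter *w, const void *key);

struct waiter {
	wake_fn func;		/* custom wakeup callback */
	const void *work;	/* work item this waiter cares about */
	bool woken;		/* set once the waiter has been woken */
	struct waiter *next;
};

struct waitqueue {
	struct waiter *head;	/* one shared list for all work items */
};

/* Custom wakeup function: wake only when the key is our work item. */
bool wake_if_same_work(struct waiter *w, const void *key)
{
	if (w->work != key)
		return false;	/* someone else's work item, keep sleeping */
	w->woken = true;
	return true;
}

/* Queue a waiter for @work on the shared waitqueue. */
void add_waiter(struct waitqueue *wq, struct waiter *w, const void *work)
{
	assert(wq != NULL && w != NULL);
	w->func = wake_if_same_work;
	w->work = work;
	w->woken = false;
	w->next = wq->head;
	wq->head = w;
}

/* Offer @key to every sleeping waiter; return how many were woken. */
int wake_up_key(struct waitqueue *wq, const void *key)
{
	int nr_woken = 0;

	for (struct waiter *w = wq->head; w; w = w->next)
		if (!w->woken && w->func(w, key))
			nr_woken++;
	return nr_woken;
}
```

In this model the work item's address is only ever compared, never
translated into a page, so it doesn't matter whether the item sits in
the linear mapping or in vmalloc space. Waking with one work item's
address as the key wakes only the waiters queued for that item.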

Thanks.

--
tejun