Re: timer code oops when calling mod_delayed_work

From: Jeff Layton
Date: Sat Oct 31 2015 - 07:34:52 EST


On Sat, 31 Oct 2015 11:00:12 +0900
Tejun Heo <tj@xxxxxxxxxx> wrote:

> (cc'ing Lai)
>
> Hello, Jeff.
>
> On Thu, Oct 29, 2015 at 01:58:36PM -0400, Jeff Layton wrote:
> > crash> p cache_cleaner
> > cache_cleaner = $12 = {
> > work = {
> > data = {
> > counter = 0xfffffffe1
>
> If I'm not mistaken, PENDING, flush color 14, OFFQ and POOL_NONE.
>
> > },
> > entry = {
> > next = 0xffffffffa03623c8 <cache_cleaner+8>,
> > prev = 0xffffffffa03623c8 <cache_cleaner+8>
>
> Empty entry.
>
> > },
> > func = 0xffffffffa03333c0 <cache_cleaner_func>
> > },
> > timer = {
> > entry = {
> > next = 0x0,
> > pprev = 0xffff88085fd0eaf8
> > },
> > expires = 0x100021e99,
> > function = 0xffffffff810b66a0 <delayed_work_timer_fn>,
> > data = 0xffffffffa03623c0,
> > flags = 0x200014,
> > slack = 0xffffffff,
> > start_pid = 0x0,
> > start_site = 0x0,
> > start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> > },
> > wq = 0xffff88085f48fc00,
> > cpu = 0x14
> > }
> >
> > So the PENDING bit is set (lowest bit in data.counter), and timer->entry.pprev
> > pprev pointer is not NULL (so timer_pending is true). I also see that
> > there are several nfsd threads running the shrinker at the same time.
> >
> > There is one potential problem that I see, but I'd appreciate someone
> > sanity checking me on this. Here is mod_delayed_work_on:
> ...
> > ...and here is the beginning of try_to_grab_pending:
> >
> > ------------------[snip]------------------------
> > /* try to steal the timer if it exists */
> > if (is_dwork) {
> > struct delayed_work *dwork = to_delayed_work(work);
> >
> > /*
> > * dwork->timer is irqsafe. If del_timer() fails, it's
> > * guaranteed that the timer is not queued anywhere and not
> > * running on the local CPU.
> > */
> > if (likely(del_timer(&dwork->timer)))
> > return 1;
> > }
> >
> > /* try to claim PENDING the normal way */
> > if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))
> > return 0;
> > ------------------[snip]------------------------
> >
> >
> > ...so if del_timer returns true, we'll return 1 from
> > try_to_grab_pending without actually setting the
> > WORK_STRUCT_PENDING_BIT, and then will end up calling
> > __queue_delayed_work.
> >
> > That seems wrong to me -- shouldn't we be ensuring that that bit is set
> > when returning 1 from try_to_grab_pending to guard against concurrent
> > queue_delayed_work_on calls?
>
> But if try_to_grab_pending() succeeded at stealing dwork->timer, it's
> known that the PENDING bit must already be set. IOW, the bit is
> stolen together with the timer.
>
> Heh, this one is tricky. Yeah, try_to_grab_pending() missing PENDING
> would explain the failure but I can't see how it'd leak at the moment.
>

Thanks Tejun. Yeah, I realized that after sending the response above.

If you successfully delete the timer the timer then the PENDING bit
should already be set. Might be worth throwing in something like this,
just before the return 1:

WARN_ON(!test_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))

...but I doubt it would fire. I think it's likely that the bug is
elsewhere.

The other thing is that we've had this code in place for a couple of
years now, and this is the first time I've seen an oops like this. I
suspect that this may be a recent regression, but I don't know that for
sure.

I have asked Chris and Michael to see if they can bisect it down, but
it may be a bit before they can get that done. Any insight you might
have in the meantime would helpful.
--
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/