Re: Fw: [PATCH -mm] workqueue: debug possible endless loop in cancel_rearming_delayed_work
From: Oleg Nesterov
Date: Fri Apr 27 2007 - 03:53:14 EST
On 04/27, Jarek Poplawski wrote:
>
> On Thu, Apr 26, 2007 at 08:34:06PM +0400, Oleg Nesterov wrote:
> > >
> > > > else if (test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work)))
> > > > done = del_timer(&dwork->timer)
> > >
> > [...snip...]
> > > It is something alike to the current
> > > way, with some added measures: you try to shoot a work on the run,
> > > while queued or timer_pending, plus the _PENDING flag set, so it seems,
> > > there is some risk of longer than planed looping.
> >
> > Sorry, can't understand. done == 0 means that the queueing in progress,
> > this work should be placed on cwq->worklist very soon, most probably
> > right after we drop cwq->lock.
>
> I think, theoretically, probably, maybe, there is possible some strange
> case, this function gets spin_lock only when: list_empty(&work->entry) == 1
> && _PENDING == 1 && del_timer(&dwork->timer) == 0.
Yes, but this is not so strange, this means the queueing in progress. Most
probably the "owner" of WORK_STRUCT_PENDING bit spins waiting for cwq->lock.
We will retry in this case. Of course, if we have a workqueue with the single
work which just re-arms itself via queue_work() (without delay) and does nothing
more, we may need a lot of looping.
> PS: probably unusable, but for my own satisfaction:
>
> Acked-by: Jarek Poplawski <jarkao2@xxxxx>
It is useable, at least for me. I hope you will re-ack when I actually send
the patch. Note that the "else" branch above doesn't need cwq->lock, and we
should start with del_timer(), because the pending timer is the most common
case.
Oleg.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/