Re: [PATCH -mm] workqueue: debug possible endless loop in cancel_rearming_delayed_work

From: David Chinner
Date: Fri Apr 20 2007 - 04:54:37 EST


On Fri, Apr 20, 2007 at 10:13:26AM +0200, Jarek Poplawski wrote:
> On Fri, Apr 20, 2007 at 12:46:18AM +1000, David Chinner wrote:
> > On Thu, Apr 19, 2007 at 08:54:04AM +0200, Jarek Poplawski wrote:
> > > Hi,
> > >
> > > IMHO cancel_rearming_delayed_work is dangerous place:
> >
> > Agreed - I spent a couple of hours today learning why it
> > can only be used on work functions that always rearm...
> >
> > > - it assumes a work function always rearms (with no exception),
> > > which probably isn't explained enough now (but anyway should
> > > be checked in such loops);
> > >
> > > - probably possible (theoretical) scenario: a few work
> > > functions rearm themselves with very short, equal times;
> > > before flush_workqueue ends, their timers are already
> > > fired, so cancel_delayed_work has nothing to do.
> >
> > Easier than that - have a work function that rearms only if there's
> > more work to do in the future. You only arm the timer when you
> > have work to do, and it only rearms if there's more work to
> > do in the future (e.g. rotating expiry lists).
> >
> > i.e. while there's more work to do, you need to call
> > cancel_rearming_delayed_work() to stop it reliably, but if you race
> > with the work function not restarting itself, you hang.....
>
> I'm not sure I correctly get your point, but according to
> this comment:
>
> " * cancel_rearming_delayed_work - reliably kill off a delayed
> keventd work whose handler rearms the delayed work."
>
> there is a question, whether a function that "rearms only if"
> - "rearms".

Right. Given that the bug I was initially trying to solve was a race
killing off a handler that was rearming itself, that comment says to
me "this is the right thing to do".

> It seems the author of this comment didn't think
> so and it was obvious to him/her cancel_rearming_delayed_work
> wasn't intended for this case. At first I thought it's only a
> language question - now, I see it's probably logical, too.

Yes, after spending another two hours working out why my fix was
then hanging in cancel_rearming_delayed_work() I was a little bit
annoyed at the now obviously misleading comment. Five minutes later
I'd fixed the bug properly. A better comment would have saved me two
hours of wasted time.....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/