Re: 2.6.21-mm1 hwsusp: BUG at workqueue.c:106

From: Oleg Nesterov
Date: Tue May 08 2007 - 08:41:10 EST


On 05/08, Jarek Poplawski wrote:
>
> On 08-05-2007 12:55, Oleg Nesterov wrote:
> > On 05/08, Andrew Morton wrote:
> >> On Tue, 08 May 2007 10:57:35 +0200 Jiri Slaby <jirislaby@xxxxxxxxx> wrote:
> >>
> >>> this occured in dmesg during resuming from hwsusp in 2.6.21-mm1 (captured
> >>> through netconsole). Perfectly reproducible, it simply happens each time I
> >>> try it.
> >> Let's cc Oleg.
> >>
> >>> usb_endpoint usbdev5.1_ep00: PM: resume from 0, parent usb5 still 2
> >>> ------------[ cut here ]------------
> >>> kernel BUG at /home/l/latest/xxx/kernel/workqueue.c:106!
> >>> invalid opcode: 0000 [#1]
> >>> SMP
> >>> Modules linked in: ipv6 floppy ohci1394 ieee1394 parport_pc parport usbhid
> >>> ehci_hcd pata_acpi ff_memless sr_mod cdrom
> ...
> > queue_delayed_work().
> >
> > Probably, cancel_delayed_work(&delayed_work->work) was called with the ->timer
> > pending. This is wrong, cancel_delayed_work() clears _PENDING unconditionally,
>
> Maybe I miss your point, but clearing is conditional: on timer delete...
>
> I think more suspicious is calling cancel_work_sync() for a delayed work
> (with timer pending). Or maybe some race profits from _PENDING cleared
> without locking?

Yes, of course, I meant cancel_work_sync(), sorry for the confusion.

Thanks!

So, once again, cancel_work_sync(&dwork->work) is wrong unless the timer
was stopped.

Before make-cancel_rearming_delayed_work-reliable.patch

it requires that the @work can't be re-queued.

After
works, but waits for the timer expiration in a busy-wait loop.

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/