Re: [bug] sound: INFO: possible recursive locking detected

From: Takashi Iwai
Date: Thu Jan 08 2009 - 11:17:20 EST


At Thu, 08 Jan 2009 17:12:54 +0100,
Peter Zijlstra wrote:
>
> On Thu, 2009-01-08 at 17:10 +0100, Takashi Iwai wrote:
> > At Thu, 08 Jan 2009 16:39:31 +0100,
> > I wrote:
> > >
> > > At Thu, 8 Jan 2009 16:20:37 +0100,
> > > Ingo Molnar wrote:
> > > >
> > > > FYI, -tip testing found this new lockdep workqueue locking assert
> > > > triggered by the Intel HDA sound driver:
> > (snip)
> > > > [ 30.718689] =============================================
> > > > [ 30.718689] [ INFO: possible recursive locking detected ]
> > > > [ 30.718689] 2.6.28-tip #2
> > > > [ 30.718689] ---------------------------------------------
> > > > [ 30.718689] events/0/7 is trying to acquire lock:
> > > > [ 30.718689] (events){--..}, at: [<ffffffff80281050>] flush_workqueue+0x0/0x90
> > > > [ 30.718689]
> > > > [ 30.718689] but task is already holding lock:
> > > > [ 30.718689] (events){--..}, at: [<ffffffff8028076a>] run_workqueue+0x14a/0x240
> > > > [ 30.718689]
> > > > [ 30.718689] other info that might help us debug this:
> > > > [ 30.718689] 2 locks held by events/0/7:
> > > > [ 30.718689] #0: (events){--..}, at: [<ffffffff8028076a>] run_workqueue+0x14a/0x240
> > > > [ 30.718689] #1: (&wfc.work){--..}, at: [<ffffffff8028076a>] run_workqueue+0x14a/0x240
> > > > [ 30.718689]
> > > > [ 30.718689] stack backtrace:
> > > > [ 30.718689] Pid: 7, comm: events/0 Not tainted 2.6.28-tip #2
> > > > [ 30.718689] Call Trace:
> > > > [ 30.718689] [<ffffffff80295836>] validate_chain+0xc56/0x1340
> > > > [ 30.718689] [<ffffffff802974e6>] __lock_acquire+0x376/0x610
> > > > [ 30.718689] [<ffffffff80297819>] lock_acquire+0x99/0xd0
> > > > [ 30.718689] [<ffffffff80281050>] ? flush_workqueue+0x0/0x90
> > > > [ 30.718689] [<ffffffff80281091>] flush_workqueue+0x41/0x90
> > > > [ 30.718689] [<ffffffff80281050>] ? flush_workqueue+0x0/0x90
> > > > [ 30.718689] [<ffffffff802810f5>] flush_scheduled_work+0x15/0x20
> > > > [ 30.718689] [<ffffffff80c59794>] azx_clear_irq_pending+0x54/0x60
> > > > [ 30.718689] [<ffffffff80c599ec>] azx_free+0x10c/0x160
> > > > [ 30.718689] [<ffffffff80bae9ad>] ? snd_device_free+0x8d/0x100
> > > > [ 30.718689] [<ffffffff80c59a52>] azx_dev_free+0x12/0x20
> > > > [ 30.718689] [<ffffffff80bae9a1>] snd_device_free+0x81/0x100
> > > > [ 30.718689] [<ffffffff80baea89>] snd_device_free_all+0x69/0xa0
> > > > [ 30.718689] [<ffffffff80ba8a70>] snd_card_do_free+0x50/0xd0
> > > > [ 30.718689] [<ffffffff80ba9602>] snd_card_free+0xa2/0xc0
> > > > [ 30.718689] [<ffffffff8061115f>] ? __delay+0xf/0x20
> > > > [ 30.718689] [<ffffffff80edbb6a>] azx_probe+0x38a/0x9f0
> > > > [ 30.718689] [<ffffffff80c5a240>] ? azx_send_cmd+0x0/0x130
> > > > [ 30.718689] [<ffffffff80c5a370>] ? azx_get_response+0x0/0x210
> > > > [ 30.718689] [<ffffffff80c59b70>] ? azx_attach_pcm_stream+0x0/0x1b0
> > > > [ 30.718689] [<ffffffff802804d0>] ? do_work_for_cpu+0x0/0x30
> > > > [ 30.718689] [<ffffffff80634807>] local_pci_probe+0x17/0x20
> > > > [ 30.718689] [<ffffffff802804e8>] do_work_for_cpu+0x18/0x30
> > > > [ 30.718689] [<ffffffff802804d0>] ? do_work_for_cpu+0x0/0x30
> > > > [ 30.718689] [<ffffffff802807bb>] run_workqueue+0x19b/0x240
> > > > [ 30.718689] [<ffffffff8028076a>] ? run_workqueue+0x14a/0x240
> > > > [ 30.718689] [<ffffffff802809cf>] worker_thread+0xaf/0x110
> > > > [ 30.718689] [<ffffffff80284e20>] ? autoremove_wake_function+0x0/0x40
> > > > [ 30.718689] [<ffffffff80280920>] ? worker_thread+0x0/0x110
> > > > [ 30.718689] [<ffffffff802848e3>] kthread+0x53/0x80
> > > > [ 30.718689] [<ffffffff8022c3aa>] child_rip+0xa/0x20
> > > > [ 30.718689] [<ffffffff8022bcbe>] ? restore_args+0x0/0x30
> > > > [ 30.718689] [<ffffffff80284890>] ? kthread+0x0/0x80
> > > > [ 30.718689] [<ffffffff8022c3a0>] ? child_rip+0x0/0x20
> >
> > The backtrace implies that the probe callback is called in a
> > workqueue. Is it right?
>
> Yes, it appears a worklet tries to flush the queue he's on.

It calls flush_schedule_work().
So, now this should be avoided during the probe callback?

If so, a simple workaround would be to create its own workqueue, or
add a flag to avoid this call at destructor...


thanks,

Takashi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/