Re: [bug] sound: INFO: possible recursive locking detected

From: Takashi Iwai
Date: Thu Jan 08 2009 - 11:53:44 EST


At Thu, 08 Jan 2009 17:17:06 +0100,
I wrote:
>
> At Thu, 08 Jan 2009 17:12:54 +0100,
> Peter Zijlstra wrote:
> >
> > On Thu, 2009-01-08 at 17:10 +0100, Takashi Iwai wrote:
> > > At Thu, 08 Jan 2009 16:39:31 +0100,
> > > I wrote:
> > > >
> > > > At Thu, 8 Jan 2009 16:20:37 +0100,
> > > > Ingo Molnar wrote:
> > > > >
> > > > > FYI, -tip testing found this new lockdep workqueue locking assert
> > > > > triggered by the Intel HDA sound driver:
> > > (snip)
> > > > > [ 30.718689] =============================================
> > > > > [ 30.718689] [ INFO: possible recursive locking detected ]
> > > > > [ 30.718689] 2.6.28-tip #2
> > > > > [ 30.718689] ---------------------------------------------
> > > > > [ 30.718689] events/0/7 is trying to acquire lock:
> > > > > [ 30.718689] (events){--..}, at: [<ffffffff80281050>] flush_workqueue+0x0/0x90
> > > > > [ 30.718689]
> > > > > [ 30.718689] but task is already holding lock:
> > > > > [ 30.718689] (events){--..}, at: [<ffffffff8028076a>] run_workqueue+0x14a/0x240
> > > > > [ 30.718689]
> > > > > [ 30.718689] other info that might help us debug this:
> > > > > [ 30.718689] 2 locks held by events/0/7:
> > > > > [ 30.718689] #0: (events){--..}, at: [<ffffffff8028076a>] run_workqueue+0x14a/0x240
> > > > > [ 30.718689] #1: (&wfc.work){--..}, at: [<ffffffff8028076a>] run_workqueue+0x14a/0x240
> > > > > [ 30.718689]
> > > > > [ 30.718689] stack backtrace:
> > > > > [ 30.718689] Pid: 7, comm: events/0 Not tainted 2.6.28-tip #2
> > > > > [ 30.718689] Call Trace:
> > > > > [ 30.718689] [<ffffffff80295836>] validate_chain+0xc56/0x1340
> > > > > [ 30.718689] [<ffffffff802974e6>] __lock_acquire+0x376/0x610
> > > > > [ 30.718689] [<ffffffff80297819>] lock_acquire+0x99/0xd0
> > > > > [ 30.718689] [<ffffffff80281050>] ? flush_workqueue+0x0/0x90
> > > > > [ 30.718689] [<ffffffff80281091>] flush_workqueue+0x41/0x90
> > > > > [ 30.718689] [<ffffffff80281050>] ? flush_workqueue+0x0/0x90
> > > > > [ 30.718689] [<ffffffff802810f5>] flush_scheduled_work+0x15/0x20
> > > > > [ 30.718689] [<ffffffff80c59794>] azx_clear_irq_pending+0x54/0x60
> > > > > [ 30.718689] [<ffffffff80c599ec>] azx_free+0x10c/0x160
> > > > > [ 30.718689] [<ffffffff80bae9ad>] ? snd_device_free+0x8d/0x100
> > > > > [ 30.718689] [<ffffffff80c59a52>] azx_dev_free+0x12/0x20
> > > > > [ 30.718689] [<ffffffff80bae9a1>] snd_device_free+0x81/0x100
> > > > > [ 30.718689] [<ffffffff80baea89>] snd_device_free_all+0x69/0xa0
> > > > > [ 30.718689] [<ffffffff80ba8a70>] snd_card_do_free+0x50/0xd0
> > > > > [ 30.718689] [<ffffffff80ba9602>] snd_card_free+0xa2/0xc0
> > > > > [ 30.718689] [<ffffffff8061115f>] ? __delay+0xf/0x20
> > > > > [ 30.718689] [<ffffffff80edbb6a>] azx_probe+0x38a/0x9f0
> > > > > [ 30.718689] [<ffffffff80c5a240>] ? azx_send_cmd+0x0/0x130
> > > > > [ 30.718689] [<ffffffff80c5a370>] ? azx_get_response+0x0/0x210
> > > > > [ 30.718689] [<ffffffff80c59b70>] ? azx_attach_pcm_stream+0x0/0x1b0
> > > > > [ 30.718689] [<ffffffff802804d0>] ? do_work_for_cpu+0x0/0x30
> > > > > [ 30.718689] [<ffffffff80634807>] local_pci_probe+0x17/0x20
> > > > > [ 30.718689] [<ffffffff802804e8>] do_work_for_cpu+0x18/0x30
> > > > > [ 30.718689] [<ffffffff802804d0>] ? do_work_for_cpu+0x0/0x30
> > > > > [ 30.718689] [<ffffffff802807bb>] run_workqueue+0x19b/0x240
> > > > > [ 30.718689] [<ffffffff8028076a>] ? run_workqueue+0x14a/0x240
> > > > > [ 30.718689] [<ffffffff802809cf>] worker_thread+0xaf/0x110
> > > > > [ 30.718689] [<ffffffff80284e20>] ? autoremove_wake_function+0x0/0x40
> > > > > [ 30.718689] [<ffffffff80280920>] ? worker_thread+0x0/0x110
> > > > > [ 30.718689] [<ffffffff802848e3>] kthread+0x53/0x80
> > > > > [ 30.718689] [<ffffffff8022c3aa>] child_rip+0xa/0x20
> > > > > [ 30.718689] [<ffffffff8022bcbe>] ? restore_args+0x0/0x30
> > > > > [ 30.718689] [<ffffffff80284890>] ? kthread+0x0/0x80
> > > > > [ 30.718689] [<ffffffff8022c3a0>] ? child_rip+0x0/0x20
> > >
> > > The backtrace implies that the probe callback is called in a
> > > workqueue. Is it right?
> >
> > Yes, it appears a worklet tries to flush the queue he's on.
>
> It calls flush_schedule_work().
> So, now this should be avoided during the probe callback?
>
> If so, a simple workaround would be to create its own workqueue, or
> add a flag to avoid this call at destructor...

The below is a quick fix to convert to use the own workq in
hda_intel.c. This should fix this particular lockdep problem, at
least. (And it's better, anyway.)

Please give it a try.


thanks,

Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index f04de11..7e09f98 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -406,6 +406,7 @@ struct azx {
unsigned int last_cmd; /* last issued command (to sync) */

/* for pending irqs */
+ struct workqueue_struct *workq;
struct work_struct irq_pending_work;

/* reboot notifier (for mysterious hangup problem at power-down) */
@@ -999,7 +1000,8 @@ static irqreturn_t azx_interrupt(int irq, void *dev_id)
} else {
/* bogus IRQ, process it later */
azx_dev->irq_pending = 1;
- schedule_work(&chip->irq_pending_work);
+ queue_work(chip->workq,
+ &chip->irq_pending_work);
}
}
}
@@ -1741,7 +1743,7 @@ static void azx_clear_irq_pending(struct azx *chip)
for (i = 0; i < chip->num_streams; i++)
chip->azx_dev[i].irq_pending = 0;
spin_unlock_irq(&chip->reg_lock);
- flush_scheduled_work();
+ flush_workqueue(chip->workq);
}

static struct snd_pcm_ops azx_pcm_ops = {
@@ -2019,6 +2021,8 @@ static int azx_free(struct azx *chip)
azx_stop_chip(chip);
}

+ if (chip->workq)
+ destroy_workqueue(chip->workq);
if (chip->irq >= 0)
free_irq(chip->irq, (void*)chip);
if (chip->msi)
@@ -2131,6 +2135,7 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci,
static struct snd_device_ops ops = {
.dev_free = azx_dev_free,
};
+ char qname[8];

*rchip = NULL;

@@ -2153,7 +2158,6 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci,
chip->driver_type = driver_type;
chip->msi = enable_msi;
chip->dev_index = dev;
- INIT_WORK(&chip->irq_pending_work, azx_irq_pending_work);

chip->position_fix = check_position_fix(chip, position_fix[dev]);
check_probe_mask(chip, dev);
@@ -2200,6 +2204,15 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci,
if (pci_enable_msi(pci) < 0)
chip->msi = 0;

+ snprintf(qname, sizeof(qname), "hda%d", chip->card->number);
+ chip->workq = create_workqueue(qname);
+ if (!chip->workq) {
+ snd_printk(KERN_ERR SFX "cannot create workqueue %s\n", qname);
+ err = -ENOMEM;
+ goto errout;
+ }
+ INIT_WORK(&chip->irq_pending_work, azx_irq_pending_work);
+
if (azx_acquire_irq(chip, 0) < 0) {
err = -EBUSY;
goto errout;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/