Re: switcheroo registration vs switching race...
From: Takashi Iwai
Date: Mon Dec 03 2012 - 11:23:18 EST
At Mon, 3 Dec 2012 23:08:28 +0800,
Daniel J Blueman wrote:
>
> On 3 December 2012 22:40, Takashi Iwai <tiwai@xxxxxxx> wrote:
> > At Mon, 3 Dec 2012 22:25:52 +0800,
> > Daniel J Blueman wrote:
> >>
> >> On 3 December 2012 19:17, Takashi Iwai <tiwai@xxxxxxx> wrote:
> >> > At Wed, 28 Nov 2012 09:45:39 +0100,
> >> > Takashi Iwai wrote:
> >> >>
> >> >> At Wed, 28 Nov 2012 11:45:07 +0800,
> >> >> Daniel J Blueman wrote:
> >> >> >
> >> >> > Hi Seth, Dave, Takashi,
> >> >> >
> >> >> > If I power down the unused discrete GPU before lightdm starts by
> >> >> > fiddling with the sysfs file [1] in the upstart script, I see a race
> >> >> > manifesting as the discrete GPU's HDA controller timing out to
> >> >> > commands [2].
> >> >> >
> >> >> > Adding some debug, I see that the registered audio devices are put
> >> >> > into D3 before the GPU is, but it turns out that the discrete (and
> >> >> > internal) GPU's HDA controller gets registered a bit later, so the
> >> >> > list is empty. The symptom is since the HDA driver it's talking to
> >> >> > hardware which is now in D3.
> >> >> >
> >> >> > We could add a mutex to nouveau to allow us to wait for the DGPU HDA
> >> >> > controller, but perhaps this should be solved at a higher level in the
> >> >> > vgaswitcheroo code; what do you think?
> >> >>
> >> >> Maybe it's a side effect for the recent effort to fix another race in
> >> >> the probe. A part of them problem is that the registration is done at
> >> >> the very last of probing.
> >> >>
> >> >> Instead of delaying the registration, how about the patch below?
> >> >
> >> > Ping. If this really works, I'd like to queue it for 3.8 merge, at
> >> > least...
> >>
> >> Ping ack; I was trying to find time to understand another race that
> >> occurs with GPU probing after switching, but is separate from the
> >> situation before switching, here.
> >>
> >> In the context of writing the switch, it looks like struct azx isn't
> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing with
> >> azx_codec_create?
> >
> > It was allocated, but it wasn't assigned properly in pci drvdata.
> >
> > Below is the revised patch. Just moved pci_set_drvdata() before
> > register_vga_switcheroo(). Could you retest with it?
>
> Superb; this addresses the oops.
OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
> ~1 second after the DGPU is put into D3, I still often see "hda-intel:
> spurious response 0x0:0x0, last cmd=0x170500":
> http://quora.org/2012/hda-switch-spurious.txt
Hm, it's not clear who triggers these messages. I'll try to check the
code paths.
> Presumably this implies the read of the ring-buffer pointer returned
> 0xffffffff, so the HDA driver understands the pointer to have wrapped
> and processes the 191 unwritten entries?
Good point. Actually there is one bug that looks obviously wrong
(writing 32bit value to CORBWP). Maybe it has been working just
because writing CORBRP doesn't influence except for the reset bit.
Reading CORBWP as a byte is OK, but this could be better in a word so
that we can check 0xffff as invalid.
A test patch is below. Hopefully this improves the situation...
thanks,
Takashi
---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 4bb52da..ce990db 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -805,13 +805,18 @@ static int azx_corb_send_cmd(struct hda_bus *bus, u32 val)
spin_lock_irq(&chip->reg_lock);
/* add command to corb */
- wp = azx_readb(chip, CORBWP);
+ wp = azx_readw(chip, CORBWP);
+ if (wp == 0xffff) {
+ /* something wrong, controller likely turned to D3 */
+ spin_unlock_irq(&chip->reg_lock);
+ return -1;
+ }
wp++;
wp %= ICH6_MAX_CORB_ENTRIES;
chip->rirb.cmds[addr]++;
chip->corb.buf[wp] = cpu_to_le32(val);
- azx_writel(chip, CORBWP, wp);
+ azx_writew(chip, CORBWP, wp);
spin_unlock_irq(&chip->reg_lock);
@@ -827,7 +832,12 @@ static void azx_update_rirb(struct azx *chip)
unsigned int addr;
u32 res, res_ex;
- wp = azx_readb(chip, RIRBWP);
+ wp = azx_readw(chip, RIRBWP);
+ if (wp == 0xffff) {
+ /* something wrong, controller likely turned to D3 */
+ return;
+ }
+
if (wp == chip->rirb.wp)
return;
chip->rirb.wp = wp;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/