Re: switcheroo registration vs switching race...

From: Takashi Iwai
Date: Mon Dec 03 2012 - 12:10:32 EST


At Tue, 4 Dec 2012 00:50:56 +0800,
Daniel J Blueman wrote:
>
> On 4 December 2012 00:23, Takashi Iwai <tiwai@xxxxxxx> wrote:
> > At Mon, 3 Dec 2012 23:08:28 +0800,
> > Daniel J Blueman wrote:
> >>
> >> On 3 December 2012 22:40, Takashi Iwai <tiwai@xxxxxxx> wrote:
> >> > At Mon, 3 Dec 2012 22:25:52 +0800,
> >> > Daniel J Blueman wrote:
> >> >>
> >> >> On 3 December 2012 19:17, Takashi Iwai <tiwai@xxxxxxx> wrote:
> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100,
> >> >> > Takashi Iwai wrote:
> >> >> >>
> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800,
> >> >> >> Daniel J Blueman wrote:
> >> >> >> >
> >> >> >> > Hi Seth, Dave, Takashi,
> >> >> >> >
> >> >> >> > If I power down the unused discrete GPU before lightdm starts by
> >> >> >> > fiddling with the sysfs file [1] in the upstart script, I see a race
> >> >> >> > manifesting as the discrete GPU's HDA controller timing out to
> >> >> >> > commands [2].
> >> >> >> >
> >> >> >> > Adding some debug, I see that the registered audio devices are put
> >> >> >> > into D3 before the GPU is, but it turns out that the discrete (and
> >> >> >> > internal) GPU's HDA controller gets registered a bit later, so the
> >> >> >> > list is empty. The symptom is since the HDA driver it's talking to
> >> >> >> > hardware which is now in D3.
> >> >> >> >
> >> >> >> > We could add a mutex to nouveau to allow us to wait for the DGPU HDA
> >> >> >> > controller, but perhaps this should be solved at a higher level in the
> >> >> >> > vgaswitcheroo code; what do you think?
> >> >> >>
> >> >> >> Maybe it's a side effect for the recent effort to fix another race in
> >> >> >> the probe. A part of them problem is that the registration is done at
> >> >> >> the very last of probing.
> >> >> >>
> >> >> >> Instead of delaying the registration, how about the patch below?
> >> >> >
> >> >> > Ping. If this really works, I'd like to queue it for 3.8 merge, at
> >> >> > least...
> >> >>
> >> >> Ping ack; I was trying to find time to understand another race that
> >> >> occurs with GPU probing after switching, but is separate from the
> >> >> situation before switching, here.
> >> >>
> >> >> In the context of writing the switch, it looks like struct azx isn't
> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing with
> >> >> azx_codec_create?
> >> >
> >> > It was allocated, but it wasn't assigned properly in pci drvdata.
> >> >
> >> > Below is the revised patch. Just moved pci_set_drvdata() before
> >> > register_vga_switcheroo(). Could you retest with it?
> >>
> >> Superb; this addresses the oops.
> >
> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
> >
> >> ~1 second after the DGPU is put into D3, I still often see "hda-intel:
> >> spurious response 0x0:0x0, last cmd=0x170500":
> >> http://quora.org/2012/hda-switch-spurious.txt
> >
> > Hm, it's not clear who triggers these messages. I'll try to check the
> > code paths.
> >
> >> Presumably this implies the read of the ring-buffer pointer returned
> >> 0xffffffff, so the HDA driver understands the pointer to have wrapped
> >> and processes the 191 unwritten entries?
> >
> > Good point. Actually there is one bug that looks obviously wrong
> > (writing 32bit value to CORBWP). Maybe it has been working just
> > because writing CORBRP doesn't influence except for the reset bit.
> >
> > Reading CORBWP as a byte is OK, but this could be better in a word so
> > that we can check 0xffff as invalid.
> >
> > A test patch is below. Hopefully this improves the situation...
>
> I'll check this out tomorrow and also instrument the code to get a
> backtrace, since there may still be an underlying race with the
> previous patches:
>
> [ 8.203827] snd_hda_intel 0000:00:1b.0: enabling device (0000 -> 0002)
> [ 8.203936] snd_hda_intel 0000:00:1b.0: irq 51 for MSI/MSI-X
> [ 10.981297] VGA switcheroo: switched nouveau off
> [ 10.981383] nouveau [ DRM] suspending fbcon...
> [ 10.981388] nouveau [ DRM] suspending display...
> [ 10.981687] nouveau [ DRM] unpinning framebuffer(s)...
> [ 10.981825] nouveau [ DRM] evicting buffers...
> [ 10.992948] nouveau [ DRM] suspending client object trees...
> [ 11.310697] hda-intel: azx_get_response timeout, switching to
> polling mode: last cmd=0x000f0000
> [ 12.320236] hda-intel: No response from codec, disabling MSI: last
> cmd=0x000f0000
> [ 13.329721] hda-intel: Codec #0 probe error; disabling it...
> [ 14.419163] hda_intel: azx_get_response timeout, switching to
> single_cmd mode: last cmd=0x000f0000
> [ 14.419855] hda-intel: no codecs initialized
> [ 14.459150] snd_hda_intel 0000:01:00.1: Refused to change power
> state, currently in D3
> [ 14.459176] hda-intel: 0000:01:00.1: Handle VGA-switcheroo audio client
> [ 14.459183] hda-intel: VGA controller for 0000:01:00.1 is disabled
> [ 14.459184] hda-intel: Delaying initialization
>
> Full log at: http://quora.org/2012/hda-switch-spurious2.txt
>
> The first response timeout message plausibly could be from the CS4206
> (0:1b.0) or Nvidia DGPU (1:0.1); either way, there are no audio
> devices after.

That's odd. The Cirrus one (0000:00:1b.0) must be independent from
the vga switcheroo things for Nvidia...

The patch below is the revised patch of the first one. Now the vga
switcheroo registration is done before the video controller D3 check,
so the race should be smaller.


> Also, we see that checking for master abort in one
> place won't be sufficient.
>
> Stay tuned for backtraces tomorrow.

OK, thanks.


Takashi

---
From: Takashi Iwai <tiwai@xxxxxxx>
Subject: [PATCH] ALSA: hda - Fix yet another race of VGA switcheroo

When the video is turned off while probing the HD-audio driver, the
audio driver gets stuck because the registration of vga_switcheroo
client is done at the end of probing phase without notified.

For fixing this race, here we introduced two things:
- The vga switcheroo is registered at the early stage before accessing
any hardware register,
- Add a completion and make the can_switch callback waiting until all
probing process has been finished.

Reported-by: Daniel J Blueman <daniel@xxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Takashi Iwai <tiwai@xxxxxxx>
---
sound/pci/hda/hda_intel.c | 60 ++++++++++++++++++++++++-----------------------
1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 4bb52da..b845be5 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -49,6 +49,7 @@
#include <linux/pm_runtime.h>
#include <linux/clocksource.h>
#include <linux/time.h>
+#include <linux/completion.h>

#ifdef CONFIG_X86
/* for snoop control */
@@ -469,6 +470,7 @@ struct azx {
/* locks */
spinlock_t reg_lock;
struct mutex open_mutex;
+ struct completion probe_wait;

/* streams (x num_streams) */
struct azx_dev *azx_dev;
@@ -2790,6 +2792,7 @@ static bool azx_vs_can_switch(struct pci_dev *pci)
struct snd_card *card = pci_get_drvdata(pci);
struct azx *chip = card->private_data;

+ wait_for_completion(&chip->probe_wait);
if (chip->init_failed)
return false;
if (chip->disabled || !chip->bus)
@@ -2851,6 +2854,9 @@ static int azx_free(struct azx *chip)

azx_notifier_unregister(chip);

+ chip->init_failed = 1; /* to be sure */
+ complete(&chip->probe_wait);
+
if (use_vga_switcheroo(chip)) {
if (chip->disabled && chip->bus)
snd_hda_unlock_devices(chip->bus);
@@ -3156,6 +3162,7 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci,
INIT_LIST_HEAD(&chip->pcm_list);
INIT_LIST_HEAD(&chip->list);
init_vga_switcheroo(chip);
+ init_completion(&chip->probe_wait);

chip->position_fix[0] = chip->position_fix[1] =
check_position_fix(chip, position_fix[dev]);
@@ -3183,26 +3190,6 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci,
}
}

- if (check_hdmi_disabled(pci)) {
- snd_printk(KERN_INFO SFX "VGA controller for %s is disabled\n",
- pci_name(pci));
- if (use_vga_switcheroo(chip)) {
- snd_printk(KERN_INFO SFX "Delaying initialization\n");
- chip->disabled = true;
- goto ok;
- }
- kfree(chip);
- pci_disable_device(pci);
- return -ENXIO;
- }
-
- err = azx_first_init(chip);
- if (err < 0) {
- azx_free(chip);
- return err;
- }
-
- ok:
err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops);
if (err < 0) {
snd_printk(KERN_ERR SFX "Error creating device [card]!\n");
@@ -3447,7 +3434,29 @@ static int __devinit azx_probe(struct pci_dev *pci,
if (err < 0)
goto out_free;
card->private_data = chip;
+
+ pci_set_drvdata(pci, card);
+
+ err = register_vga_switcheroo(chip);
+ if (err < 0) {
+ snd_printk(KERN_ERR SFX
+ "Error registering VGA-switcheroo client\n");
+ goto out_free;
+ }
+
+ if (check_hdmi_disabled(pci)) {
+ snd_printk(KERN_INFO SFX "VGA controller for %s is disabled\n",
+ pci_name(pci));
+ snd_printk(KERN_INFO SFX "Delaying initialization\n");
+ chip->disabled = true;
+ }
+
probe_now = !chip->disabled;
+ if (probe_now) {
+ err = azx_first_init(chip);
+ if (err < 0)
+ goto out_free;
+ }

#ifdef CONFIG_SND_HDA_PATCH_LOADER
if (patch[dev] && *patch[dev]) {
@@ -3468,23 +3477,16 @@ static int __devinit azx_probe(struct pci_dev *pci,
goto out_free;
}

- pci_set_drvdata(pci, card);
-
if (pci_dev_run_wake(pci))
pm_runtime_put_noidle(&pci->dev);

- err = register_vga_switcheroo(chip);
- if (err < 0) {
- snd_printk(KERN_ERR SFX
- "Error registering VGA-switcheroo client\n");
- goto out_free;
- }
-
dev++;
+ complete(&chip->probe_wait);
return 0;

out_free:
snd_card_free(card);
+ pci_set_drvdata(pci, NULL);
return err;
}

--
1.8.0.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/