Re: [PATCH v2] ALSA: hda: Continue to probe when codec probe fails

From: Takashi Iwai
Date: Wed Dec 16 2020 - 11:57:54 EST


On Wed, 16 Dec 2020 17:22:17 +0100,
Takashi Iwai wrote:
>
> On Wed, 16 Dec 2020 17:07:45 +0100,
> Kai-Heng Feng wrote:
> >
> > On Wed, Dec 16, 2020 at 11:58 PM Takashi Iwai <tiwai@xxxxxxx> wrote:
> > >
> > > On Wed, 16 Dec 2020 16:50:20 +0100,
> > > Kai-Heng Feng wrote:
> > > >
> > > > On Wed, Dec 16, 2020 at 11:41 PM Takashi Iwai <tiwai@xxxxxxx> wrote:
> > > > >
> > > > > On Wed, 16 Dec 2020 13:47:24 +0100,
> > > > > Kai-Heng Feng wrote:
> > > > > >
> > > > > > Similar to commit 9479e75fca37 ("ALSA: hda: Keep the controller
> > > > > > initialization even if no codecs found"), when codec probe fails, it
> > > > > > doesn't enable runtime suspend, and can prevent graphics card from
> > > > > > getting powered down:
> > > > > > [ 4.280991] snd_hda_intel 0000:01:00.1: no codecs initialized
> > > > > >
> > > > > > $ cat /sys/bus/pci/devices/0000:01:00.1/power/runtime_status
> > > > > > active
> > > > > >
> > > > > > So mark there's no codec and continue probing to let runtime PM to work.
> > > > > >
> > > > > > BugLink: https://bugs.launchpad.net/bugs/1907212
> > > > > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
> > > > >
> > > > > Hm, but if the probe fails, doesn't it mean something really wrong?
> > > > > IOW, how does this situation happen?
> > > >
> > > > The HDA controller is forcibly created by quirk_nvidia_hda(). So
> > > > probably there's really not an HDA controller.
> > >
> > > I still don't understand how non-zero codec_mask is passed.
> > > A non-zero codec_mask means that the BIOS or whatever believes that
> > > HD-audio codecs are present and lets the HD-audio controller report
> > > their presence. What error did you get at probing?
> >
> > [ 4.280991] snd_hda_intel 0000:01:00.1: no codecs initialized
> > Full dmesg here:
> > https://launchpadlibrarian.net/510351476/dmesg.log
>
> The actual problems are shown before that line.
>
> [ 4.178848] pci 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)
> [ 4.179502] snd_hda_intel 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)
> [ 4.179511] snd_hda_intel 0000:01:00.1: can't change power state from D3hot to D0 (config space inaccessible)
> ....
> [ 4.280571] hdaudio hdaudioC1D0: no AFG or MFG node found
> [ 4.280633] hdaudio hdaudioC1D1: no AFG or MFG node found
> [ 4.280685] hdaudio hdaudioC1D2: no AFG or MFG node found
> [ 4.280736] hdaudio hdaudioC1D3: no AFG or MFG node found
> [ 4.280788] hdaudio hdaudioC1D4: no AFG or MFG node found
> [ 4.280839] hdaudio hdaudioC1D5: no AFG or MFG node found
> [ 4.280892] hdaudio hdaudioC1D6: no AFG or MFG node found
> [ 4.280943] hdaudio hdaudioC1D7: no AFG or MFG node found
>
> Could you check the codec_mask value read in
> sound/hda/hdac_controller.c? I guess it reads 0xff.
>
> If that's the case, it can be corrected by the patch below.
> But we should check the cause of the first error (inaccessible config
> space) anyway; this must be the primary cause of the whole chain
> of errors.

I've now taken a deeper look at the code. We hit one error after another:
- The first problem is that quirk_nvidia_hda() enables the HD-audio
  function even if it's non-functional for some reason. We may need
  additional checks there.

- The second problem is that pci_enable_device() ignores the error
  returned from pci_set_power_state() if it's -EIO. And the
  inaccessible config space case returns -EIO, although it's a rather
  fatal problem. So the driver believes that the PCI device got
  enabled properly.

- The third problem is that the HD-audio driver blindly trusts the
  codec_mask read from the register even if the read itself failed,
  as I already showed.

Ideally we should address this at the first point of failure.
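
As a rough illustration of the third point, a sanity check on the value
read back could look like the sketch below. This is plain userspace C,
not the actual driver code: sanitize_codec_mask() and CODEC_MASK_ALL_ONES
are hypothetical names, and it assumes that a failed read shows up as
all bits set in a 16-bit value.

```c
#include <stdint.h>

/* Assumption: a read from an inaccessible device returns all ones,
 * so a 16-bit register read yields 0xffff. */
#define CODEC_MASK_ALL_ONES 0xffffu

/* Hypothetical helper, not an existing kernel function.
 * Treats an all-ones value as a failed read and reports no codecs;
 * any other value is passed through unchanged. */
uint16_t sanitize_codec_mask(uint16_t raw)
{
	if (raw == CODEC_MASK_ALL_ONES)
		return 0;	/* bogus read: behave as if no codec is present */
	return raw;
}
```

With such a check, a bogus mask would end up in the "no codecs found"
path instead of probing every slot and failing on each one.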


Takashi