Re: [PATCH v2] ALSA: hda: Continue to probe when codec probe fails
From: Kai-Heng Feng
Date: Fri Dec 18 2020 - 00:14:28 EST
[+Cc Bjorn, Alan and linux-pci]
On Thu, Dec 17, 2020 at 12:57 AM Takashi Iwai <tiwai@xxxxxxx> wrote:
>
> On Wed, 16 Dec 2020 17:22:17 +0100,
> Takashi Iwai wrote:
> >
> > On Wed, 16 Dec 2020 17:07:45 +0100,
> > Kai-Heng Feng wrote:
> > >
> > > On Wed, Dec 16, 2020 at 11:58 PM Takashi Iwai <tiwai@xxxxxxx> wrote:
> > > >
> > > > On Wed, 16 Dec 2020 16:50:20 +0100,
> > > > Kai-Heng Feng wrote:
> > > > >
> > > > > On Wed, Dec 16, 2020 at 11:41 PM Takashi Iwai <tiwai@xxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, 16 Dec 2020 13:47:24 +0100,
> > > > > > Kai-Heng Feng wrote:
> > > > > > >
> > > > > > > Similar to commit 9479e75fca37 ("ALSA: hda: Keep the controller
> > > > > > > initialization even if no codecs found"), when codec probe fails, it
> > > > > > > doesn't enable runtime suspend, and can prevent graphics card from
> > > > > > > getting powered down:
> > > > > > > [ 4.280991] snd_hda_intel 0000:01:00.1: no codecs initialized
> > > > > > >
> > > > > > > $ cat /sys/bus/pci/devices/0000:01:00.1/power/runtime_status
> > > > > > > active
> > > > > > >
> > > > > > > So mark there's no codec and continue probing to let runtime PM to work.
> > > > > > >
> > > > > > > BugLink: https://bugs.launchpad.net/bugs/1907212
> > > > > > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
> > > > > >
> > > > > > Hm, but if the probe fails, doesn't it mean something really wrong?
> > > > > > IOW, how does this situation happen?
> > > > >
> > > > > The HDA controller is forcely created by quirk_nvidia_hda(). So
> > > > > probably there's really not an HDA controller.
> > > >
> > > > I still don't understand how non-zero codec_mask is passed.
> > > > The non-zero codec_mask means that BIOS or whatever believes that
> > > > HD-audio codecs are present and let HD-audio controller reporting the
> > > > presence. What error did you get at probing?
> > >
> > > [ 4.280991] snd_hda_intel 0000:01:00.1: no codecs initialized
> > > Full dmesg here:
> > > https://launchpadlibrarian.net/510351476/dmesg.log
> >
> > The actual problems are shown before that line.
> >
> > [ 4.178848] pci 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)
> > [ 4.179502] snd_hda_intel 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)
> > [ 4.179511] snd_hda_intel 0000:01:00.1: can't change power state from D3hot to D0 (config space inaccessible)
> > ....
> > [ 4.280571] hdaudio hdaudioC1D0: no AFG or MFG node found
> > [ 4.280633] hdaudio hdaudioC1D1: no AFG or MFG node found
> > [ 4.280685] hdaudio hdaudioC1D2: no AFG or MFG node found
> > [ 4.280736] hdaudio hdaudioC1D3: no AFG or MFG node found
> > [ 4.280788] hdaudio hdaudioC1D4: no AFG or MFG node found
> > [ 4.280839] hdaudio hdaudioC1D5: no AFG or MFG node found
> > [ 4.280892] hdaudio hdaudioC1D6: no AFG or MFG node found
> > [ 4.280943] hdaudio hdaudioC1D7: no AFG or MFG node found
> >
> > Could you check the codec_mask value read in
> > sound/hda/hdac_controller.c? I guess it reads 0xff.
> >
> > If that's the case, it can be corrected by the patch below.
> > But, we should check the cause of the first error (inaccessible config
> > space) in anyway; this must be the primary reason of the whole chain
> > of errors.
>
> Now I took a deeper look at the code. So we hit errors after errors:
> - The first problem is that quirk_nvidia_hda() enabled HD-audio even
> if it's non-functional by some reason. We may need additional
> checks there.
Quite possibly the system doesn't power up the HDA controller when there's
no external monitor connected.
So when an external monitor is connected, the controller is still needed for
HDMI audio.
Let me ask the user to confirm this.
>
> - The second problem is that pci_enable_device() ignores the error
> returned from pci_set_power_state() if it's -EIO. And the
> inaccessible access error returns -EIO, although it's rather a fatal
> problem. So the driver believes as the PCI device gets enabled
> properly.
This was introduced in 2005 by Alan's commit 11f3859b1e85 ("[PATCH] PCI: Fix
regression in pci_enable_device_bars") to fix a UHCI controller.
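
To make the second problem concrete, here is a minimal user-space sketch of
the behaviour described above. It is not the kernel code: every sketch_* name
is hypothetical, and the callback only stands in for pci_set_power_state().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for pci_set_power_state(): 0 or a negative errno. */
typedef int (*sketch_set_power_fn)(void);

/*
 * Models the enable path as described in this thread: an -EIO result from
 * the power state change is swallowed, any other error is propagated.
 */
static int sketch_enable_device(sketch_set_power_fn set_power_d0)
{
	int err;

	assert(set_power_d0 != NULL);

	err = set_power_d0();
	if (err < 0 && err != -EIO)
		return err;	/* real failure reported to the driver */

	return 0;		/* -EIO is treated the same as success */
}

/* Hypothetical power-state outcomes used to exercise the sketch. */
static int sketch_power_ok(void)     { return 0; }
static int sketch_power_eio(void)    { return -EIO; }
static int sketch_power_einval(void) { return -EINVAL; }
```

With sketch_power_eio the caller gets 0 back, which matches what we see here:
the driver carries on as if the device had reached D0.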
>
> - The third problem is that HD-audio driver blindly believes the
> codec_mask read from the register even if it's a read failure as I
> already showed.
This approach has the least regression risk.
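
For illustration only, a user-space sketch of what sanity-checking the mask
could look like. This is not the patch from this thread: the sketch_* names
and the limit of 8 slots are assumptions (the 8 comes from the
hdaudioC1D0..D7 lines in the log), as is the idea that a failed 16-bit
register read comes back as all ones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limit, mirroring the eight hdaudioC1D0..D7 slots in the log. */
#define SKETCH_MAX_CODECS 8

/*
 * Return a codec mask that is safe to probe. An all-ones value is assumed
 * to be a failed read from an inaccessible device, so it is reported as
 * "no codecs" instead of "every slot populated".
 */
static uint16_t sketch_sanitize_codec_mask(uint16_t statests)
{
	/* The mask has to fit into the 16-bit value we were given. */
	assert(SKETCH_MAX_CODECS <= 16);

	if (statests == 0xffff)
		return 0;

	/* Keep only the bits for the slots we are willing to probe. */
	return statests & ((1u << SKETCH_MAX_CODECS) - 1);
}
```

A value such as 0x0005 passes through unchanged, while 0xffff becomes 0, so
the controller would take the "no codecs" path and runtime PM can still be
enabled.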
Kai-Heng
> Ideally we should address in the first place.
>
>
> Takashi