Re: [RFC PATCH] PCI, kdump: Clear bus master bit upon shutdown in kdump kernel

From: Baoquan He
Date: Sat Jan 11 2020 - 05:04:11 EST


On 01/10/20 at 08:45pm, Khalid Aziz wrote:
> >>>>>> - if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
> >>>>>> + if ((kexec_in_progress || is_kdump_kernel()) &&
> >>>>>> + pci_dev->current_state <= PCI_D3hot)
> >>>>>> pci_clear_master(pci_dev);
> >>>>>
> >>>>> I'm clearly missing something because this will turn off bus mastering
> >>>>> in cases where we previously left it enabled.
> >>>>>
> >>>>> I was assuming the crash was related to a device doing DMA when the
> >>>>> Root Port had bus mastering disabled. But that must be wrong.
> >>>>>
> >>>>> I'd like to understand the crash/hang better because the quirk
> >>>>> especially is hard to connect to anything. If the crash is because of
> >>>>> an AER or other PCIe error, maybe another possibility is that we could
> >>>>> handle it better or disable signaling of it or something.
> >>>>>
> >>>>
> >>>> I am not understanding this failure mode either. That code in
> >>>> pci_device_shutdown() was added originally to address this very issue.
> >>>> The patch 4fc9bbf98fd6 ("PCI: Disable Bus Master only on kexec reboot")
> >>>> shut down any errant DMAs from PCI devices as we kexec a new kernel. In
> >>>> this new patch, this is the same code path that will be taken again when
> >>>> kdump kernel is shutting down. If the errant DMA problem was not fixed
> >>>> by clearing Bus Master bit in this path when kdump kernel was being
> >>>> kexec'd, why does the same code path work the second time around when
> >>>> kdump kernel is shutting down? Is there more going on that we don't
> >>>> understand?
> >>>>
> >>>
> >>> Khalid,
> >>>
> >>> I don't believe we execute that code path in the crash case.
> >>>
> >>> The variable kexec_in_progress is set true in kernel_kexec() before calling
> >>> machine_kexec(). This is the fast reboot case.
> >>>
> >>> I don't see kexec_in_progress set true elsewhere.
> >>>
> >>>
> >>> The code path for crash is different.
> >>>
> >>> For instance, panic() will call
> >>> -> __crash_kexec() which calls
> >>> -> machine_kexec().
> >>>
> >>> So the setting of kexec_in_progress is bypassed.
> >>>
> >>
> >> True, but what that means is if it is an errant DMA causing the issue
> >> you are seeing, that errant DMA can happen any time between when we
> >
> > Here, there could be misunderstanding. It's not an errant DMA, it's an
> > device which may be in DMA transporting state in normal kernel, but in
> > kdump kernel it's not manipulated by its driver because we don't use it
> > to dump, so exlucde its driver from kdump initramfs for saving space.
> >
>
> Errant DMA as in currently running kernel did not enable the device to
> do DMA and is not ready for it. If a device can issue DMA request in
> this state, it could do it well before kdump kernel starts shutting
> down. Don't we need to fix this before we start shutting down kdump in
> preparation for reboot? I can see the need for this fix, but I am not
> sure if this patch places the fix in right place.

Ah, I could get your point now, but not very sure. Do you mean the HPSA
is in errant DMA state, because it doesn't have driver to re-initilize
to correct state?

We have been doing like this for kdump kernel always, to only
include needed devices' driver to kdump initrd so that the driver will
initialize and opearte the device. For other unneeded devices by kdump
kernel, we just leave it as is. This error is only seen on this HPE
owned system. Wondering if HPE can do something to check their
firmware/hardware setting.

This patch is using a quirk to adjust all those devices if its
pci_dev->current_state is beyond PCI_D3hot during bootup. Then clear the
master bit of device firstly, next clear its parent bridge's master bit,
to make reboot proceed further. This need you PCI experts to offer help
to evaluate.

Thanks
Baoquan