On Tuesday, January 08, 2013 09:27:55 AM Yinghai Lu wrote:
On Tue, Jan 8, 2013 at 8:50 AM, Thomas Renninger <trenn@xxxxxxx> wrote:
megaraid_sas
Can you check whether the initrd for your kdump kernel has that driver
and the modules it depends on, like the SCSI SAS transport etc.?
With the 5 patches removed, the disk works and the
dump is written.
I can look a bit further at the memmap=exactmap issue tomorrow.
I can also double check the above then, but I am rather sure about it:
I tried plain vanilla -> worked, dumping started.
I tried with only these 5 patches added -> no disk.
So you try to initialize the PCI subsystem the way the BIOS typically
would have done it, for the kexec case?
Reacting to error conditions and trying to handle them more gracefully
at the place where they are caught could be another approach which
imo makes sense to implement in parallel.
In my case for example I see:
"Present field in the IRTE entry is clear"
DMAR errors. I expect this comes from a device which still throws
interrupts, but whose irq vector did not get set up or registered in the
kexec'ed kernel.
I could imagine this is the same error which happens when an irq is
wrongly configured and spurious interrupts happen (but in irq remapped case).
In my case it's not severe as I only see this message once, but according
to another report, they see about 80 of such DMAR error messages per
second. This seems to result in endless DMAR error interrupts and finally
a dead system.
I wonder whether the DMAR error handler could already invoke a PCIe
reset. There is
int pci_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state state)
which unfortunately is only implemented for PPC, but would it make sense to
implement this one and trigger a function level reset if several specific DMAR
errors are seen (or if other PCI(e) error handlers become active)?
If this does not help, the next step could be to stop DMAR error interrupt
handling or other iommu commands to keep the machine alive, even if one
device keeps firing interrupts to an unconfigured irq vector (or whatever other
things could happen).
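
To make the escalation idea a bit more concrete, here is a rough
userspace sketch of the policy only: count faults per source within a
time window, request a device reset once a threshold is hit, and fall
back to masking the source if the faults continue after that. All names
(fault_tracker, fault_tracker_record, the FAULT_* constants) and the
threshold values are made up for illustration. Nothing here is existing
kernel or iommu API, and the actual reset and masking are left out.

```c
#include <stdint.h>

/* Hypothetical policy sketch; none of these names exist in the kernel. */
enum fault_action {
	FAULT_LOG,		/* just report the fault */
	FAULT_RESET_DEVICE,	/* ask for a reset of the offending device */
	FAULT_MASK_SOURCE,	/* give up: stop handling faults from this source */
};

struct fault_tracker {
	uint16_t source_id;	/* requester id reported with the fault */
	uint64_t window_start;	/* start of the current counting window, in ms */
	unsigned int count;	/* faults seen in the current window */
	int reset_tried;	/* non-zero once a reset has been requested */
};

#define FAULT_WINDOW_MS	1000u	/* assumed window length */
#define FAULT_THRESHOLD	10u	/* assumed faults per window before escalating */

/* Record one fault at time now_ms and return what the handler should do. */
static enum fault_action fault_tracker_record(struct fault_tracker *t,
					      uint64_t now_ms)
{
	/* Start a fresh window if the previous one has expired. */
	if (now_ms - t->window_start >= FAULT_WINDOW_MS) {
		t->window_start = now_ms;
		t->count = 0;
	}

	t->count++;
	if (t->count < FAULT_THRESHOLD)
		return FAULT_LOG;

	/* Threshold reached: restart counting before the next escalation step. */
	t->count = 0;
	t->window_start = now_ms;

	if (!t->reset_tried) {
		t->reset_tried = 1;
		return FAULT_RESET_DEVICE;
	}
	return FAULT_MASK_SOURCE;
}
```

With these assumed values, a single fault as in my case is only logged,
while a source firing about 80 faults per second would get a reset
request at the tenth fault and would be masked if it reaches the
threshold again afterwards.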
Just some ideas...