Re: ioatdma(Intel(R) I/OAT DMA Engine init failed)
From: Gavin Guo
Date: Wed May 18 2016 - 09:27:10 EST
On Tue, May 17, 2016 at 6:06 PM, Vinod Koul <vinod.koul@xxxxxxxxx> wrote:
> On Mon, May 16, 2016 at 06:08:20PM +0800, Gavin Guo wrote:
>> The following error messages can be observed on the Intel Haswell-E
>> chipset with v3.13 kernel. After the analysis, I found there is no
>> difference in the logic of these error messages in the current
>> upstream kernel. I also searched the git log and couldn't find any commit
>> that fixes the error (correct me if I am wrong). The details follow,
>> and I would really appreciate any comments. :)
>
> 3.13 is ancient; can you check this on the latest kernel?
Thank you for the comment. It's running on the production system. However,
I'll try to figure out if it's possible to test the latest kernel.
>
>>
>> ioatdma 0000:00:04.0: channel error register unreachable
>> ioatdma 0000:00:04.0: channel enumeration error
>> ioatdma 0000:00:04.0: Intel(R) I/OAT DMA Engine init failed
>> ioatdma 0000:00:04.1: channel error register unreachable
>> ioatdma 0000:00:04.1: channel enumeration error
>> ioatdma 0000:00:04.1: Intel(R) I/OAT DMA Engine init failed
>> ...
>> ioatdma 0000:00:04.7: channel error register unreachable
>> ioatdma 0000:00:04.7: channel enumeration error
>> ioatdma 0000:00:04.7: Intel(R) I/OAT DMA Engine init failed
>> mei_me 0000:00:16.0: initialization failed.
>>
>> There are 8 I/OAT DMA controllers on the Haswell-E chipset:
>> 8086:2f20 ~ 8086:2f27
>> 80:04.0 System peripheral: Intel Corporation Haswell-E DMA Channel 0 (rev 02)
>> 80:04.1 System peripheral: Intel Corporation Haswell-E DMA Channel 1 (rev 02)
>> 80:04.2 System peripheral: Intel Corporation Haswell-E DMA Channel 2 (rev 02)
>> 80:04.3 System peripheral: Intel Corporation Haswell-E DMA Channel 3 (rev 02)
>> 80:04.4 System peripheral: Intel Corporation Haswell-E DMA Channel 4 (rev 02)
>> 80:04.5 System peripheral: Intel Corporation Haswell-E DMA Channel 5 (rev 02)
>> 80:04.6 System peripheral: Intel Corporation Haswell-E DMA Channel 6 (rev 02)
>> 80:04.7 System peripheral: Intel Corporation Haswell-E DMA Channel 7 (rev 02)
>>
>> Analysis:
>> The bug happens while the driver is resetting the DMA controller. The
>> sequence is as follows: ioat_pci_probe is called when the PCI bus
>> detects the DMA controller, and the call chain is then
>> ioat3_dma_probe -> ioat_probe -> ioat2_enumerate_channels ->
>> ioat3_reset_hw. The following code can be found in ioat3_reset_hw:
>>
>> drivers/dma/ioat/dma_v3.c:
>>         chanerr = readl(chan->reg_base + IOAT_CHANERR_OFFSET);
>>         writel(chanerr, chan->reg_base + IOAT_CHANERR_OFFSET);
>>         ...
>>         err = pci_read_config_dword(pdev,
>>                         IOAT_PCI_CHANERR_INT_OFFSET, &chanerr);
>>         if (err) {
>>                 dev_err(&pdev->dev,
>>                         "channel error register unreachable\n");
>>                 return err;
>>         }
>>
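
To check from user space whether that config-space read can succeed at all,
something like the rough sketch below might help. It is not kernel code and
I have not run it on the affected machine. The names read_cfg_dword,
print_chanerr_int and CHANERR_INT_OFFSET are my own, and the 0x180 value is
my assumption for IOAT_PCI_CHANERR_INT_OFFSET, so please verify it against
registers.h in your tree.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed value of IOAT_PCI_CHANERR_INT_OFFSET; verify in registers.h. */
#define CHANERR_INT_OFFSET 0x180

/*
 * Read one little-endian dword at `offset` from a config-space file such
 * as /sys/bus/pci/devices/<BDF>/config.
 * Returns 0 on success or a negative errno; a short read yields -EIO.
 */
int read_cfg_dword(const char *path, off_t offset, uint32_t *val)
{
    unsigned char buf[4];
    ssize_t n;
    int fd, saved;

    assert(path != NULL && val != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        saved = errno;
        close(fd);
        return -saved;
    }

    n = read(fd, buf, sizeof(buf));
    saved = errno;
    close(fd);

    if (n < 0)
        return -saved;
    if (n != (ssize_t)sizeof(buf))
        return -EIO;    /* offset is beyond what the file exposes */

    /* PCI config space is little-endian; decode byte by byte. */
    *val = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
           ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    return 0;
}

/* Print the dword at the assumed CHANERR_INT offset, or the error code. */
void print_chanerr_int(const char *path)
{
    uint32_t val = 0;
    int err = read_cfg_dword(path, CHANERR_INT_OFFSET, &val);

    if (err)
        printf("%s: read at 0x%x failed: %d\n",
               path, (unsigned)CHANERR_INT_OFFSET, err);
    else
        printf("%s: dword at 0x%x = 0x%08x\n",
               path, (unsigned)CHANERR_INT_OFFSET, (unsigned)val);
}
```

Calling print_chanerr_int() on the sysfs config file of one of the failing
functions, as root, would show whether the offset is readable from user
space. As far as I know, sysfs exposes only the first 64 bytes of config
space to unprivileged users, so a short read without root would not tell us
anything.
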
>> Evidently, something goes wrong in the channel error register
>> reset process. The error then propagates back up to ioat_probe().
>> Because the reset failed, dma->chancnt is set to 0:
>>
>> drivers/dma/ioat/dma.c:
>>         if (!dma->chancnt) {
>>                 dev_err(dev, "channel enumeration error\n");
>>                 goto err_setup_interrupts;
>>         }
>>
>> Finally back to ioat_pci_probe:
>>
>> drivers/dma/ioat/pci.c:
>>                 err = ioat3_dma_probe(device, ioat_dca_enabled);
>>         else
>>                 return -ENODEV;
>>
>>         if (err) {
>>                 dev_err(dev, "Intel(R) I/OAT DMA Engine init failed\n");
>>                 return -ENODEV;
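
To make the propagation described above easier to follow, here is a small
user-space model of it. This is only a sketch of my reading of the code, not
the driver itself: struct fake_dev and the model_* functions are
hypothetical stand-ins, the failing config read is simulated with a flag,
and the errno returned by model_probe is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one DMA function and its kernel log. */
struct fake_dev {
    int cfg_read_err;   /* 0 = config read succeeds, nonzero = it fails */
    int chancnt;        /* number of channels enumerated */
    char log[256];      /* messages in the order they were emitted */
};

/* Append one message line to the simulated log. */
static void dev_log(struct fake_dev *d, const char *msg)
{
    size_t used = strlen(d->log);

    snprintf(d->log + used, sizeof(d->log) - used, "%s\n", msg);
}

/* Models ioat3_reset_hw(): fails when the config-space read fails. */
static int model_reset_hw(struct fake_dev *d)
{
    if (d->cfg_read_err) {
        dev_log(d, "channel error register unreachable");
        return d->cfg_read_err;
    }
    return 0;
}

/* Models ioat2_enumerate_channels(): a failed reset leaves chancnt at 0. */
static int model_enumerate_channels(struct fake_dev *d)
{
    d->chancnt = model_reset_hw(d) ? 0 : 1;
    return d->chancnt;
}

/* Models ioat_probe(): no channels means an enumeration error. */
static int model_probe(struct fake_dev *d)
{
    model_enumerate_channels(d);
    if (!d->chancnt) {
        dev_log(d, "channel enumeration error");
        return -ENODEV; /* assumed errno, for illustration only */
    }
    return 0;
}

/* Models ioat_pci_probe(): any probe failure is reported as -ENODEV. */
static int model_pci_probe(struct fake_dev *d)
{
    assert(d != NULL);
    if (model_probe(d)) {
        dev_log(d, "Intel(R) I/OAT DMA Engine init failed");
        return -ENODEV;
    }
    return 0;
}
```

With cfg_read_err set to a nonzero value, model_pci_probe() returns -ENODEV
and the log holds the same three messages, in the same order, as the dmesg
output for each function. With cfg_read_err left at 0, nothing is logged.
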
>
> --
> ~Vinod