Re: MSI broken in libata?

From: Robert Hancock
Date: Sun Jan 17 2010 - 14:22:56 EST


On 01/16/2010 03:58 PM, Torsten Kaiser wrote:
On Mon, Jan 11, 2010 at 2:39 AM, Robert Hancock <hancockrwd@xxxxxxxxx> wrote:
On 01/10/2010 07:15 PM, Tejun Heo wrote:

On 01/10/2010 01:33 PM, Torsten Kaiser wrote:

I did try the patch from Robert Hancock in
http://lkml.org/lkml/2010/1/6/417, but without success.

If you need any more information, or have something for me to try,
please just ask. I did look at the code and the documentation about
enabling MSI, but did not see anything (obvious) wrong, so I don't
know what to try next.

Can you please try the attached patch?

Thanks.


It'd be interesting to see if it makes a difference, but I don't think the
patch is quite right.

As written in the other mail: No, Tejun's patch also didn't work.

According to the datasheet, doing the MSI ack while
the interrupt source is still pending will cause a new MSI to be sent, so if
you do it before handling the interrupt you'll generate a spurious interrupt
after every real one.

Though, apparently my patch that did the MSI ack after the handling didn't
help, so either that's wrong or the problem is unrelated. (I tend to suspect
the latter, given that sata_nv is also failing in the same way.)

Reading http://www.siliconimage.com/docs/SiI-DS-0138-D.pdf, one possible
explanation is that this MSI ACK was never needed. Page 63 of
this PDF says about 'Global Control': "If all interrupt conditions are
removed subsequent to an MSI, it is not necessary to assert this
Acknowledge; another MSI will be generated when an interrupt condition
occurs."

But I did not find anything that might explain my problem.

Looking at my lspci output I noted the following:
For the PCIe-bridges:
Capabilities: [80] Express (v1) Root Port (Slot+), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s<64ns, L1<1us
ExtTag- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
For the tg3 onboard network chips:
Capabilities: [d0] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s<4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 4096 bytes
For the SiI chip:
Capabilities: [70] Express (v1) Legacy Endpoint, MSI 00
DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s<64ns, L1<1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 4096 bytes

So the maximum payload for it is bigger than that of the nVidia bridge.
As I don't have knowledge of the PCI specs, I guess DevCap is what a
device is physically capable of and DevCtl is the value that the BIOS /
kernel has programmed into it for actual use.
If my guess is correct, then the SiI should be correctly limited to
128 bytes payload and it should work.

BUT: Page 47 of the SiI-PDF says for 'Device Status and Control' the following:
Bit [14:12]: Max Read Request Size (R/W) – Allowable values are 000B
to 011B (128 to 1024 bytes).
Default is 010B (512 bytes).

So a MaxReadReq value of 4096 as indicated by lspci for my system
would be out of bounds.
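
To make the comparison concrete, here is a minimal sketch of how the 3-bit Max Read Request Size field in bits [14:12] of the PCI Express Device Control register maps to a byte count (128 bytes shifted left by the field value, with 110B and 111B reserved). The function name and the sample register values are made up for illustration; they are not taken from the lspci output above.

```python
# Bits [14:12] of the 16-bit Device Control register hold Max Read Request Size.
MAX_READ_REQ_SHIFT = 12
MAX_READ_REQ_MASK = 0x7


def max_read_req_bytes(devctl: int) -> int:
    """Decode the Max Read Request Size field of a Device Control value."""
    field = (devctl >> MAX_READ_REQ_SHIFT) & MAX_READ_REQ_MASK
    if field > 0b101:
        # 110B and 111B are reserved encodings.
        raise ValueError(f"reserved Max Read Request Size encoding: {field:03b}")
    return 128 << field


# Hypothetical register values, chosen only to exercise the decoder.
print(max_read_req_bytes(0x2000))  # field 010B, the datasheet default
print(max_read_req_bytes(0x5000))  # field 101B, the value lspci reports as 4096
```

Under this encoding, 4096 bytes corresponds to 101B, which is above the 000B to 011B range the datasheet lists as allowable.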

Is it important? (Somehow it seems not: in the non-MSI case it is also
4096 bytes, but the system works fine...)


Can I do anything else to help debug this?

I don't think the MaxReadReq difference would be an issue - it's the max request size that the device is allowed to generate, not the max it can accept. In any case, I'm not sure how it would affect MSI, since an MSI is a write, not a read, and is tiny. You could always try changing it (I think setpci should be able to do it, though you might need to dig through the specs to find out which bits those are).
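
As a rough sketch of which bits would change: Max Read Request Size sits in bits [14:12] of the Device Control register, which is at offset 0x08 within the PCI Express capability. The helper below (a hypothetical name, not part of any tool) computes the new 16-bit register value for a requested size while leaving the other bits alone. The starting value 0x5010 is an invented example, and the actual write with setpci is left out because the exact invocation has not been checked on this hardware.

```python
# Mask covering bits [14:12], the Max Read Request Size field.
MAX_READ_REQ_FIELD = 0x7 << 12

# Valid sizes in bytes mapped to their 3-bit encodings (000B..101B).
SIZE_TO_FIELD = {128 << n: n for n in range(6)}


def with_max_read_req(devctl: int, size_bytes: int) -> int:
    """Return devctl with only the Max Read Request Size field replaced."""
    if size_bytes not in SIZE_TO_FIELD:
        raise ValueError(f"unsupported Max Read Request Size: {size_bytes}")
    cleared = devctl & ~MAX_READ_REQ_FIELD & 0xFFFF
    return cleared | (SIZE_TO_FIELD[size_bytes] << 12)


# Hypothetical current value with the field at 101B (4096 bytes).
new_value = with_max_read_req(0x5010, 512)
print(f"{new_value:#06x}")
```

Any value written back should be read again with lspci to confirm that the device accepted it.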

Unfortunately I don't have any great debug suggestions other than those. My first suspect would still be some kind of HT-MSI mapping issue, but it's strange that only writes seem to be having problems.