Re: fsl_espi errors on v5.7.15

From: Chris Packham
Date: Sun Aug 30 2020 - 18:01:37 EST



On 31/08/20 9:41 am, Heiner Kallweit wrote:
> On 30.08.2020 23:00, Chris Packham wrote:
>> On 31/08/20 12:30 am, Nicholas Piggin wrote:
>>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
>> <snip>
>>
>>>>>>>> I've also now seen the RX FIFO not empty error on the T2080RDB
>>>>>>>>
>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>>>>>
>>>>>>>> With my current workaround of emptying the RX FIFO. It seems
>>>>>>>> survivable. Interestingly it only ever seems to be 1 extra byte in the
>>>>>>>> RX FIFO and it seems to be after either a READ_SR or a READ_FSR.
>>>>>>>>
>>>>>>>> fsl_espi ffe110000.spi: tx 70
>>>>>>>> fsl_espi ffe110000.spi: rx 03
>>>>>>>> fsl_espi ffe110000.spi: Extra RX 00
>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>>>>> fsl_espi ffe110000.spi: tx 05
>>>>>>>> fsl_espi ffe110000.spi: rx 00
>>>>>>>> fsl_espi ffe110000.spi: Extra RX 03
>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>>>>> fsl_espi ffe110000.spi: tx 05
>>>>>>>> fsl_espi ffe110000.spi: rx 00
>>>>>>>> fsl_espi ffe110000.spi: Extra RX 03
>>>>>>>>
>>>>>>>> From all the Micron SPI-NOR datasheets I've got access to it is
>>>>>>>> possible to continually read the SR/FSR. But I've no idea why it
>>>>>>>> happens some times and not others.
>>>>>>> So I think I've got a reproduction and I think I've bisected the problem
>>>>>>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in
>>>>>>> C"). My day is just finishing now so I haven't applied too much scrutiny
>>>>>>> to this result. Given the various rabbit holes I've been down on this
>>>>>>> issue already I'd take this information with a good degree of skepticism.
>>>>>>>
>>>>>> OK, so an easy test should be to re-test with a 5.4 kernel.
>>>>>> It doesn't have yet the change you're referring to, and the fsl-espi driver
>>>>>> is basically the same as in 5.7 (just two small changes in 5.7).
>>>>> There's 6cc0c16d82f88 and maybe also other interrupt related patches
>>>>> around this time that could affect book E, so it's good if that exact
>>>>> patch is confirmed.
>>>> My confirmation is basically that I can induce the issue in a 5.4 kernel
>>>> by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
>>>> 5.9-rc2 by reverting that one commit.
>>>>
>>>> I both cases it's not exactly a clean cherry-pick/revert so I also
>>>> confirmed the bisection result by building at 3282a3da25bd (which sees
>>>> the issue) and the commit just before (which does not).
>>> Thanks for testing, that confirms it well.
>>>
>>> [snip patch]
>>>
>>>> I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG
>>>> didn't report anything (either with or without the change above).
>>> Okay, it was a bit of a shot in the dark. I still can't see what
>>> else has changed.
>>>
>>> What would cause this, a lost interrupt? A spurious interrupt? Or
>>> higher interrupt latency?
>>>
>>> I don't think the patch should cause significantly worse latency,
>>> (it's supposed to be a bit better if anything because it doesn't set
>>> up the full interrupt frame). But it's possible.
>> My working theory is that the SPI_DON indication is all about the TX
>> direction an now that the interrupts are faster we're hitting an error
>> because there is still RX activity going on. Heiner disagrees with my
>> interpretation of the SPI_DON indication and the fact that it doesn't
>> happen every time does throw doubt on it.
>>
> It's right that the eSPI spec can be interpreted that SPI_DON refers to
> TX only. However this wouldn't really make sense, because also for RX
> we program the frame length, and therefore want to be notified once the
> full frame was received. Also practical experience shows that SPI_DON
> is set also after RX-only transfers.
> Typical SPI NOR use case is that you write read command + start address,
> followed by a longer read. If the TX-only interpretation would be right,
> we'd always end up with SPI_DON not being set.
>
>> I can't really explain the extra RX byte in the fifo. We know how many
>> bytes to expect and we pull that many from the fifo so it's not as if
>> we're missing an interrupt causing us to skip the last byte. I've been
>> looking for some kind of off-by-one calculation but again if it were
>> something like that it'd happen all the time.
>>
> Maybe it helps to know what value this extra byte in the FIFO has. Is it:
> - a duplicate of the last read byte
> - or the next byte (at <end address> + 1)
> - or a fixed value, e.g. always 0x00 or 0xff

The values were up thread a bit but I'll repeat them here

fsl_espi ffe110000.spi: tx 70
fsl_espi ffe110000.spi: rx 03
fsl_espi ffe110000.spi: Extra RX 00
fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
fsl_espi ffe110000.spi: tx 05
fsl_espi ffe110000.spi: rx 00
fsl_espi ffe110000.spi: Extra RX 03
fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
fsl_espi ffe110000.spi: tx 05
fsl_espi ffe110000.spi: rx 00
fsl_espi ffe110000.spi: Extra RX 03


The rx 00 Extra RX 03 is a bit concerning. I've only ever seen them with
either a READ_SR or a READ_FSR. Never a data read.