Re: fsl_espi errors on v5.7.15
From: Chris Packham
Date: Mon Aug 24 2020 - 23:54:56 EST
On 25/08/20 10:04 am, Chris Packham wrote:
>
> On 20/08/20 9:08 am, Chris Packham wrote:
>>
>> On 19/08/20 6:15 pm, Heiner Kallweit wrote:
>>> On 19.08.2020 00:44, Chris Packham wrote:
>>>> Hi Again,
>>>>
>>>> On 17/08/20 9:09 am, Chris Packham wrote:
>>>>
>>>>> On 14/08/20 6:19 pm, Heiner Kallweit wrote:
>>>>>> On 14.08.2020 04:48, Chris Packham wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm seeing a problem with accessing spi-nor after upgrading a T2081
>>>>>>> based system to linux v5.7.15
>>>>>>>
>>>>>>> For this board u-boot and the u-boot environment live on spi-nor.
>>>>>>>
>>>>>>> When I use fw_setenv from userspace I get the following kernel logs
>>>>>>>
>>>>>>> # fw_setenv foo=1
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't
>>>>>>> empty!
>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't
>>>>>>> empty!
>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't
>>>>>>> empty!
>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>>>> ...
>>>>>>>
>>>>>> This error reporting doesn't exist yet in 4.4. So you may have an
>>>>>> issue
>>>>>> under 4.4 too, it's just not reported.
>>>>>> Did you verify that under 4.4 fw_setenv actually has an effect?
>>>>> Just double checked and yes under 4.4 the setting does get saved.
>>>>>>> If I run fw_printenv (before getting it into a bad state) it is
>>>>>>> able to
>>>>>>> display the content of the boards u-boot environment.
>>>>>>>
>>>>>> This might indicate an issue with spi being locked. I've seen
>>>>>> related
>>>>>> questions, just use the search engine of your choice and check for
>>>>>> fw_setenv and locked.
>>>>> I'm running a version of fw_setenv which includes
>>>>> https://gitlab.denx.de/u-boot/u-boot/-/commit/db820159 so it
>>>>> shouldn't
>>>>> be locking things unnecessarily.
>>>>>>> If been unsuccessful in producing a setup for bisecting the
>>>>>>> issue. I do
>>>>>>> know the issue doesn't occur on the old 4.4.x based kernel but
>>>>>>> that's
>>>>>>> probably not much help.
>>>>>>>
>>>>>>> Any pointers on what the issue (and/or solution) might be.
>>>> I finally managed to get our board running with a vanilla kernel. With
>>>> corenet64_smp_defconfig I occasionally see
>>>>
>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>
>>>> other than the message things seem to be working.
>>>>
>>>> With a custom defconfig I see
>>>>
>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't
>>>> empty!
>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>> ...
>>>>
>>>> and access to the spi-nor does not work until the board is reset.
>>>>
>>>> I'll try and pick apart the differences between the two defconfigs.
>>
>> I now think my earlier testing is invalid. I have seen the problem
>> with either defconfig if I try hard enough. I had convinced myself
>> that the problem was CONFIG_PREEMPT but that was before I found
>> boot-to-boot differences with the same kernel.
>>
>> It's possible that I'm chasing multiple issues with the same symptom.
>>
>> The error I'm most concerned with is in the sequence
>> 1. boot with old image
>> 2. write environment
>> 3. boot with new image
>> 4. write environment
>> 5. write fails and environment is corrupted
>>
>> After I recover the system things sometimes seem fine. Until I repeat
>> the sequence above.
>>
>>> Also relevant may be:
>>> - Which dts are you using?
>> Custom but based heavily on the t2080rdb.
>>> - What's the spi-nor type, and at which frequency are you operating it?
>> The board has several alternate parts for the spi-nor so the dts just
>> specifies compatible = "jedec,spi-nor" the actual chip detected on
>> the board I have is "n25q032a (4096 Kbytes)". The dts sets
>> spi-max-frequency = <10000000> I haven't measured the actual
>> frequency on the bus.
>>> - Does the issue still happen if you lower the frequency?
>> I did play around with the frequency initially but I should probably
>> give that another go now that I have a better reproduction method.
>
> Playing around with the frequency didn't help.
>
> One thing that I've found is that the problem appears to be that I end
> up with extra bytes in the RX FIFO. If I add code to drain the RX FIFO
> then the system is able to keep accessing the spi-nor (albeit with
> some noisy logs).
I've been staring at spi-fsl-espi.c for while now and I think I've
identified a couple of deficiencies that may or may not be related to my
issue.
First I think the 'Transfer done but SPIE_DON isn't set' message can be
generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE register.
We also write back to it to clear the current events. We re-read it in
fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can
naturally end up in that situation if we're doing a large read. Consider
the messages for reading a block of data from a spi-nor chip
tx = READ_OP + ADDR
rx = data
We setup the transfer and pump out the tx_buf. The first interrupt goes
off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo,
clear ESPI_SPIE and wait for the next interrupt. The next interrupt
fires and this time we have ESPI_SPIE with just SPIM_RXT set. This
continues until we've received all the data and we finish with ESPI_SPIE
having only SPIM_RXT set. When we re-read it we complain that SPIE_DON
isn't set.
The other deficiency is that we only get an interrupt when the amount of
data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer than
FSL_ESPI_RXTHR left to be received we will never pull them out of the fifo.
I think the reason I'm seeing some variability is because of how fast
(or slow) the interrupts get processed and how fast the spi-nor chip can
fill the CPUs rx fifo.