Re: [PATCH] i2c: at91: fix write transfers by clearing pending interrupt first

From: Peter Rosin
Date: Wed Oct 21 2015 - 03:42:50 EST


On 2015-10-21 09:21, Peter Rosin wrote:
> On 2015-10-20 15:27, Ludovic Desroches wrote:
>> On Mon, Oct 19, 2015 at 12:49:03PM +0200, Peter Rosin wrote:
>>> On 2015-10-19 10:51, Ludovic Desroches wrote:
>>>> Hi Peter,
>>>>
>>>> On Fri, Oct 16, 2015 at 11:08:42AM +0200, Peter Rosin wrote:
>>>>> On 2015-10-16 01:47, Peter Rosin wrote:
>>>>>> On 2015-10-14 07:43, Ludovic Desroches wrote:
>>>>>>> On Tue, Oct 13, 2015 at 08:01:34PM +0200, Peter Rosin wrote:
>>>>>>>> On 2015-10-13 18:47, Cyrille Pitchen wrote:
>>>>>>>>> Le 13/10/2015 17:19, Peter Rosin a écrit :
>>>>>>>>>> On 2015-10-13 16:21, Ludovic Desroches wrote:
>>>>
>>>> [...]
>>>>
>>>>>> I have started to get this when I run with this patch:
>>>>>>
>>>>>> [ 73.310000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>>> [ 198.200000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>>> [ 509.880000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>>> [ 615.750000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>>> [ 617.750000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>>> [ 1766.640000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>>> [ 2035.380000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>>> [ 2227.190000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>>> [ 2313.100000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>>>
>>>>>> My USB serial dongle was hung which was why I didn't notice until just now.
>>>>>>
>>>>>> This is probably not when communication with the eeprom though, and certainly not
>>>>>> writing to it, but perhpaps when polling the temperature (using the jc42 driver).
>>>>>> I'll investigate further in the morning to see if I can pinpoint it.
>>>>>
>>>>> Yep, it's the jc42 accesses that triggers this (to the same chip as the
>>>>> eeprom, but a different block of transistors I suppose).
>>>>>
>>>>> Looking at the i2c bus, the accesses normally go like this:
>>>>>
>>>>> [0.000000] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
>>>>> [0.000521] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
>>>>> [0.001024] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
>>>>> [0.001524] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
>>>>> [0.196991] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
>>>>> [0.197514] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
>>>>> [0.198019] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
>>>>> [0.198520] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
>>>>>
>>>>> I.e. chunks of four transfers every ~200 ms (I removed the 1Hz rate
>>>>> limiter in the jc42 driver to get more frequent incidents).
>>>>>
>>>>> But when the diagnostic (RXRDY still set!) is output it continues
>>>>> with this:
>>>>>
>>>>> [0.399755] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
>>>>> [0.404998] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
>>>>> [0.405508] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
>>>>> [0.406008] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
>>>>>
>>>>> Notice the ~5 ms delay (about the time it should take to output
>>>>> the diagnostic at 115200 baud) after the access to register 0x05
>>>>> at 0.399755.
>>>>>
>>>>> This is the only thing that I can observe on the bus, so it appears
>>>>> to be harmless.
>>>>>
>>>>> It appears that the i2c access at 0.399755 finds the TWI
>>>>> registers in an odd state, but nothing from the access at
>>>>> 0.198520 appears to have gone wrong. Is this a race? Anyway,
>>>>> the diagnostic is pretty frequent and annoying. printk_once?
>>>>
>>>> I'll try to reproduce it on my side. The only issue you have is having
>>>> the message about RXRDY? I mean no issue with i2c transfers?
>>>
>>> Exactly, the only issue is the message, the bus looks good and transfers
>>> work as they should AFAICT.
>>>> It is not the possible bug we had in mind, this bug will prevent the
>>>> master device to release the i2c bus. It will stop the transfer but
>>>> without sending a stop on the bus.
>>>
>>> Agreed, this is not about the extra frame caused by the spurious write
>>> to the THR register. This is something else.
>>>
>>> One suspicion is that the driver gets an unexpected irq from its own
>>> NACK (the one that it puts out to end the read) and races with the
>>> expected interrupt at TXCOMP?
>>>
>>
>> I have discussed with the designer of this IP. So the NACK flag is only
>> used for 'real' nack not protocol ones.
>>
>> Concerning the read issue, it have attached a patch (apply it on top of
>> Cyrille's patch). Could you have a try? I have reproduced your issue only one
>> time so it's hard for me to tell if it works well or not.
>
> Since you removed the dev_err call, I don't really know if the underlying
> issue is solved or not (if there ever was any).
>
> Reading RHR in the if at the top of at91_twi_read_next_byte() never happens
> for me (I added a witness and left it running overnight), so that
> seems like an unrelated change.

Ahhh, now I see. Why is it that I have to hit send before I start to realize
things? You have moved the RHR read from at91_do_twi_transfer()...

> Checking RXRDY before TXCOMP|NACK seems about equal, but what do I know?

...and there is no else before the TXCOMP|NACK, so I added a witness inside the
if (...RXRDY) that triggers if TXCOM|NACK is also set. That witness fires at
about the same rate as the "RXRDY still set" message did before. For thrills I
readded the if (sr & AT91_TWI_RXRDY) check in at91_do_twi_transfer() but that
test is no longer firing.

So, yes, this is a good fix...

> But the "problem" is gone of course, no flood of messages, but I can't tell
> if this is just papering over or a fix.

...and you can scratch that comment.

Cheers,
Peter

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/