Re: Regression: at24 eeprom writing

From: Nicolas Ferre
Date: Tue Oct 13 2015 - 08:58:38 EST


Le 13/10/2015 12:38, Peter Rosin a écrit :
> On 2015-10-12 18:13, Cyrille Pitchen wrote:
>> Le 12/10/2015 17:13, Peter Rosin a écrit :
>>> On 2015-10-05 17:09, Peter Rosin wrote:
>>>> But what trouble does the i2c bus driver see? Admittedly I only
>>>> have a simple logic level bus viewer, and not a full-blown
>>>> oscilloscope, so there might be something analogue going on?
>>>> I don't think so though, those signals looked fine last time we
>>>> looked (but we obviously didn't have these issues then, and
>>>> didn't really look that closely). I'll see if I can recheck
>>>> with a real scope too.
>>>
>>> We redid the tests with a real scope, and the signal looks nice
>>> and square, so it is not that.
>>>
>>> Speculating further on the cause of the long ACKs, I think that
>>> the i2c driver gets confused by an interrupt that marks the
>>> transfer complete, and thinks it's a NACK interrupt instead. Is that
>>> plausible?
>>>
>>> If the peripheral unit is such that it generates a stop automatically
>>> on NACKs, then this makes perfect sense. I.e. the TWI sees that the
>>> transfer is complete, generates an interrupt, and waits for further
>>> data or a stop command. Meanwhile the driver thinks it's a NACK and
>>> that a stop condition has already been sent to the bus, and just
>>> notifies the i2c consumer (the eeprom driver in this case) of the
>>> failure and frees up the bus for any future user.
>>>
>>> This also matches what I see when I turn on some more traffic on the
>>> bus, that is interleaved with the eeprom traffic. AFAICT, it can be
>>> any command that gets chewed up by the eeprom if it is sent to the
>>> i2c driver during the long ACK.
>>>
>>> Are you Atmel people making any progress on this data corrupting
>>> regression? Is there anything else I can do to help?
>>>
>>> Cheers,
>>> Peter
>>>
>>
>> Hi Peter,
>>
>> I have sent a patch to Ludovic for a first internal review before publishing to
>> mainline. The patch should fix your issue since it fixes it on my sama5d36ek
>> board with an at24 eeprom.
>>
>> More details on the reason of this bug would be provided in both the commit
>> message and comments in the code provided by the reviewed patch but I you want
>> an early fix just read the Status Register (AT91_TWI_SR) at the beginning of
>> at91_do_twi_transfer(). This read clears the NACK bit in the Status Register.
>> Then the following source code can safely enable the NACK interrupt, otherwise
>> in some cases a pending NACK interrupt would rise immediately after the line:
>> at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_NACK);
>> hence breaking the sequence of operations to be done because the interrupt
>> handler would call complete() too early so wait_for_completion_timeout()
>> also exits too early.
>>
>> So reading the Status Register at the beginning of at91_do_twi_transfer()
>> should be enough to fix the issue.
>
> Yes, I see no more long ACKs after that reading the Status Register there.
> Great!
>
>> Another mistake is in the interrupt handler itself, ie atmel_twi_interrupt():
>> we should check the TWI_TXRDY status bit before calling
>> at91_twi_write_next_byte() only if both the TWI_TXCOMP and TWI_NACK status bits
>> are clear. Otherwise, writing a new byte into the THR tells the I2C controller
>> to start a new transfer. Then the I2C slave, the at24 eeprom, is likely to
>> also reply by a second NACK. Hence the NACK bit is already set into the Status
>> Register on the next call of at91_do_twi_transfer().
>> This is what I saw on my scope for PIO transfers.
>
> I interpret this as a proposed solution for the strange double NACKs?
>
> Anyway, I find it unnecessarily hard to grasp exactly what you mean
> (wasteful policy you are apparently suffering from where it is OK to
> publish a patch written in English, but apparently a big no-no to
> send a diff until it passes some internal review???). I interpreted

I find your remark pretty rude as I'm reading it while just at the desk
behind me Cyrille and Ludovic are together trying hard to understand and
fix this issue.
They are making sure that this fix doesn't interfere with another SoC's
IP version with any of the PIO/DMA/FIFO transfer types.

Cyrille just wanted to keep you informed quickly as we were
progressing... Anyone can tell you that, obviously, there is no stupid
policy of not releasing patches @ Atmel!

Anyway, let's keep the good debugging session progressing well with your
valuable help...

Bye.


> your "patch" in English as:
>
> at91_twi_read_next_byte(dev);
> - else if (irqstatus & AT_TWI_TXRDY)
> + else if ((irqstatus & (AT91_TWI_TXCOMP | AT91_TWI_TXRDY | AT91_TWI_NACK)) == AT91_TWI_TXRDY)
> at91_twi_write_next_byte(dev);
>
> But still see some double NACKs. Not always though, and it doesn't wreak
> any havoc. But it still looks strange and I can't explain them when looking
> at what the eeprom driver requests. Does this mean that there are more
> races present?
>
> Or, did I just parse your English "patch" badly?
>
>> By the way, in my case, the first NACK occurs because the at24 driver tries to
>> perform a second write transfer too quickly and the eeprom is not ready yet,
>> then replies with a NACK.
>
> Yes, I believe this is by design. Noone wants to encode the exact delays
> needed since they are hard to predict. So, the eeprom driver polls.
>
>> However while debugging, placing debug traces in the I2C controller driver
>> changed the timing: this small delay was enough to make the first NACK
>> disappear because the eeprom was now ready to process the second write
>> operation.
>
> Yes, w/o NACKs everything is going smooth regardless of kernel version.
>
>> For DMA transfer, you have to know that when NACK error is detected, the I2C
>> controller sets the TWI_TXCOMP, TWI_NACK and TWI_TXRDY bits all together in
>> the Status Register:
>> - The TWI_NACK is cleared by reading the Status Register.
>> - The TWI_TXRDY triggers the DMA controller to write the next byte in the THR,
>> which should also tell the I2C controller to start a new transfer.
>> - The TWI_TXCOMP bit is cleared by writing into the THR: this is why the
>> interrupt handler may fail to detect NACK if it relies on TWI_TXCOMP.
>>
>> Best Regards,
>>
>> Cyrille
>>
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>


--
Nicolas Ferre
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/