Re: Sluggish AT91 I2C driver causes SMBus timeouts

From: Peter Rosin
Date: Thu Oct 12 2017 - 10:32:26 EST


On 2017-10-12 13:35, Peter Rosin wrote:
> Hi!
>
> I have encountered an "interesting" bug. It silently corrupts data
> and is generally nasty...
>
> On an I2C bus, driven by the at91 driver and DMA (an Atmel
> sama5d31 chip), I have an 256 byte eeprom (NXP SE97BTP). I'm using
> Linux v4.13.
>
> The at24 driver for the eeprom detects that the I2C adapter is
> capable of I2C transactions and selects that over SMBus. Reads
> are done in 128 byte chunks. However, sometimes there is some
> kind of event that disturbs the transactions such that the very
> last bit och the very last byte (and the following NACK and STOP)
> of such chunks are delayed for a long time (the latest incident
> shows 85ms on the scope). That is too long for the eeprom which
> is expecting SMBus and times out after about 30 ms. When the
> eeprom times out, it just releases the data line so that it is
> pulled up high. The I2C driver does not notice this, and when it
> finally gets going, it reads a one for the last bit instead of
> the expected zero. Since it is the last byte of the read, a NACK
> is expected and since the eeprom has timed out the NACK is there.
> And the STOP condition also looks normal (expected since it is
> generated by the driver itself). So, the driver has not noticed
> anything funny. But the data is corrupted.
>
> I can work around this by disabling the SMBus timeout in the eeprom
> with:
>
> i2cset -f 0 0x18 0x22 0x8100
>
> But that is done on a different I2C address (the eeprom is on
> address 0x50), since the chip is a combined temperature sensor and
> eeprom, and the SMBus timeout bit is of course in a temperature
> sensor register.
>
> HOWEVER, I fail to see how this is limited to my case with this
> eeprom. Any SMBus chip with a timeout will suffer the same fate.
> The real bug is that this happens without the driver noticing it.
> And why is there a 85ms delay in the middle of the last byte?
> Sure, I can see why there might be a delay before finishing up
> with a STOP condition or between bytes if there needs to be some
> DMA setup at some interval, but after the 7th bit of a byte?
>
> For a lot of transactions on the I2C bus there is no delay before
> the last bit. And most of the time there is no delay for the
> eeprom reads either; the delay only occurs when it feels like it.
>
> This does not feel good at all.

I added some traces to i2c-at91.c and, AFAIU, it's the call to
at91_twi_read_data_dma_callback that sometimes arrives later than
desired. Once the callback runs, the transfer completes swiftly.

After reading the comments in that driver I suppose the HW holds
on to the last data-bit until it knows whether to ACK or NACK in
the following bit.

But given this, it is questionable if this driver/HW combo can
claim support for SMBus. But then again, I expect many things
suffer from similar scheduling delays (presumably that's what's
going on) so this is probably not special to i2c-at91.c...

Since this is probably a very generic problem and I just happened
to hit it for the eeprom, I wonder if it would be ok to add a
workaround, as below, to the temperature sensor driver part of this
chip? (with suitable comments, defines for the constants etc -
setting the 0x0080 bit in reg 0x22 disables the SMBus timeout)

Cheers,
Peter

diff --git a/drivers/hwmon/jc42.c b/drivers/hwmon/jc42.c
index 1bf22eff0b08..3e72bd8e06d1 100644
--- a/drivers/hwmon/jc42.c
+++ b/drivers/hwmon/jc42.c
@@ -416,6 +416,13 @@ static int jc42_detect(struct i2c_client *client, struct i2c_board_info *info)
if ((cap & 0xff00) || (config & 0xf800))
return -ENODEV;

+ if (manid == NXP_MANID && (devid & SE97_DEVID_MASK) == SE97_DEVID) {
+ int smbus = i2c_smbus_read_word_swapped(client, 0x22);
+ if (smbus < 0)
+ return -ENODEV;
+ i2c_smbus_write_word_swapped(client, 0x22, smbus | 0x0080);
+ }
+
for (i = 0; i < ARRAY_SIZE(jc42_chips); i++) {
struct jc42_chips *chip = &jc42_chips[i];
if (manid == chip->manid &&