Re: Regression: at24 eeprom writing times out on sama5d3

From: Linux regression tracking (Thorsten Leemhuis)
Date: Wed Mar 15 2023 - 07:58:50 EST


Hi, Thorsten here, the Linux kernel's regression tracker. Seems this
regression is still unfixed (please correct me if I'm wrong), so I'm
back with another comment:

On 15.12.22 19:50, Conor.Dooley@xxxxxxxxxxxxx wrote:
> On 15/12/2022 17:53, Thorsten Leemhuis wrote:
>> On 08.09.22 15:59, Peter Rosin wrote:
>>> 2022-09-08 at 14:06, Thorsten Leemhuis wrote:
>>>>
>>>> Peter, Codrin, could you help me out here please: I still have the
>>>> regression report from Peter that started this thread in the list of
>>>> tracked issues. From Peter's last msg quoted below it seems the thread
>>>> just faded out without the regression being fixed. Or was it? If not:
>>>> what can we do to finally get this resolved?
>>>
>>> No, it is not resolved that I know of. We are only writing during
>>> production, but are working around it by verifying and looping back.
>>> Sometimes it takes surprisingly long for the loop to finish, but
>>> it's not a huge deal. But it is of course not completely satisfying
>>> either...
>>>
>>> Reading is never a problem, so post-production behavior is sane.
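Below is a minimal POSIX shell sketch of the verify-and-loop workaround Peter describes. It is not his production script. The function name, the retry count and the delay are assumptions, and the default path is the 2-0050 EEPROM node used elsewhere in this thread. Success is decided by reading the data back, because a mis-written page does not necessarily make dd fail.

```shell
#!/bin/sh
# Sketch: program an at24 EEPROM through sysfs, read it back and retry
# until the read-back matches. Name, retry count and delay are assumptions.
eeprom_write_verify() {
	src=$1                                        # image to program
	dev=${2:-/sys/bus/i2c/devices/2-0050/eeprom}  # at24 sysfs node (assumed)
	tries=${3:-5}                                 # max attempts (assumed)
	delay=${4:-1}                                 # seconds between attempts
	size=$(($(wc -c < "$src")))                   # bytes to verify

	n=1
	while [ "$n" -le "$tries" ]; do
		# conv=notrunc keeps a regular stand-in file at its full size.
		# dd errors are ignored on purpose: only the read-back counts.
		dd if="$src" of="$dev" conv=notrunc 2>/dev/null || true
		# Compare just the first $size bytes; the EEPROM may be larger.
		if head -c "$size" "$dev" 2>/dev/null | cmp -s - "$src"; then
			echo "verified after $n attempt(s)"
			return 0
		fi
		n=$((n + 1))
		if [ "$n" -le "$tries" ]; then
			sleep "$delay"
		fi
	done
	echo "verify failed after $tries attempt(s)" >&2
	return 1
}
```

Passing a scratch regular file as the second argument lets you exercise the loop without hardware.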
>>
>> I still have this regression that Peter reported in late July on my
>> list. :-(
>>
>> Codrin (and maybe Wolfram), could you provide an update please? Afaics
>> this is the state of things (please correct me if I'm wrong!): In an
>> earlier mail
>> (https://lore.kernel.org/lkml/38dedc92-62a2-7365-6fda-95d6404be749@xxxxxxxxxx/
>> ) of this thread Peter stated that the following patch set Codrin posted
>> in mid-2021 helped:
>> https://lore.kernel.org/all/20210727111554.1338832-1-codrin.ciubotariu@xxxxxxxxxxxxx/
>
> IIRC (and I may well be wrong as it is not my neck of the woods) Codrin is
> no longer at Microchip. Nicolas, do you know who has taken over this driver?

Nicolas didn't reply afaics, but I just found that he mentioned in
https://lore.kernel.org/all/176099e2-cbff-1987-f59a-2ca618a9c92a@xxxxxxxxxxxxx/
that Codrin left.

Did anyone else take over his duties and that patchset? Or should I file
this under "regressions that were bisected[1], but nevertheless never fixed"?
I'd hate to do that when patches to resolve it are actually available
and got stuck in review...

[1] to a change from Kamel Bouhara iirc

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke


>> There were a few review comments from Wolfram, but it looks like things
>> stalled after that. Can we somehow get this rolling again to finally get
>> this regression fixed?
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>
>> P.S.: As the Linux kernel's regression tracker I deal with a lot of
>> reports and sometimes miss something important when writing mails like
>> this. If that's the case here, don't hesitate to tell me in a public
>> reply, it's in everyone's interest to set the public record straight.
>>
>>>>
>>>> On 30.06.22 09:44, Peter Rosin wrote:
>>>>> 2022-06-10 at 22:51, Peter Rosin wrote:
>>>>>> 2022-06-10 at 09:35, Codrin.Ciubotariu@xxxxxxxxxxxxx wrote:
>>>>>>> On 09.06.2022 17:28, Peter Rosin wrote:
>>>>>>>>
>>>>>>>> I have not actually bisected this issue but reverting the effects of
>>>>>>>> patch a4bd8da893a3 ("ARM: dts: at91: sama5d3: add i2c gpio pinctrl")
>>>>>>>> makes the problem go away.
>>>>>>>>
>>>>>>>> I.e. I need something like this in my dts
>>>>>>>>
>>>>>>>> &i2c2 {
>>>>>>>> 	status = "okay";
>>>>>>>>
>>>>>>>> 	pinctrl-names = "default";
>>>>>>>> 	/delete-property/ pinctrl-1;
>>>>>>>> 	/delete-property/ sda-gpios;
>>>>>>>> 	/delete-property/ scl-gpios;
>>>>>>>>
>>>>>>>> 	eeprom@50 {
>>>>>>>> 		compatible = "st,24c64", "atmel,24c64";
>>>>>>>> 		reg = <0x50>;
>>>>>>>> 		wp-gpios = <&filter_gpio 7 GPIO_ACTIVE_HIGH>;
>>>>>>>> 	};
>>>>>>>> };
>>>>>>>>
>>>>>>>> for multi-page eeprom writes to not time out (a page is 32 bytes on this
>>>>>>>> eeprom).
>>>>>>>>
>>>>>>>> For reference, the current defaults for this SoC/I2C-bus, which I modify,
>>>>>>>> are:
>>>>>>>>
>>>>>>>> pinctrl-names = "default", "gpio";
>>>>>>>> pinctrl-0 = <&pinctrl_i2c2>;
>>>>>>>> pinctrl-1 = <&pinctrl_i2c2_gpio>;
>>>>>>>> sda-gpios = <&pioA 18 GPIO_ACTIVE_HIGH>;
>>>>>>>> scl-gpios = <&pioA 19 (GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN)>;
>>>>>>>>
>>>>>>>> I suspect that the underlying reason is that the bus recovery takes
>>>>>>>> too long and that the at24 eeprom driver gives up prematurely. I doubt
>>>>>>>> that this is chip specific, but I don't know that.
>>>>>>>>
>>>>>>>> I can work around the issue in user space by writing in 4-byte
>>>>>>>> chunks, like so
>>>>>>>>
>>>>>>>> dd if=source.file of=/sys/bus/i2c/devices/2-0050/eeprom obs=4
>>>>>>>>
>>>>>>>> but that is really ugly and gets slow too, about 20 seconds to program
>>>>>>>> the full 8kB eeprom. With the above in my dts it takes a second or
>>>>>>>> so (a bit more with dynamic debug active).
>>>>>>>>
>>>>>>>>
>>>>>>>> If I run
>>>>>>>>
>>>>>>>> dd if=source.file of=/sys/bus/i2c/devices/2-0050/eeprom
>>>>>>>>
>>>>>>>> with a source.file of 8kB and the upstream dts properties in place, I can
>>>>>>>> collect the following debug output from at24, i2c-core and i2c-at91:
>>>>>>>>
>>>>>>>> Jun 9 15:56:34 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:56:34 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:56:34 me20 kernel: at91_i2c f801c000.i2c: transfer complete
>>>>>>>> Jun 9 15:56:34 me20 kernel: at24 2-0050: write 32@0 --> 0 (-23170)
>>>>>>>> Jun 9 15:56:34 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:56:34 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:56:34 me20 kernel: at91_i2c f801c000.i2c: received nack
>>>>>>>> Jun 9 15:56:34 me20 kernel: i2c i2c-2: Trying i2c bus recovery
>>>>>>>> Jun 9 15:56:34 me20 kernel: at24 2-0050: write 32@32 --> -121 (-23169)
>>>>>>>> Jun 9 15:56:34 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:56:34 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:56:34 me20 kernel: at91_i2c f801c000.i2c: transfer complete
>>>>>>>> Jun 9 15:56:34 me20 kernel: at24 2-0050: write 32@32 --> 0 (-23168)
>>>>>>>> Jun 9 15:56:34 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:56:34 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:56:34 me20 kernel: at91_i2c f801c000.i2c: received nack
>>>>>>>> Jun 9 15:56:34 me20 kernel: i2c i2c-2: Trying i2c bus recovery
>>>>>>>> Jun 9 15:56:34 me20 kernel: at24 2-0050: write 32@64 --> -121 (-23168)
>>>>>>>> Jun 9 15:56:34 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:56:34 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:56:34 me20 kernel: at91_i2c f801c000.i2c: transfer complete
>>>>>>>> Jun 9 15:56:34 me20 kernel: at24 2-0050: write 32@64 --> 0 (-23167)
>>>>>>>> Jun 9 15:56:34 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:56:34 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:56:34 me20 kernel: at91_i2c f801c000.i2c: received nack
>>>>>>>> Jun 9 15:56:34 me20 kernel: i2c i2c-2: Trying i2c bus recovery
>>>>>>>> Jun 9 15:56:34 me20 kernel: at24 2-0050: write 32@96 --> -121 (-23167)
>>>>>>>> Jun 9 15:56:34 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:56:34 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:56:34 me20 kernel: at91_i2c f801c000.i2c: controller timed out
>>>>>>>> Jun 9 15:56:34 me20 kernel: i2c i2c-2: Trying i2c bus recovery
>>>>>>>> Jun 9 15:56:34 me20 kernel: at24 2-0050: write 32@96 --> -110 (-23155)
>>>>>>>> Jun 9 15:56:34 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:56:34 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:56:34 me20 kernel: at91_i2c f801c000.i2c: controller timed out
>>>>>>>> Jun 9 15:56:34 me20 kernel: i2c i2c-2: Trying i2c bus recovery
>>>>>>>> Jun 9 15:56:34 me20 kernel: at24 2-0050: write 32@96 --> -110 (-23143)
>>>>>>>>
>>>>>>>> And then there is no more action. I.e. only a couple of 32 byte pages
>>>>>>>> are written.
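Counting how long each offset keeps retrying makes logs like the one above easier to compare. The awk-based shell helper below is a hypothetical sketch. It assumes the at24 debug line format shown above and that the number in parentheses is the jiffies counter at the time of the attempt (we believe at24 prints jiffies there, but treat that as unverified). Status -121 is -EREMOTEIO (the NACK) and -110 is -ETIMEDOUT. Fed the log above, it groups the three attempts at offset 96 and reports 24 jiffies between the first and the last, ending in -110.

```shell
#!/bin/sh
# Sketch: summarise at24 "write <count>@<offset> --> <status> (<n>)" debug
# lines per offset: attempts, final status and the spread of <n> between
# the first and last attempt. <n> is assumed to be jiffies.
at24_summary() {
	awk '
	/at24 .*: write [0-9]+@[0-9]+ --> / {
		# the fields after "write" are COUNT@OFF, -->, STATUS and (N)
		for (i = 1; i <= NF; i++) if ($i == "write") break
		split($(i + 1), a, "@"); off = a[2]
		st = $(i + 3)
		j = $(i + 4); gsub(/[()]/, "", j)
		if (!(off in first)) { first[off] = j; order[++n] = off }
		last[off] = j; tries[off]++; status[off] = st
	}
	END {
		# print offsets in the order they first appeared
		for (k = 1; k <= n; k++) {
			off = order[k]
			printf "offset %d: %d attempt(s), last status %d, %d jiffies\n",
				off, tries[off], status[off], last[off] - first[off]
		}
	}'
}
```

With at24 dynamic debug enabled, piping the kernel log into it (for example dmesg | at24_summary) should work the same way, since only the fields after "write" are parsed.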
>>>>>>>>
>>>>>>>> With the above mentioned dts override in place I instead get this, which is
>>>>>>>> a lot more sensible:
>>>>>>>>
>>>>>>>> Jun 9 15:48:53 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer complete
>>>>>>>> Jun 9 15:48:53 me20 kernel: at24 2-0050: write 32@0 --> 0 (753629)
>>>>>>>> Jun 9 15:48:53 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: received nack
>>>>>>>> Jun 9 15:48:53 me20 kernel: at24 2-0050: write 32@32 --> -121 (753629)
>>>>>>>> Jun 9 15:48:53 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer complete
>>>>>>>> Jun 9 15:48:53 me20 kernel: at24 2-0050: write 32@32 --> 0 (753630)
>>>>>>>> Jun 9 15:48:53 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: received nack
>>>>>>>> Jun 9 15:48:53 me20 kernel: at24 2-0050: write 32@64 --> -121 (753630)
>>>>>>>> Jun 9 15:48:53 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer complete
>>>>>>>> Jun 9 15:48:53 me20 kernel: at24 2-0050: write 32@64 --> 0 (753631)
>>>>>>>> Jun 9 15:48:53 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: received nack
>>>>>>>> Jun 9 15:48:53 me20 kernel: at24 2-0050: write 32@96 --> -121 (753631)
>>>>>>>> Jun 9 15:48:53 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer complete
>>>>>>>> Jun 9 15:48:53 me20 kernel: at24 2-0050: write 32@96 --> 0 (753632)
>>>>>>>> Jun 9 15:48:53 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: received nack
>>>>>>>> Jun 9 15:48:53 me20 kernel: at24 2-0050: write 32@128 --> -121 (753632)
>>>>>>>> Jun 9 15:48:53 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer complete
>>>>>>>> Jun 9 15:48:53 me20 kernel: at24 2-0050: write 32@128 --> 0 (753633)
>>>>>>>> Jun 9 15:48:53 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: received nack
>>>>>>>> Jun 9 15:48:53 me20 kernel: at24 2-0050: write 32@160 --> -121 (753633)
>>>>>>>> Jun 9 15:48:53 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:48:53 me20 kernel: at91_i2c f801c000.i2c: transfer complete
>>>>>>>> Jun 9 15:48:53 me20 kernel: at24 2-0050: write 32@160 --> 0 (753634)
>>>>>>>> ... snip ...
>>>>>>>> Jun 9 15:48:55 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:48:55 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:48:55 me20 kernel: at91_i2c f801c000.i2c: received nack
>>>>>>>> Jun 9 15:48:55 me20 kernel: at24 2-0050: write 32@8128 --> -121 (753883)
>>>>>>>> Jun 9 15:48:55 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:48:55 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:48:55 me20 kernel: at91_i2c f801c000.i2c: transfer complete
>>>>>>>> Jun 9 15:48:55 me20 kernel: at24 2-0050: write 32@8128 --> 0 (753884)
>>>>>>>> Jun 9 15:48:55 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:48:55 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:48:55 me20 kernel: at91_i2c f801c000.i2c: received nack
>>>>>>>> Jun 9 15:48:55 me20 kernel: at24 2-0050: write 32@8160 --> -121 (753884)
>>>>>>>> Jun 9 15:48:55 me20 kernel: i2c i2c-2: at91_xfer: processing 1 messages:
>>>>>>>> Jun 9 15:48:55 me20 kernel: at91_i2c f801c000.i2c: transfer: write 34 bytes.
>>>>>>>> Jun 9 15:48:55 me20 kernel: at91_i2c f801c000.i2c: transfer complete
>>>>>>>> Jun 9 15:48:55 me20 kernel: at24 2-0050: write 32@8160 --> 0 (753885)
>>>>>>>
>>>>>>> could you please apply this patch-set [1] and let us know if it
>>>>>>> addresses your issue?
>>>>>>>
>>>>>>> Thanks and best regards,
>>>>>>> Codrin
>>>>>>>
>>>>>>> https://patchwork.ozlabs.org/project/linux-i2c/list/?series=255408
>>>>>>
>>>>>> That series does indeed help! I'll reply with a tested-by etc on the
>>>>>> first two patches, I can't test patch 3/3 with my sama5d3 board...
>>>>>>
>>>>>> Thank you very much!
>>>>>
>>>>> Since replying to the actual patches does not work for me, I'm writing here
>>>>> instead. Sorry about that. As stated above, it /seems/ to work much better
>>>>> with these patches. But I fooled myself and there is still some remaining
>>>>> trouble. It is not uncommon that the second (32-byte) page in the eeprom
>>>>> is not written correctly for whatever reason. I do not know why it's
>>>>> always the second page that gets corrupted, but this is a bad problem since
>>>>> the failure is completely silent.
>>>>>
>>>>> Cheers,
>>>>> Peter
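To locate the silently corrupted pages, one could compare the image with a read-back of the EEPROM page by page. The helper below is a hypothetical sketch (name and defaults are ours) built on cmp -l, which prints the 1-based number of every differing byte. Dividing the 0-based offset by the 32-byte page size gives the page index, so an output line of "1" would be the second page.

```shell
#!/bin/sh
# Sketch: print the index of every page whose read-back differs from the
# source image. Function name and defaults are assumptions; 32 bytes is
# the page size of the 24c64 in this thread.
bad_pages() {
	src=$1                                        # image that was written
	dev=${2:-/sys/bus/i2c/devices/2-0050/eeprom}  # at24 sysfs node (assumed)
	psize=${3:-32}                                # EEPROM page size in bytes

	# cmp -l prints "<1-based byte number> <octal byte> <octal byte>" for
	# each difference. Its "EOF on ..." message (the EEPROM is usually
	# larger than the image) goes to stderr and is discarded.
	cmp -l "$src" "$dev" 2>/dev/null |
		awk -v ps="$psize" '{ print int(($1 - 1) / ps) }' |
		uniq
}
```

For example, after programming, bad_pages source.file prints nothing if every page matches.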
>>>>
>>>> #regzbot poke
>>>
>>> _______________________________________________
>>> linux-arm-kernel mailing list
>>> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>
>> #regzbot poke
>