Re: [PATCH v3] i2c: designware: fix __i2c_dw_disable() in case master is holding SCL low

From: Yann Sionneau
Date: Tue Aug 22 2023 - 06:28:28 EST


Hi,

On 8/22/23 12:14, Jarkko Nikula wrote:
Hi

On 8/22/23 12:02, Yann Sionneau wrote:
The DesignWare IP can be synthesized with the IC_EMPTYFIFO_HOLD_MASTER_EN
parameter.
In this case, when the TX FIFO gets empty and the last command didn't have
the STOP bit (IC_DATA_CMD[9]), the controller will hold SCL low until
a new command is pushed into the TX FIFO or the transfer is aborted.

When the controller is holding SCL low, it cannot be disabled.
The transfer must first be aborted.
Also, the bus recovery won't work because SCL is held low by the master.

Check if the master is holding SCL low in __i2c_dw_disable() before trying
to disable the controller. If SCL is held low, an abort is initiated.
When the abort is done, then proceed with disabling the controller.

This whole situation can happen for instance during SMBus read data block
if the slave just responds with "byte count == 0".
This puts the driver in an unrecoverable state, because the controller is
holding SCL low and the current __i2c_dw_disable() procedure is not
working. In this situation only a SoC reset can fix the i2c bus.

Is this fixed already by the commit 69f035c480d7 ("i2c: designware: Handle invalid SMBus block data response length value")?

https://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git/commit/?h=i2c/for-next&id=69f035c480d76f12bf061148ccfd578e1099e5fc

Indeed the bug that I am having is fixed by 69f035c480d76f12bf061148ccfd578e1099e5fc

Meaning that thanks to 69f035c receiving a SMBus byte count of 0 won't put the controller into a state where the completion will timeout and it will need to start a recovery that will fail and then a controller disabling that will also fail.

But, still, the disabling procedure is wrong, it lacks the abort part (in case SCL is held low).

What my patch does, is fixing the disabling procedure. So that - for example - even without 69f035c, the controller will timeout when receiving byte count of 0, triggering the "disabling" procedure which will work to recover the bus.

My patch fixes the general disabling code, that could be triggered by the bug fixed by 69f035c but also by any other bug really.

Speaking of 69f035c btw ... I am really wondering if it's the best fix, because it will lie to the kernel saying "we have byte count of 1, read another byte" to trigger a read with STOP bit set so that the transfer does finish and the controller does not timeout. But to do this it will do an extra spurious byte read.

I propose another approach that will acknowledge that "we are in a bad condition" and directly abort the transfer without doing an extra read: https://marc.info/?l=linux-i2c&m=169175828013532&w=2

I hope my answer to your question is clear enough... English is not my native language.

Regards,

--

Yann