Re: [PATCH] platform/chrome: cros_ec_proto: Send command again when timeout occurs

From: Greg KH
Date: Tue May 18 2021 - 05:29:36 EST


On Tue, May 18, 2021 at 11:09:25AM +0200, Patryk Duda wrote:
> Sometimes kernel is trying to probe Fingerprint MCU (FPMCU) when it
> hasn't initialized SPI yet. This can happen because FPMCU is restarted
> during system boot and kernel can send message in short window
> eg. between sysjump to RW and SPI initialization.
>
> Cc: <stable@xxxxxxxxxxxxxxx> # 4.4+
> Signed-off-by: Patryk Duda <pdk@xxxxxxxxxxxx>
> ---
> Fingerprint MCU is rebooted during system startup by AP firmware (coreboot).
> During cold boot kernel can query FPMCU in a window just after jump to RW
> section of firmware but before SPI is initialized. The window was
> shortened to <1ms, but it can't be eliminated completly.
>
> Communication with FPMCU (and all devices based on EC) is bi-directional.
> When kernel sends message, EC will send EC_SPI* status codes. When EC is
> not able to process command one of bytes will be eg. EC_SPI_NOT_READY.
> This mechanism won't work when SPI is not initailized on EC side. In fact,
> buffer is filled with 0xFF bytes, so from kernel perspective device is not
> responding. To avoid this problem, we can query device once again. We are
> already waiting EC_MSG_DEADLINE_MS for response, so we can send command
> immediately.
>
> Best regards,
> Patryk
> drivers/platform/chrome/cros_ec_proto.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/platform/chrome/cros_ec_proto.c b/drivers/platform/chrome/cros_ec_proto.c
> index aa7f7aa77297..3384631d21e2 100644
> --- a/drivers/platform/chrome/cros_ec_proto.c
> +++ b/drivers/platform/chrome/cros_ec_proto.c
> @@ -279,6 +279,18 @@ static int cros_ec_host_command_proto_query(struct cros_ec_device *ec_dev,
> msg->insize = sizeof(struct ec_response_get_protocol_info);
>
> ret = send_command(ec_dev, msg);
> + /*
> + * Send command once again when timeout occurred.
> + * Fingerprint MCU (FPMCU) is restarted during system boot which
> + * introduces small window in which FPMCU won't respond for any
> + * messages sent by kernel. There is no need to wait before next
> + * attempt because we waited at least EC_MSG_DEADLINE_MS.
> + */
> + if (ret == -ETIMEDOUT) {
> + dev_warn(ec_dev->dev,
> + "Timeout to get response from EC. Retrying.\n");

If a user sees this, what can they do? No need to spam the kernel logs,
just retry.

> + ret = send_command(ec_dev, msg);

But wait, why just retry once? Why not 10 times? 100? 1000? How
about a simple loop here instead, with a "sane" number of retries as a
max.

thanks,

greg k-h