Re: [PATCH v3] platform/chrome: Use proper protocol transfer function

From: Jon Hunter
Date: Tue Sep 26 2017 - 11:43:58 EST



On 26/09/17 00:15, Shawn N wrote:
> On Wed, Sep 20, 2017 at 1:22 PM, Shawn N <shawnn@xxxxxxxxxx> wrote:
>> On Tue, Sep 19, 2017 at 11:13 PM, Brian Norris <briannorris@xxxxxxxxxxxx> wrote:
>>> Hi,
>>>
>>> On Tue, Sep 19, 2017 at 11:05:38PM -0700, Shawn N wrote:
>>>> This is failing because our EC_CMD_GET_PROTOCOL_INFO host command is
>>>> getting messed up, or the reply buffer is getting corrupted somehow.
>>>>
>>>> ec_dev->proto_version =
>>>> min(EC_HOST_REQUEST_VERSION,
>>>> fls(proto_info->protocol_versions) - 1);
>>>>
>>
>> Checking this closer, the first host command we send after we boot the
>> kernel (EC_CMD_GET_PROTOCOL_INFO) is failing due to protocol error
>> (see 'SPI rx bad data' / 'SPI not ready' on the EC console). Since
>> this doesn't seem to happen on the Chromium OS nyan_big release
>> kernel, I suggest to hook up a logic analyzer and see if the SPI
>> master is doing something bad.
>>
>> The error handling in cros_ec_cmd_xfer_spi() is completely wrong and
>> we return -EAGAIN / EC_RES_IN_PROGRESS, which the caller interprets
>> "the host command was received by the EC and is currently being
>> handled, poll status until completion". So the caller polls status
>> with EC_CMD_GET_COMMS_STATUS, sees no host command is in progress
>> (which is interpreted to mean "the host command I sent previously has
>> now successfully completed"), and returns success. The problem here is
>> that the initial host command was never received at all, and no reply
>> was ever received, so our reply data is all zero.
>>
>> Two things need to be fixed here:
>>
>> 1) Find out why the first host command after boot is failing. Probe
>> SPI pins and see what's going on.

Yes, I will see if I can look into this.

>> 2) Fix error handling so we properly return an error (or properly
>> retry the entire command) when a protocol error occurs (I made some
>> attempt in https://chromium-review.googlesource.com/385080/, probably
>> I should revisit that).
>
> The below patch will fix error handling and will make things mostly
> work on nyan_big, because we'll fall back to V2 protocol after the
> initial failure. But we should still investigate why we're getting
> errors on the first host command. We aren't seeing these errors when
> we send commands from firmware, so I suspect something is wrong in
> kernel SPI HW initialization that causes the first command to fail.
>
> From: Shawn Nematbakhsh <shawnn@xxxxxxxxxxxx>
> Date: Mon, 25 Sep 2017 14:32:38 -0700
> Subject: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling
>
> For host commands that take a long time to process, cros ec can return
> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
>
> None of the above applies when data link errors are encountered. When
> errors such as EC_SPI_PAST_END are encountered during command
> transmission, it usually means the command was not received by the EC.
> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
> almost always the wrong decision, and can result in host commands
> silently being lost.
>
> Signed-off-by: Shawn Nematbakhsh <shawnn@xxxxxxxxxxxx>
> ---
> drivers/mfd/cros_ec_spi.c | 26 ++++++++++++--------------
> 1 file changed, 12 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/mfd/cros_ec_spi.c b/drivers/mfd/cros_ec_spi.c
> index c9714072e224..d33e3847e11e 100644
> --- a/drivers/mfd/cros_ec_spi.c
> +++ b/drivers/mfd/cros_ec_spi.c
> @@ -377,6 +377,7 @@ static int cros_ec_pkt_xfer_spi(struct
> cros_ec_device *ec_dev,
> u8 *ptr;
> u8 *rx_buf;
> u8 sum;
> + u8 rx_byte;
> int ret = 0, final_ret;
>
> len = cros_ec_prepare_tx(ec_dev, ec_msg);
> @@ -421,25 +422,22 @@ static int cros_ec_pkt_xfer_spi(struct
> cros_ec_device *ec_dev,
> if (!ret) {
> /* Verify that EC can process command */
> for (i = 0; i < len; i++) {
> - switch (rx_buf[i]) {
> - case EC_SPI_PAST_END:
> - case EC_SPI_RX_BAD_DATA:
> - case EC_SPI_NOT_READY:
> - ret = -EAGAIN;
> - ec_msg->result = EC_RES_IN_PROGRESS;
> - default:
> + rx_byte = rx_buf[i];
> + if (rx_byte == EC_SPI_PAST_END ||
> + rx_byte == EC_SPI_RX_BAD_DATA ||
> + rx_byte == EC_SPI_NOT_READY) {
> + ret = -EREMOTEIO;
> break;
> }
> - if (ret)
> - break;
> }
> - if (!ret)
> - ret = cros_ec_spi_receive_packet(ec_dev,
> - ec_msg->insize + sizeof(*response));
> - } else {
> - dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
> }
>
> + if (!ret)
> + ret = cros_ec_spi_receive_packet(ec_dev,
> + ec_msg->insize + sizeof(*response));
> + else
> + dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
> +
> final_ret = terminate_request(ec_dev);
>
> spi_bus_unlock(ec_spi->spi->master);
>

Thanks! Works for me ...

Tested-by: Jon Hunter <jonathanh@xxxxxxxxxx>

Cheers
Jon

--
nvpublic