Re: [PATCH v3] platform/chrome: Use proper protocol transfer function

From: Shawn N
Date: Tue Sep 19 2017 - 13:03:14 EST


On Tue, Sep 19, 2017 at 9:39 AM, Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
>
>
> On 19/09/17 15:09, Shawn N wrote:
>> On Tue, Sep 19, 2017 at 6:44 AM, Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
>>>
>>> Hi Brian,
>>>
>>> On 08/09/17 21:50, Brian Norris wrote:
>>>> From: Shawn Nematbakhsh <shawnn@xxxxxxxxxxxx>
>>>>
>>>> pkt_xfer should be used for protocol v3, and cmd_xfer otherwise. We had
>>>> one instance of these functions correct, but not the second, fall-back
>>>> case. We use the fall-back only when the first command returns an
>>>> IN_PROGRESS status, which is only used on some EC firmwares where we
>>>> don't want to constantly poll the bus, but instead back off and
>>>> sleep/retry for a little while.
>>>>
>>>> Fixes: 2c7589af3c4d ("mfd: cros_ec: add proto v3 skeleton")
>>>> Signed-off-by: Shawn Nematbakhsh <shawnn@xxxxxxxxxxxx>
>>>> Signed-off-by: Brian Norris <briannorris@xxxxxxxxxxxx>
>>>> Reviewed-by: Javier Martinez Canillas <javier@xxxxxxxxxxxxxxx>
>>>> ---
>>>> v3:
>>>> * Added Javier's reviewed tag
>>>> * It's been > 8 months since [1], so why not? And hey, Benson's officially in
>>>> MAINTAINERS now! Too bad no one told me.
>>>>
>>>> [1] https://patchwork.kernel.org/patch/9450633/
>>>>
>>>>
>>>> v2:
>>>> * Add Benson in 'To:'
>>>> * make subject prefix more obvious
>>>>
>>>> drivers/platform/chrome/cros_ec_proto.c | 8 +++++---
>>>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/platform/chrome/cros_ec_proto.c b/drivers/platform/chrome/cros_ec_proto.c
>>>> index 8dfa7fcb1248..e7bbdf947bbc 100644
>>>> --- a/drivers/platform/chrome/cros_ec_proto.c
>>>> +++ b/drivers/platform/chrome/cros_ec_proto.c
>>>> @@ -60,12 +60,14 @@ static int send_command(struct cros_ec_device *ec_dev,
>>>> struct cros_ec_command *msg)
>>>> {
>>>> int ret;
>>>> + int (*xfer_fxn)(struct cros_ec_device *ec, struct cros_ec_command *msg);
>>>>
>>>> if (ec_dev->proto_version > 2)
>>>> - ret = ec_dev->pkt_xfer(ec_dev, msg);
>>>> + xfer_fxn = ec_dev->pkt_xfer;
>>>> else
>>>> - ret = ec_dev->cmd_xfer(ec_dev, msg);
>>>> + xfer_fxn = ec_dev->cmd_xfer;u
>>>>
>>>> + ret = (*xfer_fxn)(ec_dev, msg);
>>>> if (msg->result == EC_RES_IN_PROGRESS) {
>>>> int i;
>>>> struct cros_ec_command *status_msg;
>>>> @@ -88,7 +90,7 @@ static int send_command(struct cros_ec_device *ec_dev,
>>>> for (i = 0; i < EC_COMMAND_RETRIES; i++) {
>>>> usleep_range(10000, 11000);
>>>>
>>>> - ret = ec_dev->cmd_xfer(ec_dev, status_msg);
>>>> + ret = (*xfer_fxn)(ec_dev, status_msg);
>>>> if (ret < 0)
>>>> break;
>>>>
>>>
>>> Tegra124 Nyan-Big is currently crashing during boot with -next [0] and
>>> bisect is pointing to this commit. Reverting the above on top of -next
>>> does allow the board to boot successfully. Looks like this board is
>>> proto_version 3 but I have not looked into this any further. Let me know
>>> if you have any thoughts.
>>
>>
>> Thanks for the bug report, I'll look into this today.
>>
>>> [ 1.502497] kernel BUG at drivers/platform/chrome/cros_ec_proto.c:34!
>>> 34 BUG_ON(ec_dev->proto_version != EC_HOST_REQUEST_VERSION);
>>
>> So, ec_dev->proto_version > 3? That doesn't seem right.
>
> You mean != 3, but yes. Looks like an initialisation problem, because if I
> add the following WARNING ...

I meant > 3 because we check for > 2 in send_command().

>
> diff --git a/drivers/platform/chrome/cros_ec_proto.c b/drivers/platform/chrome/cros_ec_proto.c
> index e7bbdf947bbc..ad3b3a1e8d54 100644
> --- a/drivers/platform/chrome/cros_ec_proto.c
> +++ b/drivers/platform/chrome/cros_ec_proto.c
> @@ -31,6 +31,7 @@ static int prepare_packet(struct cros_ec_device *ec_dev,
> int i;
> u8 csum = 0;
>
> + WARN(ec_dev->proto_version != EC_HOST_REQUEST_VERSION, "%d != %d", ec_dev->proto_version, EC_HOST_REQUEST_VERSION);
> BUG_ON(ec_dev->proto_version != EC_HOST_REQUEST_VERSION);
> BUG_ON(msg->outsize + sizeof(*request) > ec_dev->dout_size);
>
> ... then I see ...
>
> [ 1.502495] WARNING: CPU: 0 PID: 1 at drivers/platform/chrome/cros_ec_proto.c:35 cros_ec_prepare_tx+0x190/0x1a8
> [ 1.512566] 65535 != 3
>
> Any chance this is being called before the version is initialised?

It's initialized in cros_ec_query_all().

Considering your trace shows (send_command+0x20/0xd8) as a caller, I'm
guessing that we die on the first call to (*xfer_fxn):

+ ret = (*xfer_fxn)(ec_dev, msg);

That part of the change should be a NOP, I only added a function
pointer so we wouldn't have to re-check protocol_version later. The
syntax looks fine to me even after re-checking, but maybe I missed
something. Let me test on my side.

>
> Cheers
> Jon
>
> --
> nvpublic