Re: [PATCH v3] platform/chrome: Use proper protocol transfer function

From: Doug Anderson
Date: Wed Nov 08 2017 - 11:45:34 EST


Hi,

On Wed, Nov 8, 2017 at 2:20 AM, Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
> Hi Doug,
>
> On 07/11/17 17:22, Doug Anderson wrote:
>> On Tue, Nov 7, 2017 at 3:28 AM, Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
>>> On 10/10/17 17:52, Doug Anderson wrote:
>>>
>>> ...
>>>
>>>>> I'm still not clear on why we see an error only on the first
>>>>> transaction after boot. In this case, the embedded controller
>>>>> previously handled host commands from firmware just fine, and the
>>>>> handoff between firmware and the kernel shouldn't be significant, from
>>>>> the EC point of view. If host command timing is consistent from the
>>>>> master, I would expect to see some constant error rate, eg. some
>>>>> chance any host command will fail, rather than the first host command
>>>>> always failing.
>>>>
>>>> The AP itself is often quite busy at boot and so the timings for
>>>> everything change. That could easily explain the problems.
>>>
>>> Sorry for the delay, but I have finally had some time to look at this a
>>> bit closer. I have been able to track down where the additional delay is
>>> really needed and seems to explain what is going on.
>>>
>>> For starters, the SPI chip-select is under h/w control and so the
>>> software delay has no impact on the timing between the chip-select
>>> going active and the transaction starting as I had first thought.
>>>
>>> I found that a delay is needed between completing the probe the Tegra
>>> SPI device and the first SPI transaction issued to the EC. In the Tegra
>>> SPI probe the SPI controller is programmed for master mode, but at the
>>> same time it clears the chip-select inactive polarity bit meaning that
>>> initially the SPI chip-select default to active-high (rather that low
>>> which seems odd). I believe that this then drives the chip-select low
>>> (active for the EC) and until it is then configured when spi_setup() is
>>> called which configures it as active-low for the EC.
>>> To get the first transaction to work for the EC there needs to be a
>>> delay after we program the chip-select polarity when spi_setup() is
>>> called. For example ...
>>>
>>> diff --git a/drivers/mfd/cros_ec_spi.c b/drivers/mfd/cros_ec_spi.c
>>> index 584367f3a0ed..c1075c3c60c8 100644
>>> --- a/drivers/mfd/cros_ec_spi.c
>>> +++ b/drivers/mfd/cros_ec_spi.c
>>> @@ -648,6 +648,8 @@ static int cros_ec_spi_probe(struct spi_device *spi)
>>> if (err < 0)
>>> return err;
>>>
>>> + udelay(100);
>>> +
>>
>> This isn't totally crazy, but actually you could probably do this:
>>
>> ec_spi->last_transfer_ns = ktime_get_ns();
>>
>> ...that will leverage already existing code and constants and also
>> will avoid doing a delay if it wasn't needed. You could also then get
>> rid of some "if (ec_spi->last_transfer_ns)" tests in the code. I'd
>> support landing that.
>>
>>
>>> ec_spi = devm_kzalloc(dev, sizeof(*ec_spi), GFP_KERNEL);
>>> if (ec_spi == NULL)
>>> return -ENOMEM;
>>>
>
>
> OK, yes and that does work well too. I will send a patch with the
> following for review.
>
> diff --git a/drivers/mfd/cros_ec_spi.c b/drivers/mfd/cros_ec_spi.c
> index 584367f3a0ed..477f8e81dc34 100644
> --- a/drivers/mfd/cros_ec_spi.c
> +++ b/drivers/mfd/cros_ec_spi.c
> @@ -671,6 +671,7 @@ static int cros_ec_spi_probe(struct spi_device *spi)
> sizeof(struct ec_response_get_protocol_info);
> ec_dev->dout_size = sizeof(struct ec_host_request);
>
> + ec_spi->last_transfer_ns = ktime_get_ns();
>
> err = cros_ec_register(ec_dev);
> if (err) {

That's sufficient to make it work, but now that "last_transfer_ns" can
never be 0 you should also fix that. It's hard to paste a patch in
gmail and the change is trivial so I didn't try to find another way to
post it, but roughly:

- * @last_transfer_ns: time that we last finished a transfer, or 0 if there
- * if no record
+ * @last_transfer_ns: time that we last finished a transfer.

===

+ unsigned long delay; /* The delay completed so far */

len = cros_ec_prepare_tx(ec_dev, ec_msg);
dev_dbg(ec_dev->dev, "prepared, len=%d\n", len);

/* If it's too soon to do another transaction, wait */
- if (ec_spi->last_transfer_ns) {
- unsigned long delay; /* The delay completed so far */
-
- delay = ktime_get_ns() - ec_spi->last_transfer_ns;
- if (delay < EC_SPI_RECOVERY_TIME_NS)
- ndelay(EC_SPI_RECOVERY_TIME_NS - delay);
- }
+ delay = ktime_get_ns() - ec_spi->last_transfer_ns;
+ if (delay < EC_SPI_RECOVERY_TIME_NS)
+ ndelay(EC_SPI_RECOVERY_TIME_NS - delay);

===

+ unsigned long delay; /* The delay completed so far */

len = cros_ec_prepare_tx(ec_dev, ec_msg);
dev_dbg(ec_dev->dev, "prepared, len=%d\n", len);

/* If it's too soon to do another transaction, wait */
- if (ec_spi->last_transfer_ns) {
- unsigned long delay; /* The delay completed so far */
-
- delay = ktime_get_ns() - ec_spi->last_transfer_ns;
- if (delay < EC_SPI_RECOVERY_TIME_NS)
- ndelay(EC_SPI_RECOVERY_TIME_NS - delay);
- }
+ delay = ktime_get_ns() - ec_spi->last_transfer_ns;
+ if (delay < EC_SPI_RECOVERY_TIME_NS)
+ ndelay(EC_SPI_RECOVERY_TIME_NS - delay);


>>> You may say why not put a delay in the tegra_spi_setup() itself, but we
>>> have some other SPI devices such as flash devices which are also use an
>>> active-low chip-select which don't have any issues with this to date.
>>> Furthermore, this delay is also probably device specific.
>>>
>>> From an EC perspective, if the chip-select is asserted is there a
>>> turnaround time before it can be asserted again? Or in this case maybe
>>> the issue is that the chip-select is asserted but no transaction occurs
>>> before it is de-asserted and so is causing a problem. Please note that a
>>> delay of around ~50us above still fails but 100us seems to be ok.
>>
>> Really nice detective work!
>>
>>
>>> Finally, this also works-around the problem by avoiding the chip-select
>>> from being pulsed low by defaulting all chip-selects to active-low but
>>> maybe this just masks the problem.
>>>
>>> diff --git a/drivers/spi/spi-tegra114.c b/drivers/spi/spi-tegra114.c
>>> index a76acedd7e2f..7c18204e61d9 100644
>>> --- a/drivers/spi/spi-tegra114.c
>>> +++ b/drivers/spi/spi-tegra114.c
>>> @@ -1117,7 +1117,7 @@ static int tegra_spi_probe(struct platform_device
>>> goto exit_pm_disable;
>>> }
>>> - tspi->def_command1_reg = SPI_M_S;
>>> + tspi->def_command1_reg = SPI_M_S | SPI_CS_POL_INACTIVE_MASK;
>>
>> Note that even though I'd support adding some sort of delay to the
>> cros_ec driver, I'd also say that it _might_ make sense to mess with
>> the SPI driver too, just to avoid glitching the lines at bootup. As
>> you said, you shouldn't just willy nilly change the default but it
>> seems like it could make sense to define an initial (board level)
>> pinctrl state that's in effect until the first call to setup_bus().
>
> I am thinking about making that above change as well, because the reset
> value of the chip-select polarity bits default to active-low. For
> active-high devices I don't think that you can ever avoid the
> chip-select asserting for a period after reset as that is the default
> setting, but I don't see why we are clearing these bits in probe.

Is the default muxing option for these pins to be SPI? If not then
you'd have to look at the default muxing option and possibly the
default pull option.


> I will do a bit more testing with this to avoid any regressions, but
> both changes seem worthwhile IMO.

It does seem slightly risky to change the default for all boards, but
I'll leave it up to you. I think you can get this right with all the
pinctrl stuff I mentioned, but that definitely would be a lot more
work. Also: if you land the cros ec change, it's not exactly
urgent...

-Doug