Re: [PATCH 2/3] fpga manager: xilinx-spi: provide better diagnostics on programming failure

From: Luca Ceresoli
Date: Wed Aug 19 2020 - 12:32:36 EST


On 18/08/20 16:21, Tom Rix wrote:
>
> On 8/18/20 3:20 AM, Luca Ceresoli wrote:
>> [a question for GPIO maintainers below]
>>
>> Hi Tom,
>>
>> thanks for your review!
>>
>> On 17/08/20 20:15, Tom Rix wrote:
>>> The other two patches are fine.
>>>
>>> On 8/17/20 9:59 AM, Luca Ceresoli wrote:
>>>> When the DONE pin does not go high after programming to confirm programming
>>>> success, the INIT_B pin provides some info on the reason. Use it if
>>>> available to provide a more explanatory error message.
>>>>
>>>> Signed-off-by: Luca Ceresoli <luca@xxxxxxxxxxxxxxxx>
>>>> ---
>>>> drivers/fpga/xilinx-spi.c | 11 ++++++++++-
>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/fpga/xilinx-spi.c b/drivers/fpga/xilinx-spi.c
>>>> index 502fae0d1d85..2aa942bb1114 100644
>>>> --- a/drivers/fpga/xilinx-spi.c
>>>> +++ b/drivers/fpga/xilinx-spi.c
>>>> @@ -169,7 +169,16 @@ static int xilinx_spi_write_complete(struct fpga_manager *mgr,
>>>> return xilinx_spi_apply_cclk_cycles(conf);
>>>> }
>>>>
>>>> - dev_err(&mgr->dev, "Timeout after config data transfer.\n");
>>>> + if (conf->init_b) {
>>>> + int init_b_asserted = gpiod_get_value(conf->init_b);
>>> gpiod_get_value can fail. So maybe need split the first statement.
>>>
>>> init_b_asserted < 0 ? "invalid device"
>>>
>>> As the if-else statement is getting complicated, embedding the ? : makes this hard to read.  'if,else if, else' would be better.
>> Thanks for the heads up. However I'm not sure which is the best thing to
>> do here.
>>
>> First, I've been reading the libgpiod code after your email and yes, the
>> libgpiod code _could_ return runtime errors received from the gpiochip
>> driver, even though the docs state:
>>
>>> The get/set calls do not return errors because “invalid GPIO”> should have been reported earlier from gpiod_direction_*().
>> (https://www.kernel.org/doc/html/latest/driver-api/gpio/consumer.html)
>>
>> On the other hand there are plenty of calls to gpiod_get/set_value in
>> the kernel that don't check for error values. I guess this is because
>> failures getting/setting a GPIO are very uncommon (perhaps impossible
>> with platform GPIO).
>>
>> When still a GPIO get/set operation fails I'm not sure adding thousands
>> of error-checking code lines in hundreds of drivers is the best way to
>> go. I feel like we should have a unique, noisy dev_err() in the error
>> path in libgpio but I was surprised in not finding any [1].
>>
>> Linus, Bartosz, what's your opinion? Should all drivers check for errors
>> after every gpiod_[sg]et_value*() call?
>
> My opinion is that you know the driver / hw is in a bad state and you
>
> are trying to convey useful information.  So you should
>
> be as careful as possible and not assume gpio did not fail.

This patch aims at providing better diagnostics after programming has
already gone bad. Neglecting an error might lead to a misleading error
message, but this doesn't lead programming to fail -- it has failed already.

On the other hand a gpiod_get/set_value() call might fail earlier, along
the normal execution path, and lead to real failures without an error
message emitted after the gpiod call that failed.

Which doesn't mean I'm against your proposal of adding error checking
code. Rather, if we want error checking, we want it mainly in other
places: at the very least at the first usage of each of the GPIOs, maybe
at each usage. Have a look at the beginning of
xilinx_spi_write_complete() [0] for example: if gpiod_get_value() fails
there the driver would think programming has been successfully completed
(DONE asserted). To me this is worse than just printing the wrong error
message.

[0]
https://elixir.bootlin.com/linux/v5.8.2/source/drivers/fpga/xilinx-spi.c#L114

--
Luca