Re: [BUG] SPI broken for SPI based panel drivers

From: H. Nikolaus Schaller
Date: Fri Dec 04 2020 - 05:11:23 EST


Hi Sven and Linux,

> Am 01.12.2020 um 19:43 schrieb Sven Van Asbroeck <thesven73@xxxxxxxxx>:
>
> Hi Nikolaus,

I was about to submit two patches. One that reverts our "spi-cs-high"
and one that actively sets the GPIO_ACTIVE_LOW because that seems
to be the right thing. It would work with v5.10, but is it really
the right thing?

Before submitting I would like to clarify the rules.

Especially since it is the same discussion as before:

https://lkml.org/lkml/2019/3/23/131
https://lkml.org/lkml/2019/3/24/2

> On Tue, Dec 1, 2020 at 12:13 PM H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> wrote:
>>> Am 01.12.2020 um 17:53 schrieb Sven Van Asbroeck <thesven73@xxxxxxxxx>:
>>> On Tue, Dec 1, 2020 at 11:43 AM H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> wrote:
>>>>
>>>> You are right. It is active low.
>>>>
>>>
>>> In that case, we have a very simple solution, just remove the spi-cs-high,
>>> and things will work.
>>
>> We originally had it that way and because there was a change in gpiolib we had
>> to introduce it.
>
> The current rules re. spi chip-selects in devicetrees are here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpio/gpiolib-of.c?h=v5.10-rc6#n191

Yes, there is

/*
* SPI children have active low chip selects
* by default. This can be specified negatively
* by just omitting "spi-cs-high" in the
* device node, or actively by tagging on
* GPIO_ACTIVE_LOW as flag in the device
* tree. If the line is simultaneously
* tagged as active low in the device tree
* and has the "spi-cs-high" set, we get a
* conflict and the "spi-cs-high" flag will
* take precedence.
*/

It is not complete and a formulated a little in reverse.
A table would would be more precise.

And it is not the complete rule:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpio/gpiolib-of.c?h=v5.10-rc6#n175

/*
* Legacy handling of SPI active high chip select. If we have a
* property named "cs-gpios" we need to inspect the child node
* to determine if the flags should have inverted semantics.
*/

This seems to indicate that the gpio flags can invert spi-cs-high. There are
only two options: invert if they are GPIO_ACTIVE_LOW or GPIO_ACTIVE_HIGH.

This makes IMHO much sense. But it seems that it was not implemented that way.

>
> This is the way I see things:
> - according to the current rules, your devicetree describes a spi panel with
> an active-high chip select

I have now tested all 4 combinations of GPIO_ACTIVE_LOW/HIGH and missing/present
"spi-cs-high" with v5.4.79.
This leads to the following definition:

a) general SPI controller or if no cs-gpios available in spi-gpio

device node | CS pin state active
================+==================
spi-cs-high | H
- | L

This was the single (legacy) rule until v4.18 effectively ignoring any gpio flags.

b) with v4.19 and 6953c57ab172 ("gpio: of: Handle SPI chipselect legacy bindings") was introduced

it was apparently extended to (by code inspection of the patch):

device node | cs-gpio | CS pin state active
================+===============+====================
spi-cs-high | - | H
- | - | L
spi-cs-high | ACTIVE_HIGH | L (polarity inversion, no warning)
- | ACTIVE_HIGH | L + "enforce active low on chipselect handle"
spi-cs-high | ACTIVE_LOW | H + "GPIO handle specifies active low - ignored"
- | ACTIVE_LOW | H (no warning)

But I did test all 4 combinations with v5.4.79 and assuming CS pin
state L if the panel works gave me:

device node | cs-gpio | CS pin state active
================+===============+====================
spi-cs-high | ACTIVE_HIGH | L (panel works)
- | ACTIVE_HIGH | H (panel fails) + "enforce active low on chipselect handle"
spi-cs-high | ACTIVE_LOW | L (panel works) + "GPIO handle specifies active low - ignored"
- | ACTIVE_LOW | H (panel fails without comment)

This means there was some additional tweak to the rules effectively ignoring
the cs-gpio. Like before - but with opposite polarity.

Note that some legacy gpio flags are 0 (ACTIVE_HIGH) so it seems to make sense
for the ACTIVE_HIGH case like ours.

Therefore, only GPIO_ACTIVE_HIGH + "spi-cs-high" works without side-effects which
is the reason why we had modified our device tree and defined the wrong
"spi-cs-high" as a solution.

Anyways it is debatable if this is a bug at all. It is just a definition.
Which is not well documented anywhere.

c) now with v5.10 and 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")

the spi driver does no longer rely on gpiolib to do the right thing but adds
its own rule effectively inverting things again. I hope I got it right because
the logic is now spread over several files:

device node | cs-gpio | CS pin state active
================+===============+====================
spi-cs-high | - | H
- | - | L
spi-cs-high | ACTIVE_HIGH | H (breaks our panel with current DT)
- | ACTIVE_HIGH | L (should print a warning about polarity inversion - because here it would be wise to switch to ACTIVE_LOW)
spi-cs-high | ACTIVE_LOW | H (could print a warning about polarity inversion - because ACTIVE_LOW is overridden by spi-cs-high)
- | ACTIVE_LOW | L

The implementation with your fix now seems to completely ignore the gpio
definition since in the absence of spi-cs-high the panel works with either
ACTIVE_ definition.

Basically, we are back at a) i.e. the same as it was up to v4.18.

So in total there should be no automatic polarity inversion of the flags
by gpiolib at all. Only the spi driver should know that it has to invert
the parameter for ACTIVE_LOW to gpiod_set_value_cansleep() to really control
the CS pin state (H/L).

> - the actual chip select of your panel is active-low

Yes.

> - a spi/gpiod bug inverted the chip-select in many instances

Ok, there was another patch in between:

138c9c32f090 ("spi: spidev: Fix CS polarity if GPIO descriptors are used")

Maybe this has introduced the bug - and we just didn't notice.
This is why I have added Lukas Wunner <lukas@xxxxxxxxx> to the discussion.

> - because of this bug, your devicetree happened to work before 766c6b63aa04

IMHO it is debatable if it is a bug or what is the bug. In any case it
is a gap in clear and binding documentation.

> - 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
> fixes this chip-select polarity bug
> - you now need to remove your devicetree work-around for this bug by reverting
> f1f028ff89cb0d3

Before doing this I would like to see a table (like mine above) how it
should be. Definitively and finally. Agreed and approved by maintainers.

This table should then go somewhere to the .yaml definitions of gpiod or
spi or whereever it belongs to.

Then you can check if your code does what the table mandates and I can
check if our device tree does the right thing.

Otherwise we risk that I change the DT now and in 2 years someone else
finds that the definition is still not sane and tweaks SPI or gpiolib
again and I have to modify our DT once more (which costs a lot of time
to find the reasons of breakage and takes away a lot of work-time from
other nice things I would like to add to Linux).

What I especially wonder is why you fix that at all in drivers/spi/spi.c
if polarity inversion is handled in gpiolib.

BR and thanks,
Nikolaus