Re: [PATCH 0/2] gpiolib: Fix issues introduced by fast bitmap processing path

From: Janusz Krzysztofik
Date: Mon Sep 24 2018 - 10:18:21 EST


Hi Marek,

2018-09-24 13:38 GMT+02:00, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>:
> Hi Janusz,
>
> On 2018-09-24 13:08, Janusz Krzysztofik wrote:
>> 2018-09-24 11:43 GMT+02:00, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>:
>>> On 2018-09-24 01:53, Janusz Krzysztofik wrote:
>>>> While investigating possible reasons of GPIO fast bitmap processing
>>>> related boot hang on Samsung Snow Chromebook, reported by Marek
>>>> Szyprowski (thanks!), I've discovered one coding bug, addressed by
>>>> PATCH 1/2 of this series, and one potential regression introduced at
>>>> design level of the solution, hopefully fixed by PATCH 2/2. See
>>>> commit messages for details.
>>>>
>>>> Janusz Krzysztofik (2):
>>>> gpiolib: Fix missing updates of bitmap index
>>>> gpiolib: Fix array members of same chip processed separately
>>>>
>>>> The fixes should resolve the boot hang observed by Marek, however the
>>>> second change excludes that particular case from fast bitmap processing
>>>> and restores the old behaviour.
>>> I confirm, that the above 2 patches fixes boot issue on Samsung Snow
>>> Chromebook with next-20180920.
>>>
>>> Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
>>>
>>>> Hence, it is possible still another
>>>> issue which have had an influence on that boot hang exists in the code.
>>>> In order to fully verify the fix, it would have to be tested on a
>>>> platform where an array of GPIO descriptors is used which starts from
>>>> at least two consecutive pins of one GPIO chip in hardware order,
>>>> starting ftom 0, followed by one or more pins belonging to other
>>>> chip(s).
>>>>
>>>> In order to verify if separate calls to .set() chip callback for each
>>>> pin instead of one call to .set_multiple() is actually the reason of
>>>> boot hang on Samsung Snow Chromebook, the affected driver -
>>>> drivers/mmc/core/pwrseq_simple.c - would have to be temporarily
>>>> modified for testing purposes so it calls gpiod_set_value() for each
>>>> pin instead of gpiod_set_array_value() for all of them. If that would
>>>> also result in boot hang, we could be sure the issue was really the
>>>> one addressed by the second fix. Marek, could you please try to
>>>> perform such test?
>>> Yes, I've just tested next-20180920 only with the first patch from this
>>> patchset and the mentioned change to drivers/mmc/core/pwrseq_simple.c.
>>> It boots fine, so indeed the issue is in handling of arrays of gpios.
>>>
>>> Just to be sure I did it right, this is my change to the mentioned file:
>> Yeah, that's what I had on mind. However, I'd be more lucky if it didn't
>> work
>> for you. Setting the pins sequentially, not simultaneously as before,
>> was
>> exactly what I hoped was the reason of the hang.
>>
>>> diff --git a/drivers/mmc/core/pwrseq_simple.c
>>> b/drivers/mmc/core/pwrseq_simple.c
>>> index 7f882a2bb872..9397dc1f2e38 100644
>>> --- a/drivers/mmc/core/pwrseq_simple.c
>>> +++ b/drivers/mmc/core/pwrseq_simple.c
>>> @@ -38,16 +38,11 @@ static void mmc_pwrseq_simple_set_gpios_value(struct
>>> mmc_pwrseq_simple *pwrseq,
>>> int value)
>>> {
>>> struct gpio_descs *reset_gpios = pwrseq->reset_gpios;
>>> + int i;
>>>
>>> - if (!IS_ERR(reset_gpios)) {
>>> - DECLARE_BITMAP(values, BITS_PER_TYPE(value));
>>> - int nvalues = reset_gpios->ndescs;
>>> -
>>> - values[0] = value;
>>> -
>>> - gpiod_set_array_value_cansleep(nvalues,
>>> reset_gpios->desc,
>>> - reset_gpios->info,
>>> values);
>>> - }
>>> + if (!IS_ERR(reset_gpios))
>>> + for (i = 0; i < reset_gpios->ndescs; i++)
>> The only difference from the behaviour when the hang was occurring is now
>> the order the pins are manipulated. Maybe that matters?
>> Could you please retry the same with the order of pins reversed, either
>> in
>> the .dts file or here inside this for loop?
>
> I've switched the order of pins in dts and next-20180920 + first patch +
> above
> change also boots fine.

Thanks for performing those tests.

Since we are not able to reproduce the issue by any means other than
using the original code introduced by fast bitmap processing changes,
regardless of the first fix being applied or not, and we are only able to
resolve the hangup by excluding affected use case from the fast path,
we have to assume one or more bugs which affect mixed arrays, i.e.,
those which apply for fast bitmap processing only in part, may still exist
in the code introduced by the fast bitmap processing series. I hope we
are able to resolve it soon, before the changes reach mainline.

Thanks,
Janusz


> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>
>