Re: [Letux-kernel] [PATCH 5/5] input: twl6040-vibra: remove mutex
From: H. Nikolaus Schaller
Date: Wed Apr 20 2016 - 15:00:49 EST
Hi Dmitry,
> Am 20.04.2016 um 20:15 schrieb Dmitry Torokhov <dmitry.torokhov@xxxxxxxxx>:
>
> On Wed, Apr 20, 2016 at 08:10:28PM +0200, H. Nikolaus Schaller wrote:
>> Hi,
>>
>>> Am 20.04.2016 um 19:49 schrieb Dmitry Torokhov <dmitry.torokhov@xxxxxxxxx>:
>>>
>>> On Wed, Apr 20, 2016 at 11:22:53AM +0200, H. Nikolaus Schaller wrote:
>>>>
>>>>> Am 19.04.2016 um 10:08 schrieb H. Nikolaus Schaller <hns@xxxxxxxxxxxxx>:
>>>>>
>>>>>
>>>>>> Am 19.04.2016 um 10:01 schrieb Dmitry Torokhov <dmitry.torokhov@xxxxxxxxx>:
>>>>>>
>>>>>> On Tue, Apr 19, 2016 at 09:49:01AM +0200, H. Nikolaus Schaller wrote:
>>>>>>>
>>>>>>>> Am 18.04.2016 um 23:20 schrieb Dmitry Torokhov <dmitry.torokhov@xxxxxxxxx>:
>>>>>>>>
>>>>>>>> On Mon, Apr 18, 2016 at 09:55:41PM +0200, H. Nikolaus Schaller wrote:
>>>>>>>>> The mutex does not seem to be needed.
>>>>>>>>
>>>>>>>> twl6040_vibra_suspend() and vibra_play_work() may run concurrently, no?
>>>>>>>
>>>>>>> Hm. I don't know about the rule that would give an answer to this question...
>>>>>>
>>>>>> Sorry, that was actually a statement, not really a question.
>>>>>
>>>>> Indeed. In doubt about the answer we should take measures for the worst case.
>>>>>
>>>>>> It is
>>>>>> possible (although very unlikely) that userspace posts play request and
>>>>>> workqueue will not run until after suspend callback.
>>>>>>
>>>>>> Thinking about it some more I wonder if we better do what
>>>>>> twl6040_vibra_close() does and cancel the work before shutting off the
>>>>>> device, so that there is no chance of work executing after suspend
>>>>>> callback and reenabling the device. This way we can indeed remove the
>>>>>> mutex.
>>>>>
>>>>> Ok, I am fine with this.
>>>>>
>>>>> Will post an update ASAP.
>>>>
>>>> While doing testing I did run again into another locking related issue which I
>>>> had not yet tried to address with my patch set. It is a:
>>>>
>>>> BUG: scheduling while atomic
>>>>
>>>> report which sometimes ends in a kernel panic.
>>>>
>>>> I have attached such a log. vibra.py is here:
>>>>
>>>> http://git.goldelico.com/?p=gta04-kernel.git;a=blob;f=Letux/root/vibra.py;hb=refs/heads/letux-4.6-rc4
>>>>
>>>> Basically it does an ioctl(EVIOCSFF) which triggers vibra_play.
>>>>
>>>> Maybe, can you decipher from the log what the reason could be?
>>>>
>>>> I only conjecture that it happens when vibra_play tries to read the regmap
>>>> of the twl6040 to get twl6040_get_vibralr_status (which has no pendant
>>>> in the twl4030 driver).
>>>>
>>>> Do we have to configure the twl6040 regmap differently?
>>>
>>> Right, vibra_play is called with interrupts disabled (that's why we have
>>> workqueue to enable/disable regulators, etc, when we start or stop
>>> vibration), so twl6040_get_vibralr_status() should not sleep, but
>>> apparently it does.
>>
>> Yes, regmap using i2c communication may sleep.
>>
>>> Maybe the check for audio configuration should be
>>> moved into vibra_play_work().
>>
>> Hm. It is there to disable while in audio routing mode, but
>> a workqueue can't report error values back to the scheduling thread...
>
> Nothing checks result of ->play() anyways, so that -EBUSY was completly
> useless.
Ah, indeed:
http://lxr.free-electrons.com/source/drivers/input/ff-memless.c#L410
>
>>
>> So we can either silently make vibra not work (or just report
>> in the kernel log) if "Vibra is configured for audio".
>
> We might want to ratelimit that message.
Or not report it at all.
>
>>
>> Or we need to get a different mechanism to know the vibra status.
>>
>> Hm.
>>
>> Research shows that it regmap was introduced long ago but between 3.11 and
>> 3.12 a private cache for these control registers was removed.
>>
>> http://lxr.free-electrons.com/source/drivers/mfd/twl6040.c?v=3.11#L66
>> http://lxr.free-electrons.com/source/drivers/mfd/twl6040.c?v=3.11#L85
>> http://lxr.free-electrons.com/source/drivers/mfd/twl6040.c?v=3.11#L462
>>
>> This had been non-blocking and was probably there exactly to protect
>> against this locking issue!
>>
>> After removing the private cache the code must rely on twl6040_get_vibralr_status
>> not fetching from the chip.
>>
>> Ah, this correlates to that I see this issue only once and then everything
>> works.
>>
>> This means we have to fetch the current vibra control registers once
>> outside of vibra_play(). Probably during probing by a single call to
>> twl6040_get_vibralr_status() and ignoring the result.
>>
>> After it has been fetched once (to know any status from the last reboot)
>> the regmap should track all changes arriving through the sound subsystem
>> (audio vibra enable) and the call to twl6040_get_vibralr_status in interrupt
>> context should no longer block.
>>
>> Does this sound reasonable?
I think if the -EBUSY isn't evaluated at all, it is simpler to move this test
to the worker. This has the benefit of avoiding a potential race between
changing the vibra source selection (e.g. through amixer) and running the
work function.
>
> Yes, as long as you document this so that it does not get removed by
> mistake later.
I will prepare a patch and test asap.
BR and thanks,
Nikolaus