Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"

From: Vincent Guittot
Date: Wed Apr 11 2018 - 13:00:41 EST


Hi Heiner,

On 9 April 2018 at 19:33, Heiner Kallweit <hkallweit1@xxxxxxxxx> wrote:
> Am 06.04.2018 um 18:03 schrieb Vincent Guittot:
>> Hi Heiner,
>>
>> On 30 March 2018 at 10:37, Heiner Kallweit <hkallweit1@xxxxxxxxx> wrote:
>>> Am 30.03.2018 um 08:50 schrieb Vincent Guittot:
>>>> On 29 March 2018 at 19:40, Heiner Kallweit <hkallweit1@xxxxxxxxx> wrote:
>>>>> Am 29.03.2018 um 09:41 schrieb Vincent Guittot:
>>>>
>>>>>>
>>>>>> In the end I'm not so sure that I have the right setup to reproduce the
>>>>>> problem, as I haven't been able to reproduce it since.
>>>>>>
>>>>>> Heiner,
>>>>>>
>>>>>> How fast does the problem happen on your board?
>>>>>> Are you doing anything specific on the console that triggers the problem?
>>>>>>
>>>>> Hi Vincent,
>>>>>
>>>>> the lag when working on the console is constantly there, the "rcu_preempt
>>>>> detected stalls" happens after several hours (so far always within 24h)
>>>>> w/o any triggering event I would be aware of. It also occurred when the
>>>>> system was idle at that point in time.
>>>>
>>>> Ok, so I don't have the problem on my hikey, as the console never lags
>>>> on my setup.
>>>>
>>>> Can you send me the config of your kernel? I'd like to check whether you
>>>> have enabled something that could trigger such a problem.
>>>>
>>> Sure, here we go. I'm also adding a system log.
>>
>> Thanks for the config. I have used it for my setup but I can't
>> reproduce your regression. My platforms stay stable, so I'm probably
>> missing something. Are you facing a similar problem with other platforms,
>> or only with this Celeron-based platform?
>>
>> I have reviewed the code but don't see any obvious place in the patch
>> that can generate the problem. Nevertheless, would you mind trying the
>> patch below? It's a blind test to try to narrow down the problem.
>>
>> Thanks
>>
> Hi Vincent,
>
> I tried again with today's linux-next and it's much better. The lag isn't
> completely gone but it's much less annoying. Every ~30 secs the console
> hangs for about half a second, that's much less frequent than before.

That's interesting, because nothing related to commit
31e77c93e432dec79c7d90b888bbfc3652592741 has been merged recently,
AFAICT.

>
> I saw some patches from Rafael have been merged in the last days.
> Maybe they improved the situation.

Yes, Peter mentioned in another thread that Rafael's latest patches
avoid stopping the tick when entering a short idle, thus reducing the
time it takes to enter idle. Commit 31e77c93e432 adds some background
activity when entering idle, so it may be that we take too much time there.

You also mentioned that the CPU is relatively slow on this platform.
Can you try the cpufreq performance governor instead of ondemand?
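
For reference, a minimal sketch of switching the governor on all CPUs,
assuming the standard cpufreq sysfs files
(/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor). The names
set_governor, get_governors and sysfs_root are made up for this
illustration, and writing to sysfs needs root:

```python
import glob
import os

# Assumed default location of the per-CPU cpufreq files.
DEFAULT_ROOT = "/sys/devices/system/cpu"


def _governor_files(sysfs_root):
    """Return the scaling_governor file of every cpuN directory, sorted."""
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq",
                           "scaling_governor")
    return sorted(glob.glob(pattern))


def get_governors(sysfs_root=DEFAULT_ROOT):
    """Map each CPU directory name (e.g. 'cpu0') to its current governor."""
    governors = {}
    for path in _governor_files(sysfs_root):
        cpu = path.split(os.sep)[-3]  # .../cpuN/cpufreq/scaling_governor
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors


def set_governor(governor, sysfs_root=DEFAULT_ROOT):
    """Write `governor` to every scaling_governor file; return paths written."""
    written = []
    for path in _governor_files(sysfs_root):
        with open(path, "w") as f:
            f.write(governor)
        written.append(path)
    return written
```

Calling set_governor("performance") as root and then get_governors()
should show whether the change was accepted on every CPU.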

I'm also going to prepare a patch that adds some tracing to the code to
highlight the problem.

Thanks,
Vincent

>
> Regards, Heiner
>

[snip]