Re: 5.6-rc3: WARNING: CPU: 48 PID: 17435 at kernel/sched/fair.c:380 enqueue_task_fair+0x328/0x440

From: Dietmar Eggemann
Date: Wed Mar 04 2020 - 14:19:42 EST


Hi Christian,

On 04/03/2020 18:42, Christian Borntraeger wrote:
>
>
> On 04.03.20 16:26, Vincent Guittot wrote:
>> On Tue, 3 Mar 2020 at 08:55, Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote:
>>>
>>> On Tue, 3 Mar 2020 at 08:37, Christian Borntraeger
>>> <borntraeger@xxxxxxxxxx> wrote:
>>>>
>>>>
>>>>
>> [...]
>>>>>>> ---
>>>>>>> kernel/sched/fair.c | 2 +-
>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>>>> index 3c8a379c357e..beb773c23e7d 100644
>>>>>>> --- a/kernel/sched/fair.c
>>>>>>> +++ b/kernel/sched/fair.c
>>>>>>> @@ -4035,8 +4035,8 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>>>>>>> __enqueue_entity(cfs_rq, se);
>>>>>>> se->on_rq = 1;
>>>>>>>
>>>>>>> + list_add_leaf_cfs_rq(cfs_rq);
>>>>>>> if (cfs_rq->nr_running == 1) {
>>>>>>> - list_add_leaf_cfs_rq(cfs_rq);
>>>>>>> check_enqueue_throttle(cfs_rq);
>>>>>>> }
>>>>>>> }
>>>>>>
>>>>>> Now running for 3 hours. I have not seen the issue yet. I can tell tomorrow if this fixes
>>>>>> the issue.
>>>>>
>>>>>
>>>>> Still running fine. I can tell for sure tomorrow, but I have the impression that this makes the
>>>>> WARN_ON go away.
>>>>
>>>> So I guess this change "fixed" the issue. If you want me to test additional patches, let me know.
>>>
>>> Thanks for the test. For now, I don't have any other patch to test. I
>>> have to look more deeply how the situation happens.
>>> I will let you know if I have other patch to test
>>
>> So I haven't been able to figure out how we reach this situation yet.
>> In the meantime I'm going to make a clean patch with the fix above.
>>
>> Is it ok if I add a reported -by and a tested-by you ?
>
> Sure-
> I just realized that this system has something special. Some month ago I created 2 slices
> $ head /etc/systemd/system/*.slice
> ==> /etc/systemd/system/machine-production.slice <==
> [Unit]
> Description=VM production
> Before=slices.target
> Wants=machine.slice
> [Slice]
> CPUQuota=2000%
> CPUWeight=1000
>
> ==> /etc/systemd/system/machine-test.slice <==
> [Unit]
> Description=VM production
> Before=slices.target
> Wants=machine.slice
> [Slice]
> CPUQuota=300%
> CPUWeight=100
>
>
> And the guests are then put into these slices. that also means that this test will never use more than the 2300%.
> No matter how much CPUs the system has.

If you could run this debug patch on top of your un-patched kernel, it would tell us which task (in the enqueue case)
and which taskgroup triggers the warning.

You could then further dump the appropriate taskgroup directory under the cpu cgroup mountpoint
(to see e.g. the CFS bandwidth data).
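
As a rough sketch of such a dump (this is not part of the debug patch): the helper name dump_cfs_bw and the
example path are made up for illustration, and the file names assume the cgroup v1 cpu controller, with the
cgroup v2 equivalents (cpu.max, cpu.weight) tried as well. Files that do not exist are simply skipped.

```shell
#!/bin/sh
# Print the CFS bandwidth related files of one taskgroup directory.
# Assumption: $1 is the directory of the group reported by the debug output.
dump_cfs_bw() {
    dir=$1
    # cgroup v1 names first, then the cgroup v2 ones; cpu.stat exists in both.
    for f in cpu.cfs_period_us cpu.cfs_quota_us cpu.shares cpu.max cpu.weight cpu.stat; do
        if [ -r "$dir/$f" ]; then
            echo "== $dir/$f"
            cat "$dir/$f"
        fi
    done
}

# Example call (hypothetical path, adjust to your mountpoint and group):
# dump_cfs_bw /sys/fs/cgroup/cpu/machine.slice/machine-production.slice
```

The nr_throttled and throttled_time values in cpu.stat would show whether the group actually ran into its quota.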

I expect more than one hit with the debug patch: assert_list_leaf_cfs_rq() uses SCHED_WARN_ON, hence WARN_ONCE,
so the un-patched kernel only reports the first occurrence.

--8<--