Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

From: Shayan Pooya
Date: Fri Jul 15 2016 - 12:59:09 EST


>> I am just curious... can you reproduce the problem reliably? If yes, can you try
>> the patch below? Just in case, this is not the real fix in any case...
>
> Yes. It deterministically results in hung processes in a vanilla kernel.
> I'll try this patch.

I have to correct this. I can reproduce the issue easily on both
high-end servers and ordinary laptops, but for some reason it rarely
happens in VMware guests (possibly because of their lower
parallelism).

>> --- x/kernel/sched/core.c
>> +++ x/kernel/sched/core.c
>> @@ -2793,8 +2793,11 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>  	balance_callback(rq);
>>  	preempt_enable();
>>
>> -	if (current->set_child_tid)
>> +	if (current->set_child_tid) {
>> +		mem_cgroup_oom_enable();
>>  		put_user(task_pid_vnr(current), current->set_child_tid);
>> +		mem_cgroup_oom_disable();
>> +	}
>>  }
>>
>>  /*

I tried this patch and still see the same stuck processes (assuming
that's what you were curious about).
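For context on what the patch touches: the put_user() it wraps in schedule_tail() is how the kernel implements CLONE_CHILD_SETTID, storing the child's TID at the user address passed as child_tid when the child first runs. The sketch below exercises that path from userspace through glibc's clone() wrapper. spawn_settid_child() and child_fn() are hypothetical helpers written for illustration, not code from this thread, and the sketch assumes Linux with glibc. CLONE_VM is used only so the parent can read the value back. In an ordinary fork()-style child the write presumably lands on a copy-on-write page, which would need a fresh (memcg-charged) page at that point; this is an assumption about why the put_user() can reach the memcg OOM path, not something the thread confirms.

```c
#define _GNU_SOURCE
#include <sched.h>      /* clone(), CLONE_* flags */
#include <signal.h>     /* SIGCHLD */
#include <stdlib.h>     /* malloc(), free() */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid(), WIFEXITED(), WEXITSTATUS() */

/* Hypothetical child body: does nothing and exits with status 0. */
static int child_fn(void *arg)
{
    (void)arg;
    return 0;
}

/*
 * Hypothetical helper: spawn a child with CLONE_CHILD_SETTID and report
 * both the PID returned by clone() and the TID the kernel wrote into
 * child_tid (via the put_user() in schedule_tail()).
 * Returns 0 on success, -1 on failure.
 */
int spawn_settid_child(pid_t *pid_out, pid_t *tid_out)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    pid_t child_tid = 0;  /* written by the kernel, never by the child code */

    /* Pass the top of the buffer: the stack grows down on x86 and most
     * other architectures. CLONE_VM shares memory so the parent can see
     * the kernel's write; SIGCHLD lets a plain waitpid() reap the child. */
    pid_t pid = clone(child_fn, stack + stack_size,
                      CLONE_VM | CLONE_CHILD_SETTID | SIGCHLD,
                      NULL,                /* arg for child_fn */
                      NULL, NULL,          /* parent_tid, tls: unused here */
                      &child_tid);         /* target of CLONE_CHILD_SETTID */
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t reaped = waitpid(pid, &status, 0);
    free(stack);  /* safe: the child has exited and no longer uses it */
    if (reaped != pid || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    *pid_out = pid;
    *tid_out = child_tid;
    return 0;
}
```

For a single-threaded child in the caller's PID namespace, the TID the kernel stores should equal the PID that clone() returned, so comparing the two is a quick way to confirm the CLONE_CHILD_SETTID write happened.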