Re: [RFC PATCH] memcg, oom: throttle dump_header for memcg ooms without eligible tasks

From: Tetsuo Handa
Date: Sat Oct 13 2018 - 07:09:42 EST


On 2018/10/12 21:58, Tetsuo Handa wrote:
> On 2018/10/12 21:41, Johannes Weiner wrote:
>> On Fri, Oct 12, 2018 at 09:10:40PM +0900, Tetsuo Handa wrote:
>>> On 2018/10/12 21:08, Michal Hocko wrote:
>>>>> So not more than 10 dumps in each 5s interval. That looks reasonable
>>>>> to me. By the time it starts dropping data you have more than enough
>>>>> information to go on already.

Not reasonable at all.

>>>>
>>>> Yeah. Unless we have a storm coming from many different cgroups in
>>>> parallel. But even then we have the allocation context for each OOM so
>>>> we are not losing everything. Should we ever tune this, it can be done
>>>> later with some explicit examples.
>>>>
>>>>> Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
>>>>
>>>> Thanks! I will post the patch to Andrew early next week.
>>>>

A single thread in a single cgroup is sufficient to cause such a storm. I
don't think that Michal's patch is an appropriate mitigation. It still
needlessly floods the kernel log buffer and significantly delays recovery.

Nacked-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>

---------- Testcase ----------

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
	FILE *fp;
	const unsigned long size = 1048576 * 200;
	char *buf = malloc(size);

	/* Create a memcg whose limit is half of the buffer size. */
	mkdir("/sys/fs/cgroup/memory/test1", 0755);
	fp = fopen("/sys/fs/cgroup/memory/test1/memory.limit_in_bytes", "w");
	fprintf(fp, "%lu\n", size / 2);
	fclose(fp);

	/* Move this process into that memcg. */
	fp = fopen("/sys/fs/cgroup/memory/test1/tasks", "w");
	fprintf(fp, "%u\n", getpid());
	fclose(fp);

	/* Make this process OOM-unkillable, so the memcg has no eligible task. */
	fp = fopen("/proc/self/oom_score_adj", "w");
	fprintf(fp, "-1000\n");
	fclose(fp);

	/* Fill the whole buffer, which exceeds the memcg limit. */
	fp = fopen("/dev/zero", "r");
	fread(buf, 1, size, fp);
	fclose(fp);
	return 0;
}

---------- Michal's patch ----------

73133 lines (5.79 MB) of kernel messages per run

[root@ccsecurity ~]# time ./a.out

real 3m44.389s
user 0m0.000s
sys 3m42.334s

[root@ccsecurity ~]# time ./a.out

real 3m41.767s
user 0m0.004s
sys 3m39.779s

---------- My v2 patch ----------

50 lines (3.40 KB) of kernel messages per run

[root@ccsecurity ~]# time ./a.out

real 0m5.227s
user 0m0.000s
sys 0m4.950s

[root@ccsecurity ~]# time ./a.out

real 0m5.249s
user 0m0.000s
sys 0m4.956s