Re: [PATCH v3] mm: memcontrol: Don't flood OOM messages with no eligible task.

From: Tetsuo Handa
Date: Fri Oct 19 2018 - 06:36:01 EST


On 2018/10/19 8:54, Sergey Senozhatsky wrote:
> On (10/18/18 20:58), Tetsuo Handa wrote:
>>>
>>> A knob might do.
>>> As well as /proc/sys/kernel/printk tweaks, probably. One can even add
>>> echo "a b c d" > /proc/sys/kernel/printk to .bashrc and adjust printk
>>> console levels on login and rollback to old values in .bash_logout
>>> May be.
>>
>> That can work for only single login with root user case.
>> Not everyone logs into console as root user.
>
> Add sudo ;)

That will not work. ;-) As long as the console loglevel setting is
system wide, we can't allow multiple login sessions.

>
>> It is pity that we can't send kernel messages to only selected consoles
>> (e.g. all messages are sent to netconsole, but only critical messages are
>> sent to local consoles).
>
> OK, that's a fair point. There was a patch from FB, which would allow us
> to set a log_level on per-console basis. So the noise goes to heav^W net
> console; only critical stuff goes to the serial console (if I recall it
> correctly). I'm not sure what happened to that patch, it was a while ago.
> I'll try to find that out.

Per a console loglevel setting would help for several environments.
But syzbot environment cannot count on netconsole. We can't expect that
unlimited printk() will become safe.

>
> [..]
>> That boils down to a "user interaction" problem.
>> Not limiting
>>
>> "%s invoked oom-killer: gfp_mask=%#x(%pGg), nodemask=%*pbl, order=%d, oom_score_adj=%hd\n"
>> "Out of memory and no killable processes...\n"
>>
>> is very annoying.
>>
>> And I really can't understand why Michal thinks "handling this requirement" as
>> "make the code more complex than necessary and squash different things together".
>
> Michal is trying very hard to address the problem in a reasonable way.

OK. But Michal, do we have a reasonable way which can be applied now instead of
my patch or one of below patches? Just enumerating words like "hackish" or "a mess"
without YOU ACTUALLY PROPOSE PATCHES will bounce back to YOU.

> The problem you are talking about is not MM specific. You can have a
> faulty SCSI device, corrupted FS, and so and on.

"a faulty SCSI device, corrupted FS, and so and on" are reporting problems
which will complete a request. They can use (and are using) ratelimit,
aren't they?

"a memcg OOM with no eligible task" is reporting a problem which cannot
complete a request. But it can use ratelimit as well.

But we have an immediately applicable mitigation for a problem that
already OOM-killed threads are triggering "a memcg OOM with no eligible
task" using one of below patches.