But that comes with a challenge: despite listening on cgroup for
pressure notifications (which happen from those runtime events we do
not control),
We also have global pressure (PSI) counters. Have you tried looking
into those and backing off when the situation becomes critical?
Yes. PSI counters help us to some extent, but we've found that when memory
bloat occurs rapidly, an OOM can happen before we observe any memory
pressure. The proposed failsafe mechanism covers even that situation.
Also, as I mentioned in the commit message, OOM notifiers don't work if the
OOM is triggered by a memory allocation for the kernel.
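
For reference, a minimal userspace sketch of the kind of PSI polling
discussed above. It assumes the documented /proc/pressure/memory line
format ("some avg10=... avg60=... avg300=... total=..."). The helper names
and the threshold policy are hypothetical, made up for illustration only,
and are not taken from the patch. Polling like this cannot see bloat that
happens between two reads, which is the gap described above.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of /proc/pressure/memory, for example:
 *   some avg10=0.12 avg60=0.05 avg300=0.01 total=12345
 */
struct psi_line {
	char kind[8];			/* "some" or "full" */
	double avg10, avg60, avg300;	/* stall percentages */
	unsigned long long total;	/* cumulative stall time, in us */
};

/* Parse one PSI line. Returns 0 on success, -1 on a format mismatch. */
static int psi_parse_line(const char *line, struct psi_line *out)
{
	int n = sscanf(line, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
		       out->kind, &out->avg10, &out->avg60, &out->avg300,
		       &out->total);

	return n == 5 ? 0 : -1;
}

/* Hypothetical policy: back off once the 10s "some" average exceeds
 * the given threshold (a percentage chosen by the caller). */
static int psi_should_back_off(const struct psi_line *p, double threshold)
{
	return strcmp(p->kind, "some") == 0 && p->avg10 > threshold;
}

/* Read the "some" line from a PSI file such as /proc/pressure/memory.
 * Returns 0 on success, -1 if the file is missing or has no such line. */
static int psi_read_some(const char *path, struct psi_line *out)
{
	char buf[256];
	int ret = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fgets(buf, sizeof(buf), f)) {
		if (psi_parse_line(buf, out) == 0 &&
		    strcmp(out->kind, "some") == 0) {
			ret = 0;
			break;
		}
	}
	fclose(f);
	return ret;
}
```

A caller would run psi_read_some("/proc/pressure/memory", &p) periodically
and start releasing memory when psi_should_back_off(&p, threshold) returns
nonzero. The threshold value is left to the caller.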