On Thu, Jul 27, 2023 at 10:15:16AM +0200, Michal Hocko wrote:
> On Thu 27-07-23 15:36:27, Chuyi Zhou wrote:
> > This patchset tries to add a new bpf prog type and use it to select
> > a victim memcg when global OOM is invoked. The main motivation is
> > the need for customizable OOM victim selection functionality so that
> > we can protect more important apps from the OOM killer.
>
> This is rather modest for giving an idea of how the whole thing is
> supposed to work. I have looked through the patches very quickly, but
> there is no overall design described anywhere either.
>
> Please could you give us a high level design description and reasoning
> why certain decisions have been made? e.g. why is this limited to the
> global oom situation, why is the BPF program forced to operate on memcgs
> as entities etc...
>
> Also it would be very helpful to call out limitations of the BPF
> program, if there are any.

One thing I realized recently: we don't have to make a victim selection
during the OOM, we [almost always] can do it in advance.
The kernel OOM killer must guarantee forward progress under heavy memory
pressure, and that creates a lot of limitations on what can and what can't
be done in these circumstances.
But in practice most policies, except maybe those which aim to catch very fast
memory spikes, rely on things which are fairly static: the logical importance
of some workloads relative to others, "age", memory footprint, etc.
So I wonder if the right path is to create a kernel interface which allows
userspace to define an OOM victim (maybe several victims, also depending on
whether it's a global or a memcg OOM) and update it periodically.
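
To make the idea a bit more concrete, here is a minimal userspace sketch of
the "pick in advance, update periodically" approach. Everything in it is an
assumption made for illustration: the Workload fields, the ranking order
(least important first, then largest footprint, then youngest), and above
all the file that publish_victims() writes to, which stands in for a kernel
interface that does not exist today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Fairly static facts a policy daemon might track per workload."""
    cgroup: str        # cgroup path, e.g. "/batch/job-17" (illustrative)
    importance: int    # larger means more important; scale is policy-defined
    age_s: int         # seconds since the workload started
    rss_bytes: int     # current memory footprint


def rank_victims(workloads, count=1):
    """Return up to `count` cgroup paths, most killable first.

    Hypothetical policy: least important first; among equally important
    workloads prefer the larger footprint, then the younger one.
    """
    ordered = sorted(
        workloads,
        key=lambda w: (w.importance, -w.rss_bytes, w.age_s),
    )
    return [w.cgroup for w in ordered[:count]]


def publish_victims(path, victims):
    """Write one victim per line to `path`.

    `path` is a placeholder for the proposed kernel interface; no such
    file exists in mainline, so callers must supply their own target.
    """
    with open(path, "w") as f:
        f.write("".join(v + "\n" for v in victims))
```

A daemon would call rank_victims() and publish_victims() on a timer, or
whenever the set of workloads changes, so that a selection is already in
place by the time an OOM happens and nothing complex has to run in the OOM
path itself.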