Re: [v8 0/4] cgroup-aware OOM killer

From: Shakeel Butt
Date: Mon Oct 02 2017 - 15:00:52 EST

> Yes and nobody is disputing that, really. I guess the main disconnect
> here is that different people want to have more detailed control over
> the victim selection while the patchset tries to handle the most
> simplistic scenario when no userspace control over the selection is
> required. And I would claim that this will be a vast majority of setups
> and we should address it first.

IMHO the disconnect/disagreement is about which memcgs should be
compared with each other for oom victim selection. Let's forget about
oom priority and just take size into account. Should the oom selection
algorithm compare the leaves of the hierarchy, or should it compare
siblings? For the single user system, comparing leaves makes sense
while in a multi user system, siblings should be compared for victim
selection.

Coming back to the same example:

      root
      /  \
     A    D
    / \
   B   C

Let's view it as a multi user system and some central job scheduler
has asked a node controller on this system to start two jobs 'A' &
'D'. 'A' then went on to create sub-containers. Now, on system oom,
IMO the simplest sensible thing to do from a semantic point of view
is to compare 'A' and 'D'. If 'A''s usage is higher, then either kill
all of 'A' if oom_group is set, or recursively find a victim memcg
taking 'A' as root.

I have noted before that for single user systems, comparing 'B', 'C' &
'D' is the most sensible thing to do.
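
To make the two comparisons concrete, here is a rough userspace sketch
in Python (not kernel code). It assumes the hierarchy is root -> {A, D}
with A -> {B, C}, as described above. The Memcg class, the helper names
and the usage numbers are all made up for illustration.

```python
# Toy model of the hierarchy; usage numbers are invented for the example.
class Memcg:
    def __init__(self, name, usage=0, children=()):
        self.name = name
        self.own_usage = usage
        self.children = list(children)

    def usage(self):
        # Hierarchical usage: own charge plus everything below.
        return self.own_usage + sum(c.usage() for c in self.children)


def leaves(cg):
    # Collect all leaf memcgs under cg (cg itself if it has no children).
    if not cg.children:
        return [cg]
    out = []
    for child in cg.children:
        out.extend(leaves(child))
    return out


def pick_among_leaves(root):
    # Single user view: compare 'B', 'C' and 'D'.
    return max(leaves(root), key=lambda cg: cg.usage())


def pick_among_siblings(root):
    # Multi user view: compare 'A' and 'D'.
    return max(root.children, key=lambda cg: cg.usage())


B, C, D = Memcg("B", 30), Memcg("C", 30), Memcg("D", 40)
A = Memcg("A", children=[B, C])
root = Memcg("root", children=[A, D])

print(pick_among_leaves(root).name)    # D (40 beats 30 and 30)
print(pick_among_siblings(root).name)  # A (30 + 30 = 60 beats 40)
```

With the same usage numbers the two policies pick different victims,
which is the disagreement in a nutshell.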

Now, in the multi user system, I can kind of force the comparison of
'A' & 'D' by setting oom_group on 'A'. IMO that is an abuse of
'oom_group', as it would then carry two meanings: comparison leader
and killall. I would humbly suggest having two separate notions
instead, say oom_gang (if you prefer, just 'oom_group' is fine too)
and killall.

For the single user system example, 'B', 'C' and 'D' will have
'oom_gang' set and if the user wants killall semantics too, he can set
it separately.

For the multi user system, 'A' and 'D' will have 'oom_gang' set. Now,
let's say 'A' was selected on system oom. If 'killall' was set on 'A',
then 'A' will be selected as the victim; otherwise the oom selection
algorithm will recursively take 'A' as root and try to find a victim
memcg.

Another major semantic of 'oom_gang' is that the leaves will always be
treated as 'oom_gang'.
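
The proposed semantics could be sketched roughly as below. This is
userspace Python for illustration only, not a kernel implementation.
'oom_gang' and 'killall' are the two notions suggested above; the Memcg
class, candidates(), select_victim(), the build() helper and the usage
numbers are hypothetical, and the hierarchy is assumed to be
root -> {A, D} with A -> {B, C}.

```python
class Memcg:
    def __init__(self, name, usage=0, children=(),
                 oom_gang=False, killall=False):
        self.name = name
        self.own_usage = usage
        self.children = list(children)
        self.oom_gang = oom_gang
        self.killall = killall

    def usage(self):
        # Hierarchical usage: own charge plus everything below.
        return self.own_usage + sum(c.usage() for c in self.children)


def candidates(root):
    # Comparison units below root: the nearest descendants that either
    # have oom_gang set or are leaves (leaves always count as oom_gang).
    out = []
    for child in root.children:
        if child.oom_gang or not child.children:
            out.append(child)
        else:
            out.extend(candidates(child))
    return out


def select_victim(root):
    cands = candidates(root)
    if not cands:
        return root
    chosen = max(cands, key=lambda cg: cg.usage())
    # killall (or a leaf) ends the search; otherwise recurse with the
    # chosen memcg as the new root.
    if chosen.killall or not chosen.children:
        return chosen
    return select_victim(chosen)


def build(multi_user, killall_on_a=False):
    # Multi user: gang on 'A' and 'D'. Single user: gang on 'B', 'C', 'D'.
    b = Memcg("B", 35, oom_gang=not multi_user)
    c = Memcg("C", 25, oom_gang=not multi_user)
    d = Memcg("D", 40, oom_gang=True)
    a = Memcg("A", children=[b, c], oom_gang=multi_user,
              killall=killall_on_a)
    return Memcg("root", children=[a, d])


print(select_victim(build(multi_user=True)).name)                     # B
print(select_victim(build(multi_user=True, killall_on_a=True)).name)  # A
print(select_victim(build(multi_user=False)).name)                    # D
```

In the multi user runs 'A' (60) wins against 'D' (40). Without killall
the search continues below 'A' and lands on 'B'; with killall 'A' as a
whole is the victim. In the single user run the comparison is between
'B', 'C' and 'D', so 'D' is picked.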