Re: [patch 4/7 -mm] oom: badness heuristic rewrite

From: Minchan Kim
Date: Tue Feb 16 2010 - 08:15:15 EST


On Mon, 2010-02-15 at 13:54 -0800, David Rientjes wrote:
> We're not enforcing a global, system-wide forkbomb policy in the oom
> killer, but we do need to identify tasks that fork a very large number of
> tasks to break ties with other tasks: in other words, it would not be
> helpful to kill an application that has been running for weeks because
> another application with the same or less memory usage has forked 1000
> children and has caused an oom condition. That unfairly penalizes the
> former application that is actually doing work.
>
> Again, I'd encourage you to look at this as only a slight penalization
> rather than a policy that strictly needs to be enforced. If it were
> strictly enforced, it would be a prerequisite for selection if such a task
> were to exist; in my implementation, it is part of the heuristic.

Okay. I can think of it as a slight penalization in this patch.
But in the current OOM logic, we try to kill a child instead of the
forkbomb itself. That was my concern.
Of course, it's not part of your patch [2/7], which is good.
It has been there for a long time. I hope we can solve it on this
occasion. Please look at my example below.

>
> > > That doesn't work with Rik's example of a webserver that forks a large
> > > number of threads to handle client connections. It is _always_ better to
> > > kill a child instead of making the entire webserver unresponsive.
> >
> > In such case, admin have to handle it by oom_forkbom_thres.
> > Isn't it your goal?
> >
>
> oom_forkbomb_thres has a default value, which is 1000, so it should be
> enabled by default.
>
> > My suggestion is how handle buggy forkbomb processes which make
> > system almost hang by user's mistake. :)
> >
>
> I don't think you've given a clear description (or, even better, a patch)
> of your suggestion.

I'll write down my suggestion again.
My concern is as follows.


1. Forkbomb task A creates 2000 children in a second.
2. The 2000 children have almost the same memory usage. I know other
factors affect oom_score, but here I assume all of the children have
almost the same badness score.
3. Your heuristic penalizes task A, so it would be detected as a forkbomb.
4. So the OOM killer selects task A as the bad task.
5. oom_kill_process kills the child with the highest badness, _NOT_ task A
itself. Unfortunately, that child doesn't use much more memory than its
siblings. It means sooner or later we would need the OOM killer again.
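
To put rough numbers on step 5, here is a toy user-space model (not kernel code). Everything in it is hypothetical: the struct, the function name, and the assumptions that task A keeps forking identical children every tick and that each OOM invocation frees exactly one child's memory.

```c
#include <assert.h>

/* Hypothetical toy model of the scenario above; not a kernel structure. */
struct toy_system {
	long limit_pages;	/* pages usable before OOM triggers */
	long used_pages;	/* pages currently used by the children */
	long child_pages;	/* pages used by each (near-identical) child */
	long forks_per_tick;	/* children task A creates per tick */
};

/*
 * Run the model for `ticks` ticks. Task A is never killed, so it forks
 * on every tick. Each OOM invocation kills exactly one child.
 * Returns how many times the OOM killer had to run.
 */
long count_oom_invocations(struct toy_system s, int ticks)
{
	long invocations = 0;

	/* A child that frees nothing would make the loop below endless. */
	assert(s.child_pages > 0);

	for (int t = 0; t < ticks; t++) {
		s.used_pages += s.forks_per_tick * s.child_pages;
		while (s.used_pages > s.limit_pages) {
			s.used_pages -= s.child_pages;	/* kill one child */
			invocations++;
		}
	}
	return invocations;
}
```

With a 10000-page limit, 10-page children and 2000 forks per tick, the first tick alone needs 1000 separate OOM invocations in this model, and every later tick needs 2000 more because task A is still alive.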


My point was 5.

1. oom_kill_process has to take a long time to scan the tasklist in order
to select just one high-badness task. Okay. That's right, since an OOM
system hang is very bad and it would be better to kill just the first
task (i.e., a random one) in the tasklist.

2. But in the above scenario, the siblings use almost the same amount of
memory. So we would need the OOM killer again sooner or later, and the
OOM logic could go through the above scenario repeatedly.

Yes. Our system is already unresponsive since the time slice is spread
across many child tasks. In that case, would it really be better to kill
a dumb child instead of the BUGGY forkbomb task A? How long do we have to
wait for the system to become responsive?

I said _BUGGY_ forkbomb task. That's because Rik's example isn't a buggy
task. The administrator already knows apache can create many tasks in a
second, so he can handle it with your oom_forkbomb_thres knob. That's the
goal of your knob.

So my suggestion is as follows.

I assume normal forkbomb tasks are handled well by an admin who uses your
oom_forkbomb_thres. The remaining problem is just the BUGGY forkbomb
process. So if your heuristic selects the same victim task as a forkbomb
for the 5th consecutive time within 10 seconds, let's kill the forkbomb
instead of a child.

	tsk = select_victim_task(&cause);
	if (tsk == last_victim_tsk && cause == BUGGY_FORKBOMB &&
	    time_since_first_detect_forkbomb <= 10*HZ) {
		if (++count == 5) {
			kill(tsk);	/* kill the forkbomb itself */
			return;
		}
	} else {
		/* new victim or window expired: start counting again */
		last_victim_tsk = tsk; count = 1; time_since... = 0;
	}
	kill(tsk's child);

It's just an example to illustrate my concern. It might not be a good
solution. What I mean is just whether we have to care about this.
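
For anyone who wants to play with the idea outside the kernel, the "5th time within 10 seconds" rule can be sketched as a small user-space state machine. This is only a sketch under my own assumptions: every name here (forkbomb_tracker, decide_oom_action, the enums and the two constants) is hypothetical, and the time unit is an abstract tick standing in for 10*HZ.

```c
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; none of them exist in the kernel. */
enum oom_cause { OOM_CAUSE_OTHER, OOM_CAUSE_BUGGY_FORKBOMB };
enum oom_action { OOM_KILL_CHILD, OOM_KILL_FORKBOMB };

#define FORKBOMB_REPEAT_LIMIT	5u	/* consecutive selections */
#define FORKBOMB_WINDOW		10ul	/* ticks; stands in for 10*HZ */

struct forkbomb_tracker {
	const void *last_victim;	/* victim of the current streak */
	unsigned int count;		/* selections in the current streak */
	unsigned long first_seen;	/* tick when the streak started */
};

/*
 * Decide what to kill for the victim selected at tick `now`.
 * Returns OOM_KILL_FORKBOMB only when the same forkbomb victim has been
 * selected FORKBOMB_REPEAT_LIMIT times in a row within FORKBOMB_WINDOW
 * ticks of the first selection; otherwise the usual child kill.
 */
enum oom_action decide_oom_action(struct forkbomb_tracker *t,
				  const void *victim,
				  enum oom_cause cause,
				  unsigned long now)
{
	bool in_streak = t->count > 0 && victim == t->last_victim &&
			 now - t->first_seen <= FORKBOMB_WINDOW;

	if (cause != OOM_CAUSE_BUGGY_FORKBOMB) {
		/* Not a forkbomb selection: forget any streak. */
		t->last_victim = NULL;
		t->count = 0;
		return OOM_KILL_CHILD;
	}
	if (!in_streak) {
		/* New victim or expired window: start a new streak. */
		t->last_victim = victim;
		t->count = 1;
		t->first_seen = now;
		return OOM_KILL_CHILD;
	}
	if (++t->count >= FORKBOMB_REPEAT_LIMIT) {
		/* Same forkbomb selected often enough: kill the parent. */
		t->last_victim = NULL;
		t->count = 0;
		return OOM_KILL_FORKBOMB;
	}
	return OOM_KILL_CHILD;
}
```

The caller would keep one zero-initialized tracker and pass in whatever the selection step returned. Whether the thresholds should be fixed or tunable is left open here.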



--
Kind regards,
Minchan Kim

