Re: Some questions about linux kernel.

From: Jesse Pollard (pollard@tomcat.admin.navo.hpc.mil)
Date: Tue Mar 21 2000 - 12:24:20 EST


David Whysong <dwhysong@physics.ucsb.edu>:
> On Mon, 20 Mar 2000, Jesse Pollard wrote:
> >On Mon, 20 Mar 2000, David Whysong wrote:
> >>
> >>When OOM the kernel has to kill something. The trick is for the kernel to
> >>kill tasks that are considered "less important". Don't kill init, X
> >>server, syslogd, whatever. Some of this policy could be set by a userspace
> >>daemon.
> >
> >By the time the userspace daemon runs it is too late.
>
> Not necessarrily. Here is the (rough) idea:
>
> 1. Modify Rik van Riel's patch to understand multiple "nice" levels.
> 2. Have a daemon that finds new processes and renices them,
> based on a config file. This puts policy in user-space.
>
> In theory, the daemon could be woken up by the kernel when new processes
> are started, and renice them before they actually run.

And what happens with SMP - the daemon may run on one CPU, the fork was
done on the other.

> The beauty of this is that the daemon is not required for OOM handling;
> Rik van Riel's kernel patch takes care of that. The daemon simply allows
> policy to be placed in user-space.

So the patch must pick the lowest priority proces to kill?

> >The kernel must do the bookkeeping for the resource allocation. It must
> >be able to decrement from the users quota what is being allocated by the
> >various allocation methods - fork, exec, sbrk, stack page fault, COW.
>
> Ugh, no thanks. Overhead in common functions...

The overhead of decrementing a counter.
The overhead of a test for greater than 0 (or >= 0 depending on how the
counter is defined).

> >The kernel can enforce the limit, but it cannot determine what the limit
> >should be. This is the place for the userspace daemon.
>
> Ideally you want policy set in userspace, but executed by the kernel (with
> a reasonable default case). That's what I'm trying to do, though not with
> limits.

Then there won't be any enfocement.

> >On some systems, the quotas are staticly set by administration, and loaded
> >into the kernel on login (or cron/batch). If the quotas to be set are
> >not available (determined by the userspace daemon) then the user should
> >not be logged in (or cron/batch job not started).
>
> (I may be misunderstanding you here...)

User level resources are allocated at login (at least for
memory/cpu/processes).

> Refusing user login is yet another failure mode that will cause great
> moaning and mayhem, and result in an out of work sysadmin. No thanks.

Much less moaning that having the system crash when the user does login.
Been there, done that.

> >It may be possible for the userspace daemon to be able to carry out
> >adjustments - If the policy is that from 8-5 all logins/cron/batch may
> >have the quota set by administration until N logins are active. No more
> [...]
>
> I'm still thinking from a quota-free point of view.
>
> >>This is completely independent of overcommit or rlimits (which can cause
> >>the OOM behavior to happen more often, though in more predictable and
> >>perhaps less drastic fashion).
> >
> >It is a way to prevent OOM from occuring at the system level.
>
> Maybe, but I'm not convinced. You still have to deal with kernel
> allocations. And these memory quotas come at huge expense, if they are to
> be useful at preventing system-wide OOM.

What huge expence? We are talking about decrementing a quota counter.
Disk space is cheap. Quotas can even be used in an overcommitted form.

> >>I'm a scientific app programmer, not a system programmer... but I'll try
> >>to code up a simple daemon in the next couple days that works in
> >>conjunction with a modified version of Rik van Riel's patch.
> >
> >Watch out for the kernel level accounting that is needed.
>
> What is needed is sane handling of system wide OOM situations. Killing the
> offending process when the system runs OOM is a lot more sane than
> crippling everybody's access to memory resources. IMNSHO.

And the problem still causes system failures. The system is still blamed
for being buggy. There is no need for either. Using a daemon to try to
prevent OOM is very klugy, and won't work reliably.

And the problem still remains of properly identifying who, when and where
the process is that causes the OOM. The system doesn't have the resources
at the time it fails. Being able to say "kill x" is a way to get something
done, but if x is the important process - you fail. If you make exceptions -
never kill x, it is too important - then when x causes the failure, you are
down. The system may be down. Quotas are just a tool to help prevent it.
They have been proven to solve the problem in the past. They are still
used on UNIX systems in the present.

For very small organization (5-10 users) you can get away without quotas.

For larger organizations (11-50) you can't. It's expensive on a per
failure situation.

For large organizations (25-100), you better have a way to prevent OOM.
You don't have to use it if management directs to use overcommitting.

I need it for what I'm doing now, I believe others need it and don't
realize the amount stability that can be accomplished with it.
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@navo.hpc.mil

Any opinions expressed are solely my own.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Mar 23 2000 - 21:00:33 EST