On Thu, 13 Jun 2002, Jesse Pollard wrote:
> > It is my experience that a single process using large amounts of memory
> > causes the system to start swapping. Once it starts swapping, every
> > process that does anything (apart from indefinite wait) goes into "I'm
> > ready to do some processing, but my memory is swapped out" state, and the
> > whole system collapses.
>
> Not necessarily. The condition can also be caused by having a large, well
> behaved process working its' little heart out properly, and a small process
> that grows suddenly (or even slowly - it doesn't take much to push it over
> the limit). As the small process grows, the larger one is paged out. Once
> the swap space is filled just adding one more page could do it. And it doesn't
> matter what process allocates that page. Key: disable oversubscription of
> memory.
Can we at least agree that the current kernel behaviour is a positive
feedback loop - something is bad, therefore it's about to get worse. Some
of the suggestions I had would move this more towards a negative feedback
loop.
> It can't decide what causes the problem. There are too may possibilities.
I think the majority of times a system will be set up with enough swap
space to handle its normal operation. Otherwise, just give it some more
swap. However, one circumstance that throwing lots of swap around doesn't
fix is when a process has an insatiable need for memory. In this case,
either the process grows very quickly, or is just plain big. I think the
out-of-memory killer should target big or growing processes. If it doesn't
hit the correct process the first time, it will free up a lot more RAM
than it would otherwise, and it would be likely to get it right the second
time.
> > My suggestion would be to set a maximum core size for the xfs-daemon
> > process...
>
> Also put a maximum limit on the X server.
Although this wasn't the problem in this case (and therefore wouldn't have
a massive effect), it's a sensible precaution.
The xfs server is the important server here, because DOSing it DOSes a
whole network of workstations.
> The easiest fix is to disable oversubscription of memory, though that may
> cause some daemons to abort if they don't check for allocation failures
> (which I do consider a bug).
That does indeed sound a good idea. I guess one would then give the system
one big dollop of swap, to allow it to actually cater for processes that
allocate large amounts without using it.
Matthew
-- Bashir: And who told you that? O'Brien: You did. In the future. Bashir: Oh. Well, who am I to argue with me?- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Sat Jun 15 2002 - 22:00:29 EST