Re: Misleading OOM messages

From: Christoph Lameter
Date: Fri May 15 2009 - 14:30:00 EST


On Thu, 14 May 2009, Pavel Machek wrote:

> > "No available memory" still suggests that plugging in more memory is the
> > right solution.
>
> And... on correctly working kernel, it is, right?

Nope. Usually something else is amiss if OOM occurs.

> If you have no swap space and too many applications, you plug more
> memory. (Or invent some swap).

That's not a usual configuration. OOM there also depends on various OS
knobs. The failure occurred because the application did anonymous
allocations and you did not give the OS a way to effectively push these
pages out to disk. Thus it was not able to reclaim memory.

> If you misconfigured cgroups, you give more memory to them.

If you do not have enough memory in a cgroup then your application should
slow down (because of page evictions) but the system should not OOM.
Are cgroups broken or why are you getting OOMs when using them?

> If your applications mlocked 900MB and you have 1GB, you need to plug
> more memory.

IMHO the mlocking is the issue. There are safeguards (ulimit) to prevent
this. Again a typical misconfiguration that requires disabling safeguards.
If you increase memory then more memory is likely going to be mlocked by
whoever went crazy with mlocking in the first place.
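
As a rough illustration of the ulimit safeguard mentioned above, the
following sketch reads the locked-memory limit of the current process and
checks whether a 900MB lock request would fit under it. The helper names
(memlock_limit_bytes, fits_under_limit) are made up for this example, and
it assumes a Unix system where Python's resource module exposes
RLIMIT_MEMLOCK. Privileged processes may not be bound by the limit, so
this only models the unprivileged case.

```python
import resource


def memlock_limit_bytes():
    """Return (soft, hard) RLIMIT_MEMLOCK in bytes; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def to_optional(value):
        return None if value == resource.RLIM_INFINITY else value

    return to_optional(soft), to_optional(hard)


def fits_under_limit(request_bytes, soft_limit):
    """True if an unprivileged lock request of this size is within the limit."""
    return soft_limit is None or request_bytes <= soft_limit


if __name__ == "__main__":
    soft, hard = memlock_limit_bytes()
    print("soft limit:", soft, "hard limit:", hard)
    # 900MB is the figure from the quoted example.
    print("900MB lockable:", fits_under_limit(900 * 1024 * 1024, soft))
```

If the last line prints True on a 1GB machine, the safeguard has most
likely been raised or disabled by configuration.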

> So... when is plugging more memory _not_ valid answer? AFAICT it is
> when it is some kernel problem, resulting in memory not being
> reclaimed fast enough....

Reclaim failures typically occur because memory is not reclaimable due to
mlocking, because memory is allocated in a context where we cannot
perform effective reclaim (no disk access, atomic context; device drivers
are prone to that), or because higher order pages are requested and the
defrag logic cannot satisfy the request.
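
To make the higher order case concrete: an order-n request needs 2^n
physically contiguous pages, and /proc/buddyinfo lists how many free
blocks of each order every zone currently holds. The sketch below parses
that file and counts the free blocks large enough for a given order. The
function names are made up for this example, and the check ignores
watermarks and other allocator details, so treat it as a rough indicator
only.

```python
def parse_buddyinfo(text):
    """Map (node, zone) to a list of free block counts indexed by order."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: Node 0, zone Normal 216 55 189 ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(n) for n in parts[4:]]
    return zones


def free_blocks_at_or_above(counts, order):
    """Free blocks that could satisfy an order-`order` request directly.

    A larger free block can be split, so every order >= `order` counts.
    """
    return sum(counts[order:])


if __name__ == "__main__":
    try:
        with open("/proc/buddyinfo") as f:
            zones = parse_buddyinfo(f.read())
    except OSError:
        zones = {}  # not Linux, or the file is not readable
    for (node, zone), counts in zones.items():
        print(f"node {node} zone {zone}: "
              f"{free_blocks_at_or_above(counts, 3)} free blocks of order >= 3")
```

A zone can have plenty of free order-0 pages and still report zero here,
which is the fragmentation case described above.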

Then there is the issue on 32-bit platforms where certain kernel
allocations must occur in the memory zone under 1G. If you add more memory
then less memory is available under 1G because the kernel needs to
allocate more metadata to manage more memory. Thus you OOM faster.
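
A back-of-the-envelope version of that arithmetic, as a sketch only: the
page size (4096 bytes) and the per-page metadata size (32 bytes) are
assumptions picked for illustration and vary with architecture and kernel
configuration. The model also ignores every other consumer of the low
zone, so real systems run out sooner than this suggests.

```python
PAGE_SIZE = 4096          # assumed page size in bytes
STRUCT_PAGE_BYTES = 32    # assumed per-page metadata size, config dependent
LOW_ZONE_BYTES = 1 << 30  # the "under 1G" zone from the text


def page_metadata_bytes(ram_bytes):
    """Bytes of per-page metadata needed to track ram_bytes of memory."""
    return (ram_bytes // PAGE_SIZE) * STRUCT_PAGE_BYTES


def low_zone_left(ram_bytes):
    """Low zone bytes remaining after per-page metadata is placed there."""
    return LOW_ZONE_BYTES - page_metadata_bytes(ram_bytes)


for gib in (1, 4, 16, 64):
    ram = gib << 30
    print(f"{gib:3d} GiB RAM: {page_metadata_bytes(ram) >> 20:4d} MiB metadata, "
          f"{low_zone_left(ram) >> 20:4d} MiB of the low zone left")
```

Under these assumptions the metadata grows linearly with installed memory
while the low zone stays fixed, so each added gigabyte shrinks what is
left for other kernel allocations.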

So I think that an OOM is about misconfigurations or a kernel bug. If the
application needs more memory then the paging mechanism of the OS should
create more virtual memory for the process.
