Re: Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure

From: ndrw
Date: Sat Aug 10 2019 - 08:34:15 EST


On 09/08/2019 11:50, Michal Hocko wrote:
> We try to protect a small amount of cache. Have a look at the
> get_scan_count function. But the exact amount of cache to protect is
> really hard to know without a crystal ball or an understanding of the
> workload. The kernel has neither.

Thank you. I'm familiarizing myself with the code. Is there anyone I could discuss some details with? I don't want to create too much noise here.

For example, are file pages created by mmapping files, and are anon pages allocated exclusively on the heap (RW data)? If so, where do "streaming IO" pages belong?
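
One way to watch how the kernel accounts pages while experimenting is to read the LRU counters in /proc/meminfo. The following is only a minimal sketch: the helper names parse_meminfo and lru_split are made up for illustration, and it assumes the Active(file), Inactive(file), Active(anon) and Inactive(anon) fields are present. It shows the totals only and does not say which workload put a page on which list.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def lru_split(fields):
    """Sum the active and inactive LRU lists for file-backed and anonymous pages."""
    file_kb = fields.get("Active(file)", 0) + fields.get("Inactive(file)", 0)
    anon_kb = fields.get("Active(anon)", 0) + fields.get("Inactive(anon)", 0)
    return {"file_kB": file_kb, "anon_kB": anon_kb}


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # Linux only
        print(lru_split(parse_meminfo(meminfo.read_text())))
```

Running it before and after a large sequential read or a large malloc should show which of the two totals moves.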

> We have been thinking about this problem for a long time and couldn't
> come up with anything much better than what we have now. PSI is the
> most recent improvement in that area. If you have better ideas then
> patches are always welcome.

In general, I found there are very few user-accessible knobs for adjusting caching, especially in the pre-OOM phase. Swapping and dirty page caching, on the other hand, have many options or can even be disabled completely.
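
To make the comparison concrete, here is a small sketch that dumps a few of the existing vm sysctls. The selection in VM_KNOBS is my own and not exhaustive, and read_vm_knobs is a hypothetical helper name. Note that vfs_cache_pressure affects the dentry and inode caches rather than the page cache, and none of the listed knobs protects file pages from eviction.

```python
from pathlib import Path

# Illustrative selection of sysctls under /proc/sys/vm (not exhaustive).
VM_KNOBS = (
    "swappiness",
    "vfs_cache_pressure",
    "dirty_ratio",
    "dirty_background_ratio",
    "min_free_kbytes",
)


def read_vm_knobs(names=VM_KNOBS, base="/proc/sys/vm"):
    """Return {knob: value string}, or None for knobs that cannot be read."""
    values = {}
    for name in names:
        try:
            values[name] = Path(base, name).read_text().strip()
        except OSError:
            values[name] = None  # missing on this kernel or not readable
    return values


if __name__ == "__main__":
    for knob, value in read_vm_knobs().items():
        print(f"vm.{knob} = {value}")
```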

For example, I would like to try disabling or limiting eviction of some or all file pages (e.g. exec pages), akin to disabling swapping, but there is no such mechanism. Yes, there would likely be problems with large RO mmapped files that would need to be addressed, but in many applications users would be interested in having such options.

Adjusting how aggressive/conservative the system should be with the OOM killer also falls into this category.
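
Until such knobs exist, the PSI interface mentioned above is one signal userspace can poll in the pre-OOM phase. A minimal sketch, assuming a kernel with PSI enabled (4.20 or later, /proc/pressure/memory present); the 10% threshold and the names parse_psi and under_pressure are arbitrary choices for illustration, not recommendations.

```python
from pathlib import Path


def parse_psi(text):
    """Parse /proc/pressure/* text into {"some": {...}, "full": {...}} with float values."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, pairs = parts[0], parts[1:]
        # Each pair looks like "avg10=0.00" or "total=12345".
        result[kind] = {key: float(val) for key, val in (p.split("=", 1) for p in pairs)}
    return result


def under_pressure(psi, threshold=10.0):
    """True if the 10-second 'full' stall average reaches the (hypothetical) threshold."""
    return psi.get("full", {}).get("avg10", 0.0) >= threshold


if __name__ == "__main__":
    psi_file = Path("/proc/pressure/memory")
    if psi_file.exists():  # requires a kernel with PSI enabled
        psi = parse_psi(psi_file.read_text())
        print(psi, "under pressure:", under_pressure(psi))
```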

[OOM killer accuracy]
> That is a completely orthogonal problem, I am afraid. So far we have
> been discussing _when_ to trigger the OOM killer. This is _who_ to
> kill. I haven't heard of any recent examples where the victim
> selection was way off and killed something obviously incorrect.

You are right. I had assumed earlyoom is more accurate because the OOM killer performs better on a system that isn't stalled yet (perhaps it does). But actually, earlyoom doesn't trigger the OOM killer at all:

https://github.com/rfjakob/earlyoom#why-not-trigger-the-kernel-oom-killer

Apparently some applications (Chrome and Electron-based tools) set their oom_score_adj incorrectly. This matches my observations of the OOM killer's behavior:

https://bugs.chromium.org/p/chromium/issues/detail?id=333617
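
To check which processes have adjusted their score, the per-process files /proc/<pid>/oom_score and /proc/<pid>/oom_score_adj can be listed side by side. This is only a sketch: oom_scores and read_int are hypothetical helper names, and processes that exit during the scan are silently skipped.

```python
from pathlib import Path


def read_int(path):
    """Read an integer from a file, or return None if it is gone or malformed."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def oom_scores(proc="/proc"):
    """Return (oom_score, oom_score_adj, pid, comm) tuples, highest score first."""
    rows = []
    for entry in Path(proc).iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        score = read_int(entry / "oom_score")
        adj = read_int(entry / "oom_score_adj")
        if score is None or adj is None:
            continue  # process exited or files not readable
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            comm = "?"
        rows.append((score, adj, int(entry.name), comm))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    if Path("/proc").is_dir():  # Linux only
        for score, adj, pid, comm in oom_scores()[:10]:
            print(f"{score:6d} {adj:6d} {pid:7d} {comm}")
```

Processes with a non-zero second column have changed their own oom_score_adj (or had it changed for them).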

> Something that other people can play with to reproduce the issue would
> be more than welcome.

This is the script I used. It reliably reproduces the issue: https://github.com/ndrw6/import_postcodes/blob/master/import_postcodes.py but it has quite a few dependencies, needs some input data and, in general, does a lot more than just fill up the memory. I will try to come up with something simpler.

Best regards,

ndrw