Re: Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure
From: ndrw
Date: Sat Aug 10 2019 - 08:34:15 EST
On 09/08/2019 11:50, Michal Hocko wrote:
> We try to protect low amount of cache. Have a look at get_scan_count
> function. But the exact amount of the cache to be protected is really
> hard to know without a crystal ball or understanding of the workload.
> The kernel has neither of the two.

Thank you. I'm familiarizing myself with the code. Is there anyone I
could discuss some details with? I don't want to create too much noise here.

For example, are file pages created by mmapping files, and are anon pages
exclusively allocated on the heap (RW data)? If so, where do "streaming IO"
pages belong?

> We have been thinking about this problem for a long time and couldn't
> come up with anything much better than we have now. PSI is the most recent
> improvement in that area. If you have better ideas then patches are
> always welcome.

In general, I found there are very few user-accessible knobs for
adjusting caching, especially in the pre-OOM phase. On the other hand,
swapping and dirty page caching have many options or can even be disabled
completely.
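
To make the comparison concrete, here is a minimal sketch that dumps a few
of the existing vm sysctls for swapping and dirty page caching. The
function name read_vm_knobs and the choice of knobs are illustrative
assumptions, not an exhaustive list; unreadable entries come back as None.

```python
from pathlib import Path

# Standard vm sysctls; the selection is illustrative, not exhaustive.
VM_KNOBS = (
    "swappiness",
    "dirty_ratio",
    "dirty_background_ratio",
    "vfs_cache_pressure",
)


def read_vm_knobs(base="/proc/sys/vm", names=VM_KNOBS):
    """Return {name: int or None}; None if the file is missing or unreadable."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_vm_knobs().items():
        print(f"vm.{name} = {value}")
```

Nothing in this list protects file pages from eviction directly, which is
the gap described above.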
For example, I would like to try disabling/limiting eviction of some/all
file pages (for example exec pages) akin to disabling swapping, but
there is no such mechanism. Yes, there would likely be problems with
large RO mmapped files that would need to be addressed, but in many
applications users would be interested in having such options.
Adjusting how aggressive/conservative the system should be with the OOM
killer also falls into this category.
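
As a rough illustration of what a userspace "aggressiveness" setting could
look like, the sketch below parses the PSI memory file mentioned earlier
and compares the "some" avg10 value with a threshold. The threshold
policy and the names parse_psi, under_pressure and read_memory_psi are
assumptions made for this example; only the file format comes from the
kernel.

```python
def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        entry = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            # "total" is a counter in microseconds, the rest are percentages.
            entry[key] = int(value) if key == "total" else float(value)
        result[kind] = entry
    return result


def under_pressure(psi, threshold_pct=10.0):
    """Hypothetical policy: act when 'some' avg10 exceeds threshold_pct."""
    return psi.get("some", {}).get("avg10", 0.0) > threshold_pct


def read_memory_psi(path="/proc/pressure/memory"):
    """Return parsed PSI data, or None if PSI is not available."""
    try:
        with open(path) as f:
            return parse_psi(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    psi = read_memory_psi()
    print("PSI unavailable" if psi is None else under_pressure(psi))
```

A lower threshold_pct would make such a monitor more aggressive, a higher
one more conservative.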
[OOM killer accuracy]
> That is a completely orthogonal problem, I am afraid. So far we have
> been discussing _when_ to trigger OOM killer. This is _who_ to kill. I
> haven't heard any recent examples that the victim selection would be way
> off and killing something obviously incorrect.

You are right. I had assumed earlyoom was more accurate because the OOM
killer performs better on a system that isn't stalled yet (perhaps it
does). But earlyoom doesn't actually trigger the kernel OOM killer at all:
https://github.com/rfjakob/earlyoom#why-not-trigger-the-kernel-oom-killer
Apparently some applications (Chrome and Electron-based tools) set their
oom_score_adj incorrectly, which matches my observations of the OOM
killer's behavior:
https://bugs.chromium.org/p/chromium/issues/detail?id=333617
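
For anyone who wants to check this on their own system, a minimal sketch
that lists processes with a non-zero oom_score_adj follows. The helper
names (read_int, oom_info, adjusted_processes) are made up for this
example; the per-process files oom_score and oom_score_adj are the
standard procfs ones.

```python
import os


def read_int(path):
    """Read a single integer from a file; None if unreadable or malformed."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def oom_info(pid, proc="/proc"):
    """Return the OOM-related values for one process."""
    base = os.path.join(proc, str(pid))
    return {
        "pid": pid,
        "oom_score": read_int(os.path.join(base, "oom_score")),
        "oom_score_adj": read_int(os.path.join(base, "oom_score_adj")),
    }


def adjusted_processes(proc="/proc"):
    """List processes whose oom_score_adj is set to something other than 0."""
    try:
        entries = os.listdir(proc)
    except OSError:
        return []
    found = []
    for entry in entries:
        if entry.isdigit():
            info = oom_info(int(entry), proc)
            if info["oom_score_adj"]:  # skips 0 and unreadable (None)
                found.append(info)
    return found


if __name__ == "__main__":
    for info in sorted(adjusted_processes(), key=lambda i: i["pid"]):
        print(info)
```

oom_score_adj ranges from -1000 to 1000, and positive values make a
process a more likely victim.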

> Something that other people can play with to reproduce the issue would
> be more than welcome.

This is the script I used. It reliably reproduces the issue:
https://github.com/ndrw6/import_postcodes/blob/master/import_postcodes.py
but it has quite a few dependencies, needs some input data and, in
general, does a lot more than just fill up the memory. I will try to
come up with something simpler.
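
As a starting point, something along these lines only fills anonymous
memory. This is a hypothetical sketch, not the script linked above, and
it has not been shown to reproduce the stall; fill_memory and its
limit_mib safety cap are assumptions of this example.

```python
PAGE_SIZE = 4096  # assumed page size; enough for touching memory


def fill_memory(chunk_mib=64, limit_mib=256):
    """Allocate and touch memory in chunks until limit_mib is reached.

    limit_mib is a safety cap. Raise it above the amount of free RAM only
    on a machine you are prepared to see stall.
    """
    chunks = []
    allocated_mib = 0
    while allocated_mib + chunk_mib <= limit_mib:
        buf = bytearray(chunk_mib * 1024 * 1024)
        # Write to every page so the kernel has to back it with real memory.
        for offset in range(0, len(buf), PAGE_SIZE):
            buf[offset] = 1
        chunks.append(buf)  # keep a reference so nothing is freed
        allocated_mib += chunk_mib
    return chunks
```

Calling fill_memory(limit_mib=N) with N close to the installed RAM, while
a browser or another application with a large working set is running,
would be the intended way to use it.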
Best regards,
ndrw