Re: Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure
From: Michal Hocko
Date: Mon Aug 12 2019 - 04:24:17 EST
On Sat 10-08-19 13:34:06, ndrw wrote:
> On 09/08/2019 11:50, Michal Hocko wrote:
> > We try to protect a small amount of cache. Have a look at the get_scan_count
> > function. But the exact amount of cache to protect is really
> > hard to know without a crystal ball or an understanding of the workload.
> > The kernel has neither of the two.
>
> Thank you. I'm familiarizing myself with the code. Is there anyone I could
> discuss some details with? I don't want to create too much noise here.
The linux-mm mailing list sounds like a good fit.
> For example, are file pages created by mmapping files, and are anon pages
> exclusively allocated on the heap (RW data)? If so, where do "streaming IO"
> pages belong?
Page cache is generated by both buffered IO (read/write) and file
mmaps. Anonymous memory comes from MAP_ANON mappings and from the
written-to (copy-on-write) pages of MAP_PRIVATE file-backed mappings.
Streaming IO generally refers to single-pass IO whose data is not
reused later (e.g. a backup).
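To make the distinction concrete, here is a minimal userspace sketch in
Python (not kernel code). The names touch_memory and stream_once are made
up for illustration, and the posix_fadvise hints in stream_once are only an
assumption about how an application might mark single-pass IO; nothing in
this thread prescribes them, and the kernel is free to ignore them.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def touch_memory(path):
    """Create file pages and anonymous pages; return (private view, bytes on disk)."""
    # Buffered read: the data is served through the page cache (file pages).
    with open(path, "rb") as f:
        first = f.read(PAGE)

    with open(path, "r+b") as f:
        # Shared file mapping: backed by the same page cache pages.
        shared = mmap.mmap(f.fileno(), 0, flags=mmap.MAP_SHARED,
                           prot=mmap.PROT_READ)
        assert shared[:PAGE] == first
        shared.close()

        # Private file mapping: reads come from the page cache, but the
        # first write to a page copies it, and the copy is anonymous memory.
        private = mmap.mmap(f.fileno(), 0, flags=mmap.MAP_PRIVATE,
                            prot=mmap.PROT_READ | mmap.PROT_WRITE)
        private[0:4] = b"COW!"
        private_view = private[:4]
        private.close()

    # MAP_ANON mapping: anonymous memory with no file behind it.
    anon = mmap.mmap(-1, PAGE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    anon[0:4] = b"anon"
    anon.close()

    # The write to the private mapping never reached the file.
    with open(path, "rb") as f:
        on_disk = f.read(4)
    return private_view, on_disk


def stream_once(path, chunk=1 << 20):
    """Read a file in a single pass, as a backup would; return bytes read."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Assumption: advisory hints only, skipped where unavailable.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            buf = os.read(fd, chunk)
            if not buf:
                break
            total += len(buf)
        # Tell the kernel the cached pages will not be needed again.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

Calling touch_memory on a scratch file returns the modified bytes seen
through the private mapping next to the unchanged bytes read back from the
file, which is the copy-on-write behaviour described above.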
> > We have been thinking about this problem for a long time and couldn't
> > come up with anything much better than we have now. PSI is the most recent
> > improvement in that area. If you have better ideas then patches are
> > always welcome.
>
> In general, I found there are very few user accessible knobs for adjusting
> caching, especially in the pre-OOM phase. On the other hand, swapping and dirty
> page caching have many options or can even be disabled completely.
>
> For example, I would like to try disabling/limiting eviction of some/all
> file pages (for example exec pages) akin to disabling swapping, but there is
> no such mechanism. Yes, there would likely be problems with large RO mmapped
> files that would need to be addressed, but in many applications users would
> be interested in having such options.
>
> Adjusting how aggressive/conservative the system should be with the OOM
> killer also falls into this category.
What would that mean and how would it be configured?
--
Michal Hocko
SUSE Labs