[RFC] Per file OOM-badness / RSS once more

From: Christian König
Date: Fri Jun 24 2022 - 04:04:56 EST


Hello everyone,

To summarize the issue I'm trying to address here: processes can allocate
resources through a file descriptor without being held accountable for them.

I won't explain all the details again; see here for a more detailed
description of the problem: https://lwn.net/ml/linux-kernel/20220531100007.174649-1-christian.koenig@xxxxxxx/

With this iteration I'm trying to address a number of the comments Michal Hocko
gave (thanks a lot for those), as well as presenting some new ideas.

Changes made so far:
1. Renamed the callback to file_rss(). This is at least a start toward better
describing what this is all about. I've been going back and forth over the
naming here; if you have a better idea, please speak up.

2. Cleanups, e.g. now providing a helper function in the fs layer to sum up
all the pages allocated by the files in a file descriptor table.

3. Using the actual number of allocated pages in the shmem implementation
instead of just the file size. I also tried to skip shmem files that are part
of tmpfs, because tmpfs has its own separate accounting/limiting approach.

4. The OOM killer now prints the memory usage of the killed process, including
the per-file pages, which makes the whole thing much more comprehensible.

5. I've added the per-file pages to the various RSS reports in procfs.
This has the interesting effect that tools like top suddenly give a much
more accurate overview of memory use as well. Of course, this also increases
the overhead of gathering that information quite a bit, and I'm not sure how
feasible that is for upstreaming. On the other hand, this once more clearly
shows that we need to do something about this issue.

Another rather interesting observation is that multiple subsystems (shmem,
tmpfs, ttm) have come up with the same workaround: limiting the memory that can
be allocated through them to 50% of total system memory. Unfortunately, each
subsystem enforces its own separate 50%, and the limit doesn't apply
everywhere, so you can still easily crash the box.

Ideas and/or comments are really welcome.

Thanks,
Christian.