On 2018-01-19 09:39 AM, Christian König wrote:
> On 19.01.2018 at 09:20, Michal Hocko wrote:
>> On Thu 18-01-18 12:01:32, Eric Anholt wrote:
>>> Michal Hocko <mhocko@xxxxxxxxxx> writes:
>>>> On Thu 18-01-18 18:00:06, Michal Hocko wrote:
>>>>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
>>>>>> Hi, this series is a revised version of an RFC sent by Christian König
>>>>>> a few years ago. The original RFC can be found at
>>>>>> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>>>>>>
>>>>>> This is the same idea; I've just addressed his concerns from the
>>>>>> original RFC and switched to a callback in file_ops instead of a new
>>>>>> member in struct file.
>>>>> Please add the full description to the cover letter and do not make
>>>>> people hunt links.
>>>>>> Here is the original cover letter text:
>>>>>>
>>>>>> : I'm currently working on the issue that when device drivers allocate
>>>>>> : memory on behalf of an application, the OOM killer usually doesn't know
>>>>>> : about that unless the application also gets this memory mapped into its
>>>>>> : address space.
>>>>>> :
>>>>>> : This is especially annoying for graphics drivers where a lot of the VRAM
>>>>>> : usually isn't CPU accessible and so doesn't make sense to map into the
>>>>>> : address space of the process using it.
>>>>>> :
>>>>>> : The problem now is that when an application starts to use a lot of VRAM,
>>>>>> : those buffer objects sooner or later get swapped out to system memory,
>>>>>> : but when we now run into an out-of-memory situation the OOM killer
>>>>>> : obviously doesn't know anything about that memory and so usually kills
>>>>>> : the wrong process.
>>>>> OK, but how do you attribute that memory to a particular OOM killable
>>>>> entity? And how do you actually enforce that those resources get freed
>>>>> on the oom killer action?
>>>>>> : The following set of patches tries to address this problem by
>>>>>> : introducing a per-file OOM badness score, which device drivers can use
>>>>>> : to give the OOM killer a hint how many resources are bound to a file
>>>>>> : descriptor, so that it can make better decisions about which process
>>>>>> : to kill.
>>>>> But files are not killable, they can be shared... In other words this
>>>>> doesn't help the oom killer to make an educated guess at all.
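For readers without the patches at hand, the mechanism being debated has
roughly the following shape. This is only a sketch of the idea: the callback
name and the fd walk below are illustrative assumptions, not code taken from
the series (kernel-side fragment; assumes linux/fs.h, linux/fdtable.h,
linux/sched.h).

/*
 * Sketch only.  The series adds a callback to struct file_operations,
 * something along the lines of:
 *
 *     long (*oom_file_badness)(struct file *file);
 *
 * where a DRM driver reports the number of pages it holds on behalf of
 * that file (e.g. BOs in VRAM/GTT that never show up in the process RSS).
 * The OOM killer side could then sum that hint over a task's open files:
 */
static int file_badness_cb(const void *p, struct file *file, unsigned fd)
{
	long *points = (long *)p;

	if (file->f_op->oom_file_badness)
		*points += file->f_op->oom_file_badness(file);
	return 0;
}

static long badness_from_files(struct task_struct *task)
{
	long points = 0;

	if (task->files)
		iterate_fd(task->files, 0, file_badness_cb, &points);
	return points;
}

The objection above is that the struct file this hangs off can be shared
between processes, so the resulting score is not obviously attributable to a
single killable task.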
>>>> Maybe some more context would help the discussion?
>>> The struct file in patch 3 is the DRM fd. That's effectively "my
>>> process's interface to talking to the GPU", not "a single GPU resource".
>>> Once that file is closed, all of the process's private, idle GPU buffers
>>> will be immediately freed (this will be most of their allocations), and
>>> some will be freed once the GPU completes some work (this will be most
>>> of the rest of their allocations).
>>>
>>> Some GEM BOs won't be freed just by closing the fd, if they've been
>>> shared between processes. Those are usually about 8-24 MB total in a
>>> process, rather than the GBs that modern apps use (or that our testcases
>>> like to allocate, and thus trigger OOM killing of the test harness instead
>>> of the offending testcase...).
>>>
>>> Even if we just had the private+idle buffers being accounted in OOM
>>> badness, that would be a huge step forward in system reliability.
>> OK, in that case I would propose a different approach. We already
>> have rss_stat, so why don't we simply add a new counter there,
>> MM_KERNELPAGES, and consider it in oom_badness? The rule would be
>> that such memory is bound to the process lifetime. I guess we will
>> find more users for this later.
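Spelled out, the rss_stat suggestion would look roughly like the sketch
below. MM_KERNELPAGES is only proposed here and does not exist upstream, and
the helper names are illustrative, not from any posted patch.

/* include/linux/mm_types_task.h (sketch): add the new counter */
enum {
	MM_FILEPAGES,	/* resident file mapping pages */
	MM_ANONPAGES,	/* resident anonymous pages */
	MM_SWAPENTS,	/* anonymous swap entries */
	MM_SHMEMPAGES,	/* resident shared memory pages */
	MM_KERNELPAGES,	/* proposed: driver pages tied to this mm's lifetime */
	NR_MM_COUNTERS
};

/* Driver side (sketch): charge pages to the allocating process's mm, and
 * subtract the same amount again when the buffers are destroyed. */
static void account_bo_pages(struct mm_struct *mm, long nr_pages)
{
	add_mm_counter(mm, MM_KERNELPAGES, nr_pages);
}

/* mm/oom_kill.c (sketch): fold the counter into the oom_badness() score. */
static unsigned long proposed_badness(struct mm_struct *mm)
{
	return get_mm_rss(mm) +
	       get_mm_counter(mm, MM_SWAPENTS) +
	       get_mm_counter(mm, MM_KERNELPAGES);	/* + page tables, as today */
}

The attraction is that the accounted memory is tied to a single mm, and
therefore to exactly one killable task.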
> I already tried that, and the problem with that approach is that some
> buffers are not created by the application which actually uses them.
> For example, X/Wayland creates render buffers and hands them out to
> applications which want to use OpenGL.
>
> So the result is that when you always account the application which
> created the buffer, the OOM killer will certainly reap X/Wayland first.
> And that is exactly what we want to avoid here.

FWIW, what you describe is true with DRI2, but not with DRI3 or Wayland
anymore. With DRI3 and Wayland, buffers are allocated by the clients and
then shared with the X / Wayland server.

Also, in all cases, the amount of memory allocated for buffers shared
between DRI/Wayland clients and the server should be relatively small
compared to the amount of memory allocated for buffers used only locally
in the client, particularly for clients which create significant memory
pressure.