Michal Hocko <mhocko@xxxxxxxxxx> writes:
> [SNIP]
>
> Maybe some more context would help the discussion?
>
> But files are not killable, they can be shared... In other words this
> doesn't help the oom killer to make an educated guess at all.
The struct file in patch 3 is the DRM fd. That's effectively "my
process's interface to talking to the GPU" not "a single GPU resource".
Once that file is closed, all of the process's private, idle GPU buffers
will be immediately freed (this will be most of their allocations), and
some will be freed once the GPU completes some work (this will be most
of the rest of their allocations).
Some GEM BOs won't be freed just by closing the fd, if they've been
shared between processes. Those are usually about 8-24MB total in a
process, rather than the GBs that modern apps use (or that our testcases
like to allocate and thus trigger oomkilling of the test harness instead
of the offending testcase...)
Even if we just had the private+idle buffers being accounted in OOM
badness, that would be a huge step forward in system reliability.
> > So question at every one: What do you think about this approach?
>
> I think it is just wrong semantically. Non-reclaimable memory is a
> pain, especially when there is way too much of it. If you can free that
> memory somehow then you can hook into the slab shrinker API and react to
> memory pressure. If you can account such memory to a particular
> process and make sure that the consumption is bound by the process
> lifetime then we can think of an accounting that oom_badness can
> consider when selecting a victim.

For graphics, we can't free most of our memory without also effectively
killing the process. i915 and vc4 have "purgeable" interfaces for
userspace (on i915 this is exposed all the way to GL applications and is
hooked into the shrinker, and on vc4 it is so far just used for
userspace-internal buffer caches to be purged when a CMA allocation
fails). However, those purgeable pools are expected to be a tiny
fraction of the GPU allocations by the process.