RE: [PATCH 3.2.0-rc1 0/3] Used Memory Meter pseudo-device and related changes in MM

From: David Rientjes
Date: Thu Jan 05 2012 - 18:10:16 EST


On Thu, 5 Jan 2012, leonid.moiseichuk@xxxxxxxxx wrote:

> I tried to sort out all inputs coming. But before doing the next step I
> prefer to have tests passed. Changes you proposed are straightforward and
> understandable.
> Hooking into mm/vmscan.c and mm/page-writeback.c is not so easy; I need
> to find the proper place and make an adequate proposal.
> Using memcg does not look like a good way to me right now because I
> wouldn't like to change memory accounting - memcg has strong reasons to
> keep caches.
>

If you can accept the overhead of the memory controller (increase in
kernel text size and amount of metadata for page_cgroup), then you can
already do this with a combination of memory thresholds with
cgroup.event_control and disabling of the oom killer entirely with
memory.oom_control. You can also get notified when the oom killer is
triggered by using eventfd(2) on memory.oom_control even though it's
disabled in the kernel. Then, the userspace task attached to that control
file can send signals to applications to free their memory or, in the
worst case, choose to kill an application but have all that policy be
implemented in userspace.
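
As a rough userspace sketch of the above (not a definitive
implementation), assuming the cgroup v1 memory controller interface: an
eventfd is registered by writing "<event_fd> <control_fd> [<args>]" to
cgroup.event_control, and the oom killer is disabled by writing "1" to
memory.oom_control. The helper names, the struct, and the cgroup
directory passed in are hypothetical and only for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical holder for the two descriptors of one registration. */
struct memcg_event {
    int event_fd;    /* becomes readable when the event fires */
    int control_fd;  /* e.g. memory.usage_in_bytes or memory.oom_control */
};

/* Build the line written to cgroup.event_control.
 * Returns its length, or -1 if it does not fit in buf. */
int format_event_registration(char *buf, size_t len, int event_fd,
                              int control_fd, const char *args)
{
    int n;

    assert(buf != NULL && len > 0);
    if (args != NULL && args[0] != '\0')
        n = snprintf(buf, len, "%d %d %s", event_fd, control_fd, args);
    else
        n = snprintf(buf, len, "%d %d", event_fd, control_fd);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Open "<dir>/<name>"; returns a descriptor or -1. */
static int open_in_dir(const char *dir, const char *name, int flags)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", dir, name);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return open(path, flags);
}

/* Register an eventfd on control_file inside cgroup_dir.
 * args is the threshold in bytes for memory.usage_in_bytes and NULL for
 * memory.oom_control. Returns 0 on success, -1 on failure. */
int register_memcg_event(const char *cgroup_dir, const char *control_file,
                         const char *args, struct memcg_event *ev)
{
    char line[128];
    int ctl, n;

    ev->control_fd = -1;
    ev->event_fd = eventfd(0, 0);
    if (ev->event_fd < 0)
        return -1;
    ev->control_fd = open_in_dir(cgroup_dir, control_file, O_RDONLY);
    ctl = open_in_dir(cgroup_dir, "cgroup.event_control", O_WRONLY);
    n = format_event_registration(line, sizeof(line), ev->event_fd,
                                  ev->control_fd, args);
    if (ev->control_fd < 0 || ctl < 0 || n < 0 ||
        write(ctl, line, (size_t)n) != n) {
        if (ctl >= 0)
            close(ctl);
        if (ev->control_fd >= 0)
            close(ev->control_fd);
        close(ev->event_fd);
        ev->event_fd = ev->control_fd = -1;
        return -1;
    }
    close(ctl);
    return 0;
}

/* Disable the in-kernel oom killer for this memcg. */
int disable_oom_killer(const char *cgroup_dir)
{
    int fd = open_in_dir(cgroup_dir, "memory.oom_control", O_WRONLY);
    int ok;

    if (fd < 0)
        return -1;
    ok = (write(fd, "1", 1) == 1);
    close(fd);
    return ok ? 0 : -1;
}
```

A policy daemon would call register_memcg_event() once with
"memory.usage_in_bytes" and a byte threshold and once with
"memory.oom_control" and no arguments, then block in read(2) of 8 bytes
on each event_fd (or poll them) before deciding which application to
signal or kill.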

We actually have extended that internally to have an oom killer delay,
i.e. a specific amount of time must pass for userspace to react to the oom
situation or the oom killer will actually be triggered. This is needed in
case our userspace is blocked or can't respond for whatever reason and is
a nice fallback so that we're guaranteed to never end up livelocked. That
delay gets reset anytime a page is uncharged to a memcg, the memcg limit
is increased, or the delay is rewritten (for userspace to say "I've
handled the event"). Those patches were posted on linux-mm several months
ago but never merged upstream. You should be able to use the same concept
apart from the memory controller and implement it generically.

You also presented this as an alternative for "embedded or small" users so
I wasn't aware that using the memory controller was an acceptable solution
given its overhead.