Re: [PATCH v2] fs: fsnotify: account fsnotify metadata to kmemcg

From: Jan Kara
Date: Wed Nov 01 2017 - 11:15:41 EST


On Wed 01-11-17 00:44:18, Yang Shi wrote:
> On 10/31/17 3:12 AM, Jan Kara wrote:
> >On Tue 31-10-17 00:39:58, Yang Shi wrote:
> >>On 10/30/17 5:43 AM, Jan Kara wrote:
> >>>On Sat 28-10-17 02:22:18, Yang Shi wrote:
> >>>>If some process generates events into a huge or unlimit event queue, but no
> >>>>listener read them, they may consume significant amount of memory silently
> >>>>until oom happens or some memory pressure issue is raised.
> >>>>It'd better to account those slab caches in memcg so that we can get heads
> >>>>up before the problematic process consume too much memory silently.
> >>>>
> >>>>But, the accounting might be heuristic if the producer is in the different
> >>>>memcg from listener if the listener doesn't read the events. Due to the
> >>>>current design of kmemcg, who does the allocation, who gets the accounting.
> >>>>
> >>>>Signed-off-by: Yang Shi <yang.s@xxxxxxxxxxxxxxx>
> >>>>---
> >>>>v1 --> v2:
> >>>>* Updated commit log per Amir's suggestion
> >>>
> >>>I'm sorry but I don't think this solution is acceptable. I understand that
> >>>in some cases (and you likely run one of these) the result may *happen* to
> >>>be the desired one but in other cases, you might be charging wrong memcg
> >>>and so misbehaving process in memcg A can effectively cause a DoS attack on
> >>>a process in memcg B.
> >>
> >>Yes, as what I discussed with Amir in earlier review, current memcg design
> >>just accounts memory to the allocation process, but has no idea who is
> >>consumer process.
> >>
> >>Although it is not desirable to DoS a memcg, it still sounds better than DoS
> >>the whole machine due to potential oom. This patch is aimed to avoid such
> >>case.
> >
> >Thinking about this even more, your solution may have even worse impact -
> >due to allocations failing, some applications may avoid generation of fs
> >notification events for actions they do. And that maybe a security issue in
> >case there are other applications using fanotify for security enforcement,
> >virus scanning, or whatever... In such cases it is better to take the
> >whole machine down than to let it run.
>
> I guess (just guess) this might be able to be solved by Amir's patch, right?
> An overflow or error event will be queued, then the consumer applications
> could do nicer error handling/softer exit.

Well, Amir's patch solves the problem of visibility that something bad
(lost event) happened. But it does not address the fundamental issue that
you account the event to a wrong memcg and thus fail the allocation at
wrong times.

> Actually, the event is dropped when -ENOMEM regardless of my patch. As Amir
> said this patch may just amplify this problem if my understanding is right.

So currently, -ENOMEM cannot normally happen for such small allocation. The
kernel will rather go OOM and kill some process to free memory. So putting
memcgs into the picture changes the behavior.

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR