Re: [PATCH v2] fs: fsnotify: account fsnotify metadata to kmemcg

From: Jan Kara
Date: Tue Feb 20 2018 - 07:44:04 EST


On Mon 19-02-18 21:07:28, Amir Goldstein wrote:
> On Mon, Feb 19, 2018 at 3:50 PM, Jan Kara <jack@xxxxxxx> wrote:
> [...]
> > For fanotify without FAN_UNLIMITED_QUEUE the situation is similar as for
> > inotify - IMO low practical impact, apps should generally handle queue
> > overflow so I don't see a need for any opt in (more accurate memcg charging
> > takes precedense over possibly broken apps).
> >
> > For fanotify with FAN_UNLIMITED_QUEUE the situation is somewhat different -
> > firstly there is a practical impact (memory consumption is not limited by
> > anything else) and secondly there are higher chances of the application
> > breaking (no queue overflow expected) and also that this breakage won't be
> > completely harmless (e.g., the application participates in securing the
> > system). I've been thinking about this "conflict of interests" for some
> > time and currently I think that the best handling of this is that by
> > default events for FAN_UNLIMITED_QUEUE groups will get allocated with
> > GFP_NOFAIL - such groups can be created only by global CAP_SYS_ADMIN anyway
> > so it is reasonably safe against misuse (and since the allocations are
> > small it is in fact equivalent to current status quo, just more explicit).
> > That way application won't see unexpected queue overflow. The process
> > generating event may be looping in the allocator but that is the case
> > currently as well. Also the memcg with the consumer of events will have
> > higher chances of triggering oom-kill if events consume too much memory but
> > I don't see how this is not a good thing by default - and if such reaction
> > is not desirable, there's memcg's oom_control to tune the OOM behavior
> > which has capabilities far beyond of what we could invent for fanotify...
> >
> > What do you think Amir?
> >
>
> If I followed all your reasoning correctly, you propose to change behavior to
> always account events to group memcg and never fail event allocation,
> without any change of API and without opting-in for new behavior?
> I think it makes sense. I can't point at any expected breakage,
> so overall, this would be a good change.
>
> I just feel sorry about passing an opportunity to improve functionality.
> The fact that fanotify does not have a way for defining the events queue
> size is a deficiency IMO, one which I had to work around in the past.
> I find that assigning group to memgc and configure memcg to desired
> memory limit and getting Q_OVERFLOW on failure to allocate event
> is going to be a proper way of addressing this deficiency.

So if you don't pass FAN_Q_UNLIMITED, you will get queue with a fixed size
and will get Q_OVERFLOW if that is exceeded. So is your concern that you'd
like some other fixed limit? Larger one or smaller one and for what
reason?

> But if you don't think we should bind these 2 things together,
> I'll let Shakeel decide if he want to pursue the Q_OVERFLOW change
> or not.

So if there is still some uncovered use case for finer tuning of event
queue length than setting or not setting FAN_Q_UNLIMITED (+ possibly
putting the task to memcg to limit memory usage), we can talk about how to
address that but at this point I don't see a strong reason to bind this to
whether / how events are accounted to memcg...

And we still need to make sure we properly do ENOMEM -> Q_OVERFLOW
translation and use GFP_NOFAIL for FAN_Q_UNLIMITED groups before merging
Shakeel's memcg accounting patches. But Shakeel does not have to be the one
implementing that (although if you want to, you are welcome Shakeel :) -
otherwise I hope I'll get to it reasonably soon).

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR