Re: [PATCH] inotify, memcg: account inotify instances to kmemcg

From: Amir Goldstein
Date: Sun Dec 20 2020 - 13:07:13 EST


On Sun, Dec 20, 2020 at 7:56 PM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
>
> On Sun, Dec 20, 2020 at 3:31 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > On Sun, Dec 20, 2020 at 6:24 AM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
> > >
> > > On Sat, Dec 19, 2020 at 8:25 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > >
> > > > On Sat, Dec 19, 2020 at 4:31 PM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Sat, Dec 19, 2020 at 1:48 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Sat, Dec 19, 2020 at 12:11 AM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > Currently the fs sysctl inotify/max_user_instances is used to limit the
> > > > > > > number of inotify instances on the system. For systems running multiple
> > > > > > > workloads, the per-user namespace sysctl max_inotify_instances can be
> > > > > > > used to further partition inotify instances. However, there is no easy
> > > > > > > way to set a sensible system level max limit on inotify instances and
> > > > > > > further partition it between the workloads. It is much easier to charge
> > > > > > > the underlying resource (i.e. memory) behind the inotify instances to
> > > > > > > the memcg of the workload and let their memory limits limit the number
> > > > > > > of inotify instances they can create.
> > > > > >
> > > > > > Not that I have a problem with this patch, but what problem does it try to
> > > > > > solve?
> > > > >
> > > > > I am aiming for the simplicity to not set another limit which can
> > > > > indirectly be limited by memcg limits. I just want to set the memcg
> > > > > limit on our production environment which runs multiple workloads on a
> > > > > system and not think about setting a sensible value to
> > > > > max_user_instances in production. I would prefer to set
> > > > > max_user_instances to max int and let the memcg limits of the
> > > > > workloads limit their inotify usage.
> > > > >
> > > >
> > > > understood.
> > > > and I guess the multiple workloads cannot run each in their own userns?
> > > > because then you wouldn't need to change max_user_instances limit.
> > > >
> > >
> > > No, workloads can run in their own user namespace, but please note that
> > > max_user_instances is shared between all the user namespaces.
> >
> > /proc/sys/fs/inotify/max_user_instances is shared between all the user
> > namespaces, but it only controls the init_user_ns limits.
> > /proc/sys/user/max_inotify_instances is per user ns and it is the one that
> > actually controls the inotify limits in non init_user_ns.
> >
> > That said, I see that it is always initialized to INT_MAX on non init user ns,
> > which is exactly the setup that you are aiming at:
> >
> > $ unshare -U
> > $ cat /proc/sys/user/max_inotify_instances
> > 2147483647
> > $ cat /proc/sys/fs/inotify/max_user_instances
> > 128
>
> From what I understand, namespace-based limits are enforced
> hierarchically. More specifically in the example above, the
> application running in a user namespace with
> /proc/sys/user/max_inotify_instances = 2147483647 and
> /proc/sys/fs/inotify/max_user_instances = 128 will not be able to
> create more than 128 inotify instances. I actually tested this with a
> simple program which calls inotify_init() in a loop and it starts
> failing before the 128th iteration.

Right, it is.
Thanks for the clarification.

Thanks,
Amir.