Re: [PATCH] inotify: Convert to using per-namespace limits

From: Jan Kara
Date: Mon Oct 10 2016 - 12:40:53 EST


On Mon 10-10-16 09:44:19, Nikolay Borisov wrote:
> On 10/07/2016 09:14 PM, Eric W. Biederman wrote:
> > Nikolay Borisov <kernel@xxxxxxxx> writes:
> >
> >> This patchset converts inotify to using the newly introduced
> >> per-userns sysctl infrastructure.
> >>
> >> Currently the inotify instances/watches are being accounted for in the
> >> user_struct structure. This means that in setups where multiple
> >> users in unprivileged containers map to the same underlying
> >> real user (i.e. pointing to the same user_struct) the inotify limits
> >> are going to be shared as well, allowing one user (or application) to exhaust
> >> all the others' limits.
> >>
> >> Fix this by switching the inotify sysctls to using the
> >> per-namespace/per-user limits. This will allow the server admin to
> >> set sensible global limits, which can further be tuned inside every
> >> individual user namespace.
> >>
> >> Signed-off-by: Nikolay Borisov <kernel@xxxxxxxx>
> >> ---
> >> Hello Eric,
> >>
> >> I saw you've finally sent your pull request for 4.9 and it
> >> includes your implementation of the ucount infrastructure. So
> >> here is my respin of the inotify patches using that.
> >
> > Thanks. I will take a good hard look at this after -rc1 when things are
> > stable enough that I can start a new development branch.
> >
> > I am a little concerned that the old sysctls have gone away. If no one
> > cares it is fine, but if someone depends on them existing that may count
> > as an unnecessary userspace regression. But otherwise, skimming through
> > this code, it looks good.
>
> So indeed this is a real issue, and I meant to write something about
> it. Anyway, in order to preserve those sysctls, what could be done is to
> hook them up to a custom sysctl handler that takes the ns from the proc
> mount and the euid of current? I think this is a good approach, but
> let's wait and see whether anyone objects to completely
> eliminating those sysctls.

Well, I believe just discarding those sysctls is not an option - I'm pretty
sure there are scripts out there which tune these sysctls, and those would
stop working. IMO that is not an acceptable regression.

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR