Re: [PATCH] inotify: Convert to using per-namespace limits

From: Eric W. Biederman
Date: Mon Oct 10 2016 - 16:51:26 EST


Jan Kara <jack@xxxxxxx> writes:

> On Mon 10-10-16 09:44:19, Nikolay Borisov wrote:
>> On 10/07/2016 09:14 PM, Eric W. Biederman wrote:
>> > Nikolay Borisov <kernel@xxxxxxxx> writes:
>> >
>> >> This patchset converts inotify to using the newly introduced
>> >> per-userns sysctl infrastructure.
>> >>
>> >> Currently the inotify instances/watches are being accounted in the
>> >> user_struct structure. This means that in setups where multiple
>> >> users in unprivileged containers map to the same underlying
>> >> real user (i.e. pointing to the same user_struct) the inotify limits
>> >> are going to be shared as well, allowing one user (or application) to exhaust
>> >> all others' limits.
>> >>
>> >> Fix this by switching the inotify sysctls to using the
>> >> per-namespace/per-user limits. This will allow the server admin to
>> >> set sensible global limits, which can further be tuned inside every
>> >> individual user namespace.
>> >>
>> >> Signed-off-by: Nikolay Borisov <kernel@xxxxxxxx>
>> >> ---
>> >> Hello Eric,
>> >>
>> >> I saw you've finally sent your pull request for 4.9 and it
>> >> includes your implementation of the ucount infrastructure. So
>> >> here is my respin of the inotify patches using that.
>> >
>> > Thanks. I will take a good hard look at this after -rc1 when things are
>> > stable enough that I can start a new development branch.
>> >
>> > I am a little concerned that the old sysctls have gone away. If no one
>> > cares it is fine, but if someone depends on them existing that may count
>> > as an unnecessary userspace regression. But otherwise skimming through
>> > this code it looks good.
>>
>> So this is indeed a real issue and I meant to write something about
>> it. Anyway, to preserve those sysctls, they could be hooked up to a
>> custom sysctl handler that takes the ns from the proc mount and the
>> euid of current. I think this is a good approach, but let's wait and
>> see if anyone objects to completely eliminating those sysctls.
>
> Well, I believe just discarding those sysctls is not an option - I'm pretty
> sure there are scripts out there which tune these sysctls and those would
> stop working. IMO that is not an acceptable regression.

Nikolay, there is your objection.

So since it should be straightforward, let's preserve the existing
sysctls. Then this change doesn't need to prove there are no scripts
that tweak those sysctls.

We are just talking about changing the values in the initial user
namespace, so it should be completely compatible and straightforward to
implement unless I am missing something.
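
To make the compatibility argument concrete, here is a rough userspace
model in Python, not kernel code. Every name in it (UserNamespace,
legacy_sysctl_read, legacy_sysctl_write, charge_watch) is hypothetical,
the default of 8192 is only illustrative, and it simplifies by charging
the same uid at every level of the hierarchy. It only sketches the idea
that the old sysctl keeps working by reading and writing the limit of
the initial user namespace, while nested namespaces keep their own
limits.

```python
# Userspace model only; all names here are hypothetical, not kernel APIs.
from dataclasses import dataclass, field
from typing import Optional

DEFAULT_MAX_WATCHES = 8192  # illustrative default, an assumption


@dataclass
class UserNamespace:
    name: str
    parent: Optional["UserNamespace"] = None
    max_watches: int = DEFAULT_MAX_WATCHES
    # uid -> number of watches currently charged in this namespace
    counts: dict = field(default_factory=dict)


# Stand-in for the initial user namespace.
init_user_ns = UserNamespace("init")


def legacy_sysctl_read() -> int:
    """Model of reading the old sysctl: report the init namespace limit."""
    return init_user_ns.max_watches


def legacy_sysctl_write(value: int) -> None:
    """Model of writing the old sysctl: change only the init namespace limit."""
    init_user_ns.max_watches = value


def charge_watch(ns: UserNamespace, uid: int) -> bool:
    """Charge one watch in ns and in every ancestor namespace.

    Returns False, undoing any partial charges, if some level is
    already at its limit.
    """
    charged = []
    cur = ns
    while cur is not None:
        if cur.counts.get(uid, 0) >= cur.max_watches:
            for done in charged:
                done.counts[uid] -= 1  # roll back the levels already charged
            return False
        cur.counts[uid] = cur.counts.get(uid, 0) + 1
        charged.append(cur)
        cur = cur.parent
    return True
```

In this model a script that tunes the old sysctl still changes the
global ceiling, because every charge walks up to the initial namespace,
and a limit set inside a nested namespace can only be tighter in effect.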

Eric