Re: [PATCH] inotify: Convert to using per-namespace limits

From: Nikolay Borisov
Date: Mon Oct 10 2016 - 17:54:12 EST


On Mon, Oct 10, 2016 at 11:49 PM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
> Jan Kara <jack@xxxxxxx> writes:
>
>> On Mon 10-10-16 09:44:19, Nikolay Borisov wrote:
>>> On 10/07/2016 09:14 PM, Eric W. Biederman wrote:
>>> > Nikolay Borisov <kernel@xxxxxxxx> writes:
>>> >
>>> >> This patchset converts inotify to using the newly introduced
>>> >> per-userns sysctl infrastructure.
>>> >>
>>> >> Currently the inotify instances/watches are being accounted in the
>>> >> user_struct structure. This means that in setups where multiple
>>> >> users in unprivileged containers map to the same underlying
>>> >> real user (i.e. pointing to the same user_struct) the inotify limits
>>> >> are going to be shared as well, allowing one user (or application) to exhaust
>>> >> all others' limits.
>>> >>
>>> >> Fix this by switching the inotify sysctls to using the
>>> >> per-namespace/per-user limits. This will allow the server admin to
>>> >> set sensible global limits, which can further be tuned inside every
>>> >> individual user namespace.
>>> >>
>>> >> Signed-off-by: Nikolay Borisov <kernel@xxxxxxxx>
>>> >> ---
>>> >> Hello Eric,
>>> >>
>>> >> I saw you've finally sent your pull request for 4.9 and it
>>> >> includes your implementation of the ucount infrastructure. So
>>> >> here is my respin of the inotify patches using that.
>>> >
>>> > Thanks. I will take a good hard look at this after -rc1 when things are
>>> > stable enough that I can start a new development branch.
>>> >
>>> > I am a little concerned that the old sysctls have gone away. If no one
>>> > cares it is fine, but if someone depends on them existing that may count
>>> > as an unnecessary userspace regression. But otherwise skimming through
>>> > this code it looks good.
>>>
>>> So this is indeed a real issue and I meant to write something about
>>> it. Anyway, to preserve those sysctls, they could be hooked up to a
>>> custom sysctl handler that takes the ns from the proc mount and the
>>> euid of current. I think this is a good approach, but let's wait and
>>> see if anyone objects to completely eliminating those sysctls.
>>
>> Well, I believe just discarding those sysctls is not an option - I'm pretty
>> sure there are scripts out there which tune these sysctls and those would
>> stop working. IMO that is not an acceptable regression.
>
> Nikolay there is your objection.
>
> So since it should be straightforward let's preserve the existing
> sysctls. Then this change doesn't need to prove there are no scripts
> that tweak those sysctls.
>
> We are just talking about changing the values in the initial user
> namespace, so it should be completely compatible and straightforward
> to implement unless I am missing something.

Well, I'm not so sure about this. Let's say those sysctls are going to
modify the ucount values in the init_user_ns. That's fine, but for
which particular user should they do this? Should it be hardcoded to
kuid 0, or should it use current_euid()? I personally think they should
change the values for the current_euid.
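
To make the two options concrete, here is a rough userspace toy model,
not kernel code. Every name in it (toy_ns, toy_ucount, toy_get,
toy_legacy_write, the legacy_target enum) is made up for illustration,
and it assumes the limit is stored per user inside a namespace, which
may not match how the ucount code ends up storing it. It only shows
which entry a write to the old sysctl would touch under each choice.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model only; none of these names are kernel APIs. */
#define TOY_MAX_USERS 8

struct toy_ucount {
	unsigned int uid;	/* stands in for a kuid in the initial ns */
	int max_watches;	/* stands in for that user's inotify limit */
	int in_use;		/* slot is allocated */
};

struct toy_ns {
	struct toy_ucount users[TOY_MAX_USERS];
};

/* Which user a write to the legacy sysctl should apply to. */
enum legacy_target { TARGET_KUID0, TARGET_CURRENT_EUID };

/*
 * Find the entry for uid, creating it with the default limit if it does
 * not exist yet. Returns NULL when the table is full.
 */
struct toy_ucount *toy_get(struct toy_ns *ns, unsigned int uid, int dflt)
{
	struct toy_ucount *free_slot = NULL;
	size_t i;

	assert(ns != NULL);
	for (i = 0; i < TOY_MAX_USERS; i++) {
		struct toy_ucount *u = &ns->users[i];

		if (u->in_use && u->uid == uid)
			return u;
		if (!u->in_use && !free_slot)
			free_slot = u;
	}
	if (free_slot) {
		free_slot->uid = uid;
		free_slot->max_watches = dflt;
		free_slot->in_use = 1;
	}
	return free_slot;
}

/*
 * Model of a write to the old sysctl: it always lands in the initial
 * namespace, and the policy decides whose entry is changed.
 * Returns 0 on success, -1 if no entry could be found or created.
 */
int toy_legacy_write(struct toy_ns *init_ns, enum legacy_target target,
		     unsigned int writer_euid, int new_max, int dflt)
{
	unsigned int uid = (target == TARGET_KUID0) ? 0 : writer_euid;
	struct toy_ucount *u = toy_get(init_ns, uid, dflt);

	if (!u)
		return -1;
	u->max_watches = new_max;
	return 0;
}
```

With TARGET_KUID0 a write by uid 1000 changes only the entry for uid 0
and leaves the writer's own limit at the default; with
TARGET_CURRENT_EUID it is the other way around.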

>
> Eric