Re: [RFC PATCH] locks: Show only file_locks created in the same pidns as current process

From: J. Bruce Fields
Date: Tue Aug 02 2016 - 16:44:10 EST


On Tue, Aug 02, 2016 at 04:01:22PM -0400, Jeff Layton wrote:
> On Tue, 2016-08-02 at 15:44 -0400, J. Bruce Fields wrote:
> > On Tue, Aug 02, 2016 at 02:09:22PM -0500, Eric W. Biederman wrote:
> > >
> > > > > "J. Bruce Fields" <bfields@xxxxxxxxxxxx> writes:
> > >
> > > >
> > > > On Tue, Aug 02, 2016 at 11:00:39AM -0500, Eric W. Biederman wrote:
> > > > >
> > > > > > > > > Nikolay Borisov <kernel@xxxxxxxx> writes:
> > > > >
> > > > > >
> > > > > > Currently when /proc/locks is read it will show all the file locks
> > > > > > which are currently created on the machine. On containers, hosted
> > > > > > on busy servers this means that doing lsof can be very slow. I
> > > > > > observed up to 5 seconds stalls reading 50k locks, while the container
> > > > > > itself had only a small number of relevant entries. Fix it by
> > > > > > filtering the locks listed by the pidns of the current process
> > > > > > and the process which created the lock.
> > > > >
> > > > > The locks always confuse me so I am not 100% connecting locks
> > > > > to a pid namespace is appropriate.
> > > > >
> > > > > That said if you are going to filter by pid namespace please use the pid
> > > > > namespace of proc, not the pid namespace of the process reading the
> > > > > file.
> > > >
> > > > Oh, that makes sense, thanks.
> > > >
> > > > What does /proc/mounts use, out of curiosity?ÂÂThe mount namespace that
> > > > /proc was originally mounted in?
> > >
> > > /proc/mounts -> /proc/self/mounts
> >
> > D'oh, I knew that.
> >
> > >
> > > /proc/[pid]/mounts lists mounts from the mount namespace of the
> > > appropriate process.
> > >
> > > That is another way to go but it is a tread carefully thing as changing
> > > things that way it is easy to surprise apparmor or selinux rules and be
> > > surprised you broke someones userspace in a way that prevents booting.
> > > Although I suspect /proc/locks isn't too bad.
> >
> > OK, thanks.
> >
> > /proc/[pid]/locks might be confusing.ÂÂI'd expect it to be "all the
> > locks owned by this task", rather than "all the locks owned by pid's in
> > the same pid namespace", or whatever criterion we choose.
> >
> > Uh, I'm still trying to think of the Obviously Right solution here, and
> > it's not coming.
> >
> > --b.
>
>
> I'm a little leery of changing how this works. It has always been
> maintained as a legacy interface, so do we run the risk of breaking
> something if we turn it into a per-namespace thing?

The namespace work is all about making interfaces per-namespace. I
guess it works as long as it contributes to the illusion that each
container is its own machine.

Thinking about it, I might be sold on the per-pidns approach (with
Eric's modification to use the pidns of /proc not the reader).

My complaint about not being able to see conflicting locks would apply
just as well to conflicts from nfs locks held by other clients. A disk
filesystem shared across multiple containers is a little like an nfs
filesystem shared between nfs clients.

That'd solve this immediate problem without requiring an lsof upgrade as
well.

> This also doesn't
> solve the problem of slow traversal in the init_pid_ns -- only in a
> container.
>
> I also can't help but feel that /proc/locks is just showing its age. It
> was fine in the late 90's, but its limitations are just becoming more
> apparent as things get more complex. It was never designed for
> performance as you end up thrashing several spinlocks when reading it.
>
> Maybe it's time to think about presenting this info in another way? A
> global view of all locks on the system is interesting but maybe it
> would be better to present it more granularly somehow?

But, yes, that might be a good idea.

--b.

>
> I guess I should go look at what lsof actually does with this info...
>
> --
> Jeff Layton <jlayton@xxxxxxxxxxxxxxx>