Re: [git pull] more vfs bits

From: David Howells
Date: Sat Feb 21 2015 - 19:18:50 EST

Next message: mcdonaldloans: "Free Loan"
Previous message: Stephen Rothwell: "linux-next: manual merge of the mips tree with Linus' tree"
In reply to: David Howells: "Re: [git pull] more vfs bits"
Next in thread: Linus Torvalds: "Re: [git pull] more vfs bits"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> > Assorted stuff from this cycle. The big ones here are multilayer
> > overlayfs from Miklos and beginning of sorting ->d_inode accesses out from
> > David.
>
> So I've pulled this, but quite frankly, I think I will unpull it again
> unless you actually state *what* the advantages of this pointless
> series of endless patches are.

Let me describe the problem I'm trying to solve first.

Code that accessed ->d_inode may need to look at some other dentry/inode then
the dentry it is given when the filesystem it is operating on is a layer in
some sort of union, but it sees through the filter of the top union/overlay
layer.

Take overlayfs as an example. Regular files get dummy dentries and inodes
created in the top layer, but overlayfs has to pull a trick to go around the
VFS and repoint open files at the lower/source layer or the upper/workspace
layer rather than the overlay layer. Access attempts outside of open files
(eg. utime) get trapped by the dummy inode and dealt with there.

Now this brings me to a particular problem: file->f_path and file->f_inode
point to either the upper layer vfsmount, dentry and inode or the lower layer
vfsmount, dentry and inode. This means that anything that needs those
parameters does not get to see the fact that an overlay was involved. Such
things include security LSMs, /proc, notifications, locks and leases.

So what I want to do is:

(1) Introduce wrappers around accesses to ->d_inode. These can be used to
mark such accesses as being classified into two categories:

(A) Some accesses should only consider the dentry they're given.
Filesystems would fall into this category when dealing with their own
dentries and inodes.

(B) Other accesses should fall through from the dentry they're given to
an underlying layer. A lot of non-filesystem accesses, including
those in the VFS side of pathwalk, fall into this category, as do
accesses to external objects made by filesystems such as overlayfs
and ecryptfs.

Because there are a lot of ->d_inode calls, my intention is to classify
them by giving them an appropriate wrapper. Type (A) get wrapped with
fs_inode{,_once}() and type (B) get wrapped with dentry_inode{,_once}().

Wrapping them in this way makes it easier to find the cases that are more
problematic. SELinux, for example, might need to look at *both* layers
when assessing the security label to be levied upon an overlay object.

(2) Introduce wrappers around accesses to a given dentry where that dentry
might not be used directly but rather fall through (similar to (1B)
above).

(3) Start off with wrappers that simply pass dentry or dentry->d_inode
straight through.

(4) Add an extra pointer into struct dentry that can be pointed at a lower
layer - and will critically *pin* that lower layer.

This will avoid the need for file->f_path to directly pin the dentry to
which file->f_inode refers. This then allows file->f_path to point to
the top layer whilst file->f_inode points to an underlay file.

(6) Add a DCACHE_WHITEOUT_TYPE type variant for dentries that will in the
future be recognised by the VFS as a 'negative' type, but can be used by
the overlay or unionmount to note a difference to an ordinary negative
type dentry that can also be a fallthrough (at least in unionmount if
this ever makes it).

(7) Where feasible, replace if-statements that check the positivity or
negativity of dentries by doing if (...->d_inode) with checks on the type
of the dentry.

Also, where feasible, replace S_ISxxx checks on inode type with checks of
the dentry type field. This cuts down on the number of accesses to the
inode we need to do - which is good if we have to check indirectly.

To this end, I split DCACHE_FILE_TYPE to differentiate regular and
special types.

(5) Make the fallthrough logic work. You set DCACHE_FALLTHRU on a dentry and
this tells d_dentry() and dentry_inode() to bypass the given dentry and
use dentry->layer.lower and dentry->layer.lower->d_inode instead - but
does not affect the operation of fs_inode().

Note that the DCACHE_FALLTHRU flag and dentry->layer do not belong to the
filesystem that owns the dentry on which they are set, but rather belong
to the facility (be it overlayfs or unionmount) that is managing the
union.

So the patch series isn't complete as it stands. I'm trying to do the
wrapping first to reduce the number of ->d_inode accesses that need special
consideration.

> Now, that was true in the "bad old days" when we just used
> ACCESS_ONCE(dentry->d_inode) too, but at least in that old model we
> don't have some idiotic wrapper around it.

I can't make ACCESS_ONCE(fs_inode(dentry)) work if fs_inode() is an inline
function. I might be able to make it work if it's a macro. I also don't want
to call ACCESS_ONCE() inside fs_inode().

> Dammit, if we add wrapper and "helper" functions, they should *help*,
> not confuse. This thing is just badly named,

Suggest a better name that's not too long then please. fs_inode(d) is about
the same length as d->d_inode which means I can just script it and I don't
have to go in and reformat the code.

I did at one point have something longer for both wrappers - d_fs_own_inode()
and d_inode_or_lower(). I would quite like to have used d_inode(), but it
looks a bit too close to ->d_inode.

> so what the f*ck is the advantage of that helper wrt just making the rule be
> that "dentry->d_inode" is that unspecified thing.

I want checkpatch to ultimately throw up an error if anyone adds a new
->d_inode access. They can be instructed to use either fs_inode() or
dentry_inode().

Further, leaving dentry->d_inode as is all over the place makes it much harder
to find the stuff that needs to be considered carefully - there are over 2000
instances of ->d_inode in the code and grepping to find them is annoying with
so many false positives available.

However, a lot of the ->d_inode accesses can safely be scripted out of the
way, vastly cutting down the output of grep.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: mcdonaldloans: "Free Loan"
Previous message: Stephen Rothwell: "linux-next: manual merge of the mips tree with Linus' tree"
In reply to: David Howells: "Re: [git pull] more vfs bits"
Next in thread: Linus Torvalds: "Re: [git pull] more vfs bits"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]