Re: kernel crash in mknod
From: Al Viro
Date: Mon Mar 25 2024 - 17:13:31 EST
On Mon, Mar 25, 2024 at 05:47:16PM -0300, Paulo Alcantara wrote:
> Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:
>
> > On Mon, Mar 25, 2024 at 11:26:59AM -0500, Steve French wrote:
> >
> >> A loosely related question. Do I need to change cifs.ko to return the
> >> pointer to inode on mknod now? dentry->inode is NULL in the case of mknod
> >> from cifs.ko (and presumably some other fs as Al noted), unlike mkdir and
> >> create where it is filled in. Is there a perf advantage in filling in the
> >> dentry->inode in the mknod path in the fs or better to leave it as is? Is
> >> there a good example to borrow from on this?
> >
> > AFAICS, that case in in CIFS is the only instance of ->mknod() that does this
> > "skip lookups, just unhash and return 0" at the moment.
> >
> > What's more, it really had been broken all along for one important case -
> > AF_UNIX bind(2) with address (== socket pathname) being on the filesystem
> > in question.
>
> Yes, except that we currently return -EPERM for such cases. I don't
> even know if this SFU thing supports sockets.
Sure, but we really want the rules to be reasonably simple and
"you may leave dentry unhashed negative and return 0, provided that you
hadn't been asked to create a socket" is anything but ;-)
> > Note that cifs_sfu_make_node() is the only case in CIFS where that happens -
> > other codepaths (both in cifs_make_node() and in smb2_make_node()) will
> > instantiate. How painful would it be for cifs_sfu_make_node()?
> > AFAICS, you do open/sync_write/close there; would it be hard to do
> > an eqiuvalent of fstat and set the inode up?
>
> This should be pretty straightforward as it would only require an extra
> query info call and then {smb311_posix,cifs}_get_inode_info() ->
> d_instantiate(). We could even make it a single compound request of
> open/write/getinfo/close for SMB2+ case.
If that's the case, I believe that we should simply declare that
->mknod() must instantiate on success and have vfs_mknod() check and
warn if it hadn't.
Rationale:
1) mknod(2) is usually followed by at least some access to created object.
Not setting the inode up won't save much anyway.
2) if some instance of ->mknod() skips setting the inode on success (i.e.
unhashes the still-negative dentry and returns 0), it can easily be
converted. The minimal conversion would be along the lines of turning
d_drop(dentry);
return 0;
into
d_drop(dentry);
d = foofs_lookup(dir, dentry, 0);
if (unlikely(d)) {
if (!IS_ERR(d)) {
dput(d);
return -EINVAL; // weird shit - directory got created somehow
}
return PTR_ERR(d);
}
return 0;
but there almost certainly are cheaper ways to get the inode metadata,
set the inode up and instantiate the dentry.
3) currently only on in-kernel instance is that way.
4) it makes life simpler for the users of vfs_mknod().
Objections, anyone?