Re: [PATCH 4/4] pidfs: implement fh_to_dentry

From: Christian Brauner
Date: Wed Nov 13 2024 - 08:26:40 EST


On Wed, Nov 13, 2024 at 02:06:56PM +0100, Erin Shepherd wrote:
> On 13/11/2024 13:09, Christian Brauner wrote:
>
> > Hm, a pidfd comes in two flavours:
> >
> > (1) thread-group leader pidfd: pidfd_open(<pid>, 0)
> > (2) thread pidfd: pidfd_open(<pid>, PIDFD_THREAD)
> >
> > In your current scheme fid->pid = pid_nr(pid) means that you always
> > encode a pidfs file handle for a thread pidfd no matter if the provided
> > pidfd was a thread-group leader pidfd or a thread pidfd. This is very
> > likely wrong as it means users that use a thread-group pidfd get a
> > thread-specific pid back.
> >
> > I think we need to encode (1) and (2) in the pidfs file handle so users
> > always get back the correct type of pidfd.
> >
> > That very likely means name_to_handle_at() needs to encode this into the
> > pidfs file handle.
>
> I guess a question here is whether a pidfd handle encodes a handle to a pid
> in a specific mode, or just to a pid in general? The thought had occurred
> to me while I was working on this initially, but I felt like perhaps treating
> it as a property of the file descriptor in general was better.
>
> Currently open_by_handle_at always returns a thread-group pidfd (since
> PIDFD_THREAD isn't set), regardless of what type of pidfd you passed to
> name_to_handle_at. I had thought that PIDFD_THREAD/O_EXCL would have been

I don't think you're returning a thread-group pidfd from
open_by_handle_at() in your scheme. After all you're encoding the tid in
pid_nr() so you'll always find the struct pid for the thread afaict. If
I'm wrong could you please explain how you think this works? I might
just be missing something obvious.

> passed through to f->f_flags on the restored pidfd, but upon checking I see that
> it gets filtered out in do_dentry_open.

It does, but note that __pidfd_prepare() raises it explicitly on the
file afterwards. So it works fine.
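
For reference, the flag handling can be observed from userspace. The
following is only a minimal sketch: it assumes a kernel with PIDFD_THREAD
support (Linux 6.9 or later), libc headers that define SYS_pidfd_open, and
that F_GETFL reflects the O_EXCL bit raised on the file. The demo_* helpers
are hypothetical names used for illustration, not existing API.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* PIDFD_THREAD shares its value with O_EXCL in the uapi header; define it
 * here in case the installed headers predate it. */
#ifndef PIDFD_THREAD
#define PIDFD_THREAD O_EXCL
#endif

/* Hypothetical helper: raw pidfd_open(2). Returns a pidfd, or -1 with
 * errno set (e.g. EINVAL on kernels without PIDFD_THREAD). */
static int demo_pidfd_open(pid_t pid, unsigned int flags)
{
	return (int)syscall(SYS_pidfd_open, pid, flags);
}

/* Hypothetical helper: returns 1 if the pidfd carries PIDFD_THREAD in its
 * file flags, 0 if it does not, and -1 if the flags cannot be read. */
static int demo_pidfd_is_thread(int pidfd)
{
	int fl;

	assert(pidfd >= 0);
	fl = fcntl(pidfd, F_GETFL);
	if (fl < 0)
		return -1;
	return (fl & PIDFD_THREAD) ? 1 : 0;
}
```

Calling demo_pidfd_open(getpid(), 0) and demo_pidfd_open(getpid(),
PIDFD_THREAD) and passing each result to demo_pidfd_is_thread() should
distinguish the two flavours, provided the assumptions above hold.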

>
> I feel like leaving it up to the caller of open_by_handle_at might be better
> (because they are probably better informed about whether they want poll() to
> inform them of thread or process exit) but I could lean either way.

So in order to decode a pidfs file handle you want the caller to have to
specify O_EXCL in the flags argument of open_by_handle_at()? Is that
your idea?

>
> >> +static struct dentry *pidfs_fh_to_dentry(struct super_block *sb,
> >> +					 struct fid *gen_fid,
> >> +					 int fh_len, int fh_type)
> >> +{
> >> +	int ret;
> >> +	struct path path;
> >> +	struct pidfd_fid *fid = (struct pidfd_fid *)gen_fid;
> >> +	struct pid *pid;
> >> +
> >> +	if (fh_type != FILEID_INO64_GEN || fh_len < PIDFD_FID_LEN)
> >> +		return NULL;
> >> +
> >> +	pid = find_get_pid_ns(fid->pid, &init_pid_ns);
> >> +	if (!pid || pid->ino != fid->ino || pid_vnr(pid) == 0) {
> >> +		put_pid(pid);
> >> +		return NULL;
> >> +	}
> > I think we can avoid the premature reference bump and do:
> >
> > 	scoped_guard(rcu) {
> > 		struct pid *pid;
> >
> > 		pid = find_pid_ns(fid->pid, &init_pid_ns);
> > 		if (!pid)
> > 			return NULL;
> >
> > 		/* Did the pid get recycled? */
> > 		if (pid->ino != fid->ino)
> > 			return NULL;
> >
> > 		/* Must be resolvable in the caller's pid namespace. */
> > 		if (pid_vnr(pid) == 0)
> > 			return NULL;
> >
> > 		/* Ok, this is the pid we want. */
> > 		get_pid(pid);
> > 	}
>
> I can go with that if preferred. I was worried a bit about making the RCU
> critical section too large, but of course I'm sure there are much larger
> sections inside the kernel.

This is perfectly fine. Don't worry about it.
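
To spell out the ordering being suggested (look up, validate, and only then
take the reference), here is a rough userspace model. It is a sketch under
stated assumptions, not kernel code: struct demo_pid, struct demo_fid and
the demo_* functions are hypothetical stand-ins, there is no RCU here, and
the plain array lookup only mimics what find_pid_ns() does under
rcu_read_lock() in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pid, reduced to what the checks need. */
struct demo_pid {
	int nr;		/* pid number in the initial pid namespace */
	uint64_t ino;	/* pidfs inode number, differs after pid reuse */
	int vnr;	/* number in the caller's namespace, 0 if invisible */
	int refcount;	/* models the reference taken by get_pid() */
};

/* Hypothetical stand-in for the decoded file handle. */
struct demo_fid {
	uint64_t ino;
	int pid;
};

/* Models find_pid_ns(): a lookup that does not take a reference. */
static struct demo_pid *demo_find_pid(struct demo_pid *table, size_t n, int nr)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].nr == nr)
			return &table[i];
	return NULL;
}

/* Validate first, take the reference last, so no failure path has to
 * drop a reference again. */
static struct demo_pid *demo_fid_to_pid(struct demo_pid *table, size_t n,
					const struct demo_fid *fid)
{
	struct demo_pid *pid;

	assert(fid != NULL);
	pid = demo_find_pid(table, n, fid->pid);
	if (!pid)
		return NULL;
	if (pid->ino != fid->ino)	/* pid number was recycled */
		return NULL;
	if (pid->vnr == 0)		/* not visible to the caller */
		return NULL;
	pid->refcount++;		/* models get_pid() */
	return pid;
}
```

In the kernel the unreferenced lookup is only safe inside the RCU read-side
section, which is why the checks and the get_pid() all sit inside the guard.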

>
> >> +
> >> +	ret = path_from_stashed(&pid->stashed, pidfs_mnt, pid, &path);
> >> +	if (ret < 0)
> >> +		return ERR_PTR(ret);
> >> +
> >> +	mntput(path.mnt);
> >> +	return path.dentry;
> >> +}
>
> Similarly here I should probably refactor this into dentry_from_stashed in
> order to avoid a needless bump-then-drop of path.mnt's reference count.

No, what you have now is fine. I wouldn't add a specific helper for
this. In contrast to the pid the pidfs mount never goes away.