Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Christian Brauner
Date: Tue Apr 28 2026 - 05:04:21 EST
On Mon, Apr 27, 2026 at 06:30:42PM +0200, Mateusz Guzik wrote:
> On Mon, Apr 27, 2026 at 5:14 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> >
> > > Things proceed to handle_truncate:
> > > 	int error = get_write_access(inode);
> > > 	if (error)
> > > 		return error;
> > >
> > > 	error = security_file_truncate(filp);
> > > 	if (!error) {
> > > 		error = do_truncate(idmap, path->dentry, 0,
> > > 				    ATTR_MTIME|ATTR_CTIME|ATTR_OPEN,
> > > 				    filp);
> > > 	}
> > >
> > > I'm going to ignore the LSM situation and do_truncate failure modes in this one.
> > >
> > > AFAICS nothing prevents the same user from racing against file creation to
> > > execve it, which starts with exe_file_deny_write_access. Should the
> > > other thread win the race, get_write_access will fail and the WARN_ON
> > > splat will be generated. That is definitely a problem.
> >
> > That can't happen:
> >
> > static inline int get_write_access(struct inode *inode)
> > {
> > 	return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
> > }
> >
> > and the check is:
> >
> > error = handle_truncate(idmap, file);
> > if (unlikely(error > 0)) {
> >
> > This was a catch all for broken LSM hook or ->open() instance.
> >
>
> So with this prog:
> #include <fcntl.h>
>
> int main(void)
> {
> 	open("test", O_TRUNC);
> }
>
> I verified writecount is 0 on entry to handle_truncate like so:
>
> bpftrace -e 'kprobe:security_file_truncate { @[comm, (int64)((struct
> file *)arg0)->f_path.dentry->d_inode->i_writecount.counter] = count();
> }'
>
> @[a.out, 1]: 1
>
> i.e., get_write_access in handle_truncate transitioned the count 0 -> 1
>
> but then what prevents the following race:
>
> CPU0                                    CPU1
> open("test")                            execve("test")
> handle_truncate                         do_open_execat
>                                         exe_file_deny_write_access # should succeed as count is 0?
> get_write_access # should fail as the count is now -1?

I'm not arguing that get_write_access() cannot fail. I'm arguing that a
failure there cannot hit that WARN_ON() as you said above, because
get_write_access() returns either 0 or -ETXTBSY and the check only
triggers for error > 0.