Re: [PATCH v7 1/7] VFS: Introduce new O_DENY* open flags

From: Pavel Shilovsky
Date: Tue Feb 04 2014 - 07:03:30 EST


2014-02-01 Jeff Layton <jlayton@xxxxxxxxxx>:
> On Fri, 17 Jan 2014 14:07:06 +0400
> Pavel Shilovsky <piastry@xxxxxxxxxxx> wrote:
>
>> This patch adds 3 flags:
>> 1) O_DENYREAD that doesn't permit read access,
>> 2) O_DENYWRITE that doesn't permit write access,
>> 3) O_DENYDELETE that doesn't permit delete or rename.
>>
>> Network filesystems CIFS, SMB2.0, SMB3.0 and NFSv4 have such flags -
>> this change can benefit cifs and nfs modules as well as Samba and
>> NFS file servers that export the same directory for Windows clients,
>> or Wine applications that access the same files simultaneously.
>>
>> These flags are only take affect for opens on mounts with new sharelock
>> option. They are translated to flock's flags:
>>
>> !O_DENYREAD -> LOCK_READ | LOCK_MAND
>> !O_DENYWRITE -> LOCK_WRITE | LOCK_MAND
>>
>> and set through flock_lock_file on a file. If the file can't be locked
>> due conflicts with another open with O_DENY* flags, a new -ESHAREDENIED
>> error code is returned.
>>
>> Create codepath is slightly changed to prevent data races on newly
>> created files: when open with O_CREAT can return -ESHAREDENIED error
>> for successfully created files due to a sharelock set by another task.
>>
>> Temporary disable O_DENYDELETE support - will enable it in further
>> patches.
>>
>> Signed-off-by: Pavel Shilovsky <piastry@xxxxxxxxxxx>
>> ---
>> arch/alpha/include/uapi/asm/errno.h | 2 +
>> arch/alpha/include/uapi/asm/fcntl.h | 3 ++
>> arch/mips/include/uapi/asm/errno.h | 2 +
>> arch/parisc/include/uapi/asm/errno.h | 2 +
>> arch/parisc/include/uapi/asm/fcntl.h | 3 ++
>> arch/sparc/include/uapi/asm/errno.h | 2 +
>> arch/sparc/include/uapi/asm/fcntl.h | 3 ++
>> fs/fcntl.c | 5 +-
>> fs/locks.c | 97 +++++++++++++++++++++++++++++++---
>> fs/namei.c | 53 ++++++++++++++++++-
>> fs/proc_namespace.c | 1 +
>> include/linux/fs.h | 8 +++
>> include/uapi/asm-generic/errno.h | 2 +
>> include/uapi/asm-generic/fcntl.h | 11 ++++
>> include/uapi/linux/fs.h | 1 +
>> 15 files changed, 185 insertions(+), 10 deletions(-)
>>
>
> You might consider breaking this patch into two. One patch that makes
> LOCK_MAND locks actually work and that adds MS_SHARELOCK, and one patch
> that hooks that up to open(). Given the locking involved with the
> i_mutex it would be best to present this as a series of small,
> incremental changes.

Good point. So, we can break it into 2:
1) make flock actually work with LOCK_MAND on MS_SHARELOCK mounts,
2) replace flock+LOCK_MAND with open+O_DENY* flags.


>> diff --git a/arch/alpha/include/uapi/asm/errno.h b/arch/alpha/include/uapi/asm/errno.h
>> index 17f92aa..953a6d6 100644
>> --- a/arch/alpha/include/uapi/asm/errno.h
>> +++ b/arch/alpha/include/uapi/asm/errno.h
>> @@ -124,4 +124,6 @@
>>
>> #define EHWPOISON 139 /* Memory page has hardware error */
>>
>> +#define ESHAREDENIED 140 /* File is locked with a sharelock */
>> +
>> #endif
>> diff --git a/arch/alpha/include/uapi/asm/fcntl.h b/arch/alpha/include/uapi/asm/fcntl.h
>> index 09f49a6..265344b 100644
>> --- a/arch/alpha/include/uapi/asm/fcntl.h
>> +++ b/arch/alpha/include/uapi/asm/fcntl.h
>> @@ -33,6 +33,9 @@
>>
>> #define O_PATH 040000000
>> #define __O_TMPFILE 0100000000
>> +#define O_DENYREAD 0200000000 /* Do not permit read access */
>> +#define O_DENYWRITE 0400000000 /* Do not permit write access */
>> +#define O_DENYDELETE 01000000000 /* Do not permit delete or rename */
>>
>> #define F_GETLK 7
>> #define F_SETLK 8
>> diff --git a/arch/mips/include/uapi/asm/errno.h b/arch/mips/include/uapi/asm/errno.h
>> index 02d645d..f1a4068 100644
>> --- a/arch/mips/include/uapi/asm/errno.h
>> +++ b/arch/mips/include/uapi/asm/errno.h
>> @@ -123,6 +123,8 @@
>>
>> #define EHWPOISON 168 /* Memory page has hardware error */
>>
>> +#define ESHAREDENIED 169 /* File is locked with a sharelock */
>> +
>> #define EDQUOT 1133 /* Quota exceeded */
>>
>>
>> diff --git a/arch/parisc/include/uapi/asm/errno.h b/arch/parisc/include/uapi/asm/errno.h
>> index f3a8aa5..654c232 100644
>> --- a/arch/parisc/include/uapi/asm/errno.h
>> +++ b/arch/parisc/include/uapi/asm/errno.h
>> @@ -124,4 +124,6 @@
>>
>> #define EHWPOISON 257 /* Memory page has hardware error */
>>
>> +#define ESHAREDENIED 258 /* File is locked with a sharelock */
>> +
>> #endif
>> diff --git a/arch/parisc/include/uapi/asm/fcntl.h b/arch/parisc/include/uapi/asm/fcntl.h
>> index 34a46cb..5865964 100644
>> --- a/arch/parisc/include/uapi/asm/fcntl.h
>> +++ b/arch/parisc/include/uapi/asm/fcntl.h
>> @@ -21,6 +21,9 @@
>>
>> #define O_PATH 020000000
>> #define __O_TMPFILE 040000000
>> +#define O_DENYREAD 0200000000 /* Do not permit read access */
>> +#define O_DENYWRITE 0400000000 /* Do not permit write access */
>> +#define O_DENYDELETE 01000000000 /* Do not permit delete or rename */
>>
>> #define F_GETLK64 8
>> #define F_SETLK64 9
>> diff --git a/arch/sparc/include/uapi/asm/errno.h b/arch/sparc/include/uapi/asm/errno.h
>> index 20423e17..fe339b5 100644
>> --- a/arch/sparc/include/uapi/asm/errno.h
>> +++ b/arch/sparc/include/uapi/asm/errno.h
>> @@ -114,4 +114,6 @@
>>
>> #define EHWPOISON 135 /* Memory page has hardware error */
>>
>> +#define ESHAREDENIED 136 /* File is locked with a sharelock */
>> +
>> #endif
>> diff --git a/arch/sparc/include/uapi/asm/fcntl.h b/arch/sparc/include/uapi/asm/fcntl.h
>> index 7e8ace5..ab68170 100644
>> --- a/arch/sparc/include/uapi/asm/fcntl.h
>> +++ b/arch/sparc/include/uapi/asm/fcntl.h
>> @@ -36,6 +36,9 @@
>>
>> #define O_PATH 0x1000000
>> #define __O_TMPFILE 0x2000000
>> +#define O_DENYREAD 0x4000000 /* Do not permit read access */
>> +#define O_DENYWRITE 0x8000000 /* Do not permit write access */
>> +#define O_DENYDELETE 0x10000000 /* Do not permit delete or rename */
>>
>
> It'd probably be best to add O_DENYDELETE in a separate patch, rather
> than disabling it temporarily.

Agree.

>
>> #define F_GETOWN 5 /* for sockets. */
>> #define F_SETOWN 6 /* for sockets. */
>> diff --git a/fs/fcntl.c b/fs/fcntl.c
>> index ef68665..3f85887 100644
>> --- a/fs/fcntl.c
>> +++ b/fs/fcntl.c
>> @@ -729,14 +729,15 @@ static int __init fcntl_init(void)
>> * Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY
>> * is defined as O_NONBLOCK on some platforms and not on others.
>> */
>> - BUILD_BUG_ON(20 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
>> + BUILD_BUG_ON(23 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
>> O_RDONLY | O_WRONLY | O_RDWR |
>> O_CREAT | O_EXCL | O_NOCTTY |
>> O_TRUNC | O_APPEND | /* O_NONBLOCK | */
>> __O_SYNC | O_DSYNC | FASYNC |
>> O_DIRECT | O_LARGEFILE | O_DIRECTORY |
>> O_NOFOLLOW | O_NOATIME | O_CLOEXEC |
>> - __FMODE_EXEC | O_PATH | __O_TMPFILE
>> + __FMODE_EXEC | O_PATH | __O_TMPFILE |
>> + O_DENYREAD | O_DENYWRITE | O_DENYDELETE
>> ));
>>
>> fasync_cache = kmem_cache_create("fasync_cache",
>> diff --git a/fs/locks.c b/fs/locks.c
>> index 92a0f0a..ffde4d4 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -708,20 +708,73 @@ static int posix_locks_conflict(struct file_lock *caller_fl, struct file_lock *s
>> return (locks_conflict(caller_fl, sys_fl));
>> }
>>
>> -/* Determine if lock sys_fl blocks lock caller_fl. FLOCK specific
>> - * checking before calling the locks_conflict().
>> +static unsigned int
>> +deny_flags_to_cmd(unsigned int flags)
>> +{
>> + unsigned int cmd = LOCK_MAND;
>> +
>> + if (!(flags & O_DENYREAD))
>> + cmd |= LOCK_READ;
>> + if (!(flags & O_DENYWRITE))
>> + cmd |= LOCK_WRITE;
>> +
>> + return cmd;
>> +}
>> +
>> +/*
>> + * locks_mand_conflict - Determine if there's a share reservation conflict
>> + * @caller_fl: lock we're attempting to acquire
>> + * @sys_fl: lock already present on system that we're checking against
>> + *
>> + * Check to see if there's a share_reservation conflict. LOCK_READ/LOCK_WRITE
>> + * tell us whether the reservation allows other readers and writers.
>> + */
>> +static int
>> +locks_mand_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
>> +{
>> + unsigned char caller_type = caller_fl->fl_type;
>> + unsigned char sys_type = sys_fl->fl_type;
>> + fmode_t caller_fmode = caller_fl->fl_file->f_mode;
>> + fmode_t sys_fmode = sys_fl->fl_file->f_mode;
>> +
>> + /* they can only conflict if FS is mounted with MS_SHARELOCK */
>> + if (!IS_SHARELOCK(caller_fl->fl_file->f_path.dentry->d_inode))
>> + return 0;
>> +
>> + /* they can only conflict if they're both LOCK_MAND */
>> + if (!(caller_type & LOCK_MAND) || !(sys_type & LOCK_MAND))
>> + return 0;
>> +
>> + if (!(caller_type & LOCK_READ) && (sys_fmode & FMODE_READ))
>> + return 1;
>> + if (!(caller_type & LOCK_WRITE) && (sys_fmode & FMODE_WRITE))
>> + return 1;
>> + if (!(sys_type & LOCK_READ) && (caller_fmode & FMODE_READ))
>> + return 1;
>> + if (!(sys_type & LOCK_WRITE) && (caller_fmode & FMODE_WRITE))
>> + return 1;
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * Determine if lock sys_fl blocks lock caller_fl. FLOCK specific checking
>> + * before calling the locks_conflict().
>> */
>> static int flock_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
>> {
>> - /* FLOCK locks referring to the same filp do not conflict with
>> + if (!IS_FLOCK(sys_fl))
>> + return 0;
>> + if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND))
>> + return locks_mand_conflict(caller_fl, sys_fl);
>
> nit: Seems like the above could be optimized a little. You know that
> locks_mand_conflict is only relevant if both are LOCK_MAND, and one of
> the first things that locks_mand_conflict does is to check that both
> have that set.

ok.

>
>> + /*
>> + * FLOCK locks referring to the same filp do not conflict with
>> * each other.
>> */
>> - if (!IS_FLOCK(sys_fl) || (caller_fl->fl_file == sys_fl->fl_file))
>> - return (0);
>> - if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND))
>> + if (caller_fl->fl_file == sys_fl->fl_file)
>> return 0;
>>
>> - return (locks_conflict(caller_fl, sys_fl));
>> + return locks_conflict(caller_fl, sys_fl);
>> }
>>
>> void
>> @@ -888,6 +941,36 @@ out:
>> return error;
>> }
>>
>> +/*
>> + * Determine if a file is allowed to be opened with specified access and share
>> + * modes. Lock the file and return 0 if checks passed, otherwise return
>> + * -ESHAREDENIED.
>> + */
>> +int
>> +sharelock_lock_file(struct file *filp)
>> +{
>> + struct file_lock *lock;
>> + int error = 0;
>> +
>> + if (!IS_SHARELOCK(filp->f_path.dentry->d_inode))
>> + return error;
>> +
>> + /* Disable O_DENYDELETE support for now */
>> + if (filp->f_flags & O_DENYDELETE)
>> + return -EINVAL;
>> +
>> + error = flock_make_lock(filp, &lock, deny_flags_to_cmd(filp->f_flags));
>> + if (error)
>> + return error;
>> +
>> + error = flock_lock_file(filp, lock);
>> + if (error == -EAGAIN)
>> + error = -ESHAREDENIED;
>> +
>> + locks_free_lock(lock);
>> + return error;
>> +}
>> +
>> static int __posix_lock_file(struct inode *inode, struct file_lock *request, struct file_lock *conflock)
>> {
>> struct file_lock *fl;
>> diff --git a/fs/namei.c b/fs/namei.c
>> index 3531dee..2b741a1 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -2725,9 +2725,14 @@ static int atomic_open(struct nameidata *nd, struct dentry *dentry,
>> acc_mode = MAY_OPEN;
>> }
>> error = may_open(&file->f_path, acc_mode, open_flag);
>> - if (error)
>> + if (error) {
>> fput(file);
>> + goto out;
>> + }
>>
>> + error = sharelock_lock_file(file);
>> + if (error)
>> + fput(file);
>> out:
>> dput(dentry);
>> return error;
>> @@ -2919,6 +2924,40 @@ retry_lookup:
>> }
>> mutex_lock(&dir->d_inode->i_mutex);
>> error = lookup_open(nd, path, file, op, got_write, opened);
>> +
>> + /*
>> + * For sharelock mounts if a file was created but not opened, we need
>> + * to keep parent i_mutex until we finish the open to prevent races when
>> + * somebody opens newly created by us file and locks it with a sharelock
>> + * before we open it.
>> + */
>> + if (IS_SHARELOCK(dir->d_inode) && error > 0 && *opened & FILE_CREATED) {
>> + /* Don't check for write permission, don't truncate */
>> + open_flag &= ~O_TRUNC;
>> + will_truncate = false;
>> + acc_mode = MAY_OPEN;
>> + path_to_nameidata(path, nd);
>> +
>> + error = may_open(&nd->path, acc_mode, open_flag);
>> + if (error) {
>> + mutex_unlock(&dir->d_inode->i_mutex);
>> + goto out;
>> + }
>> + file->f_path.mnt = nd->path.mnt;
>> + error = finish_open(file, nd->path.dentry, NULL, opened);
>> + if (error) {
>> + mutex_unlock(&dir->d_inode->i_mutex);
>> + if (error == -EOPENSTALE)
>> + goto stale_open;
>> + goto out;
>> + }
>> + error = sharelock_lock_file(file);
>> + mutex_unlock(&dir->d_inode->i_mutex);
>> + if (error)
>> + goto exit_fput;
>> + goto opened;
>> + }
>> +
>> mutex_unlock(&dir->d_inode->i_mutex);
>>
>> if (error <= 0) {
>> @@ -3034,6 +3073,18 @@ finish_open_created:
>> goto stale_open;
>> goto out;
>> }
>> +
>> + if (IS_SHARELOCK(dir->d_inode)) {
>> + /*
>> + * Lock parent i_mutex to prevent races with sharelocks on
>> + * newly created files.
>> + */
>> + mutex_lock(&dir->d_inode->i_mutex);
>> + error = sharelock_lock_file(file);
>> + mutex_unlock(&dir->d_inode->i_mutex);
>> + if (error)
>> + goto exit_fput;
>> + }
>> opened:
>> error = open_check_o_direct(file);
>> if (error)
>> diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
>> index 439406e..dd374d4 100644
>> --- a/fs/proc_namespace.c
>> +++ b/fs/proc_namespace.c
>> @@ -44,6 +44,7 @@ static int show_sb_opts(struct seq_file *m, struct super_block *sb)
>> { MS_SYNCHRONOUS, ",sync" },
>> { MS_DIRSYNC, ",dirsync" },
>> { MS_MANDLOCK, ",mand" },
>> + { MS_SHARELOCK, ",sharelock" },
>> { 0, NULL }
>> };
>> const struct proc_fs_info *fs_infop;
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 121f11f..aa061ca 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -1029,6 +1029,7 @@ extern int vfs_setlease(struct file *, long, struct file_lock **);
>> extern int lease_modify(struct file_lock **, int);
>> extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
>> extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
>> +extern int sharelock_lock_file(struct file *);
>> #else /* !CONFIG_FILE_LOCKING */
>> static inline int fcntl_getlk(struct file *file, struct flock __user *user)
>> {
>> @@ -1169,6 +1170,12 @@ static inline int lock_may_write(struct inode *inode, loff_t start,
>> {
>> return 1;
>> }
>> +
>> +static inline int sharelock_lock_file(struct file *filp)
>> +{
>> + return 0;
>> +}
>> +
>> #endif /* !CONFIG_FILE_LOCKING */
>>
>>
>> @@ -1675,6 +1682,7 @@ struct super_operations {
>> #define IS_PRIVATE(inode) ((inode)->i_flags & S_PRIVATE)
>> #define IS_IMA(inode) ((inode)->i_flags & S_IMA)
>> #define IS_AUTOMOUNT(inode) ((inode)->i_flags & S_AUTOMOUNT)
>> +#define IS_SHARELOCK(inode) __IS_FLG(inode, MS_SHARELOCK)
>> #define IS_NOSEC(inode) ((inode)->i_flags & S_NOSEC)
>>
>> /*
>> diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
>> index 1e1ea6e..aff869c 100644
>> --- a/include/uapi/asm-generic/errno.h
>> +++ b/include/uapi/asm-generic/errno.h
>> @@ -110,4 +110,6 @@
>>
>> #define EHWPOISON 133 /* Memory page has hardware error */
>>
>> +#define ESHAREDENIED 134 /* File is locked with a sharelock */
>> +
>> #endif
>> diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
>> index 95e46c8..9881cfe 100644
>> --- a/include/uapi/asm-generic/fcntl.h
>> +++ b/include/uapi/asm-generic/fcntl.h
>> @@ -92,6 +92,17 @@
>> #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
>> #define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT)
>>
>> +#ifndef O_DENYREAD
>> +#define O_DENYREAD 040000000 /* Do not permit read access */
>> +#endif
>> +/* FMODE_NONOTIFY 0100000000 */
>> +#ifndef O_DENYWRITE
>> +#define O_DENYWRITE 0200000000 /* Do not permit write access */
>> +#endif
>> +#ifndef O_DENYDELETE
>> +#define O_DENYDELETE 0400000000 /* Do not permit delete or rename */
>> +#endif
>> +
>
> One thing to consider: We found with the addition of O_TMPFILE that the
> open() api is not particularly helpful when it comes to informing
> appications when a flag isn't supported:
>
> http://lwn.net/Articles/562294/
>
> ...having a plan to cope with that here would be best. How can an
> application determine at runtime that O_DENY* actually *work*? It may
> be best to step back and consider a new syscall for this (open2() ?).
>

So, consider we added new syscall:

opendm(filename, flags, mode, deny_mode)
{
return open(filename, flags | denymode2openflags(deny_mode), mode)
}

where deny_mode can be DMODE_NONE (0), DMODE_READ (1), DMODE_WRITE(2)
and DMODE_RDWR(3) (similar to FMODE_* values).

We have open and opendm that act actually in the same manner for
mounts without MS_SHARELOCK. For mounts with MS_SHARELOCK open is like
opendm with DMODE_NONE. Open flags O_DENY* are for internal use only.

Is it what you suggest?

--
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/