[PATCH] new syscall: flink

From: Ulrich Drepper (drepper@redhat.com)
Date: Sun Apr 06 2003 - 13:39:38 EST


I got a couple of requests for a function which isn't support on Linux
so far. Also not supportable, i.e., cannot be emulated at userlevel.
It has some history in other systems (QNX I think), though, and helps
with some security issues. It really not adding much new functionality
and I hope I got it right with my "monkey see, monkey do" technique of
looking up other places doing similar things.

The syscall I mean is

  int flink (int fd, const char *newname)

Similar to link(), but the first parameter is a file decsriptor. Using
the file descriptor helps to avoid races in some situation. Look at
this code bit (this is constructed and just a little test case):

#include <errno.h>
#define EE(a,x) {int e = a;if (e!=x)printf("%s = %d (%m)\n", #a, errno);}
#define E(a) EE(a,0)
int
main()
{
  printf("uid = %d, euid = %d, gid = %d, egid = %d\n", getuid(),
geteuid(), getgid(), getegid());
  E(setfsuid(getuid()));
  E(setfsgid(getgid()));
  char buf[] = "aaXXXXXX";
  int fd = mkstemp (buf);
  EE(setfsuid(0),getuid());
  EE(setfsgid(0),getgid());
  E(fchown(fd,48,48));
  E(setfsuid(getuid()));
  E(setfsgid(getgid()));
  E(syscall(268,fd,"aa"));
  E(unlink(buf));
  return 0;
}

This is for a SUID/SGID root application. A temporary file is created
carefully using the permissions of the user running the program. If the
syscall 268 (flink) line would be

  link(buf,"aa)

instead, somebody with limited priviledges could in theory have unlinked
the temporary file and created a new one. With flink() this isn't
possible. And the best is: the unlink() line can be moved right below
the mkstemp() call. This means no temporary files are around if
something goes wrong before the unlink(). (For me this is at least as
important as the security issue, it simplifies temp file handling and
reduces the number of bugs == leftover fiels).

The patch itself is very minimal. The impact on the link() syscall is
not measurable and the extra code for sys_flink is only a few bytes.

Is this acceptable? Shall I add more syscall definitions for platforms
!= x86 (I'd think the port maintainers want to do this themselves)?

-- 
--------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------

--- linux-2.5/arch/i386/kernel/entry.S-flink 2003-03-21 23:09:32.000000000 -0800 +++ linux-2.5/arch/i386/kernel/entry.S 2003-04-06 10:32:27.000000000 -0700 @@ -852,6 +852,7 @@ ENTRY(sys_call_table) .long sys_clock_gettime /* 265 */ .long sys_clock_getres .long sys_clock_nanosleep + .long sys_flink nr_syscalls=(.-sys_call_table)/4 --- linux-2.5/fs/namei.c-flink 2003-04-03 10:04:03.000000000 -0800 +++ linux-2.5/fs/namei.c 2003-04-06 11:20:41.000000000 -0700 @@ -16,6 +16,7 @@ #include <linux/init.h> #include <linux/slab.h> +#include <linux/file.h> #include <linux/fs.h> #include <linux/namei.h> #include <linux/quotaops.h> @@ -1796,10 +1797,10 @@ int vfs_link(struct dentry *old_dentry, * with linux 2.0, and to avoid hard-linking to directories * and other special files. --ADM */ -asmlinkage long sys_link(const char * oldname, const char * newname) +static long link_common(struct vfsmount *old_mnt, struct dentry *old_dentry, const char *newname) { struct dentry *new_dentry; - struct nameidata nd, old_nd; + struct nameidata nd; int error; char * to; @@ -1807,32 +1808,53 @@ asmlinkage long sys_link(const char * ol if (IS_ERR(to)) return PTR_ERR(to); - error = __user_walk(oldname, 0, &old_nd); - if (error) - goto exit; error = path_lookup(to, LOOKUP_PARENT, &nd); if (error) - goto out; + goto exit; error = -EXDEV; - if (old_nd.mnt != nd.mnt) + if (old_mnt != nd.mnt) goto out_release; new_dentry = lookup_create(&nd, 0); error = PTR_ERR(new_dentry); if (!IS_ERR(new_dentry)) { - error = vfs_link(old_nd.dentry, nd.dentry->d_inode, new_dentry); + error = vfs_link(old_dentry, nd.dentry->d_inode, new_dentry); dput(new_dentry); } up(&nd.dentry->d_inode->i_sem); out_release: path_release(&nd); -out: - path_release(&old_nd); exit: putname(to); return error; } +asmlinkage long sys_link(const char *oldname, const char *newname) +{ + struct nameidata old_nd; + int error; + + error = __user_walk(oldname, 0, &old_nd); + if (!error) { + error = link_common(old_nd.mnt, old_nd.dentry, newname); + path_release(&old_nd); + } + return error; +} + +asmlinkage long sys_flink(unsigned int fd, const char *newname) +{ + struct file *file; + int error = -EBADF; + + file = fget(fd); + if (file) { + error = link_common(file->f_vfsmnt, file->f_dentry, newname); + fput(file); + } + return error; +} + /* * The worst of all namespace operations - renaming directory. "Perverted" * doesn't even start to describe it. Somebody in UCB had a heck of a trip... --- linux-2.5/include/asm-i386/unistd.h-flink 2003-02-19 21:41:59.000000000 -0800 +++ linux-2.5/include/asm-i386/unistd.h 2003-04-06 10:30:08.000000000 -0700 @@ -273,8 +273,9 @@ #define __NR_clock_gettime (__NR_timer_create+6) #define __NR_clock_getres (__NR_timer_create+7) #define __NR_clock_nanosleep (__NR_timer_create+8) +#define __NR_flink 268 -#define NR_syscalls 268 +#define NR_syscalls 269 /* user-visible error numbers are in the range -1 - -124: see <asm-i386/errno.h> */


- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Apr 07 2003 - 22:00:28 EST