Re: [PATCH 1/3] readfile: implement readfile syscall
From: Miklos Szeredi
Date: Sat Jul 04 2020 - 15:41:28 EST
On Sat, Jul 4, 2020 at 4:03 PM Greg Kroah-Hartman
<gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> It's a tiny syscall, meant to allow a user to do a single "open this
> file, read into this buffer, and close the file" all in a single shot.
>
> Should be good for reading "tiny" files like sysfs, procfs, and other
> "small" files.
>
> There is no restarting the syscall, this is a "simple" syscall, with the
> attempt to make reading "simple" files easier with less syscall
> overhead.
>
> Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> ---
> fs/open.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
>
> diff --git a/fs/open.c b/fs/open.c
> index 6cd48a61cda3..4469faa9379c 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -1370,3 +1370,53 @@ int stream_open(struct inode *inode, struct file *filp)
> }
>
> EXPORT_SYMBOL(stream_open);
> +
> +static struct file *readfile_open(int dfd, const char __user *filename,
> + struct open_flags *op)
> +{
> + struct filename *tmp;
> + struct file *f;
> +
> + tmp = getname(filename);
> + if (IS_ERR(tmp))
> + return (struct file *)tmp;
> +
> + f = do_filp_open(dfd, tmp, op);
> + if (!IS_ERR(f))
> + fsnotify_open(f);
> +
> + putname(tmp);
> + return f;
> +}
> +
> +SYSCALL_DEFINE5(readfile, int, dfd, const char __user *, filename,
> + char __user *, buffer, size_t, bufsize, int, flags)
> +{
> + struct open_flags op;
> + struct open_how how;
> + struct file *file;
> + loff_t pos = 0;
> + int retval;
> +
> + /* only accept a small subset of O_ flags that make sense */
> + if ((flags & (O_NOFOLLOW | O_NOATIME)) != flags)
> + return -EINVAL;
> +
> + /* add some needed flags to be able to open the file properly */
> + flags |= O_RDONLY | O_LARGEFILE;
> +
> + how = build_open_how(flags, 0000);
> + retval = build_open_flags(&how, &op);
> + if (retval)
> + return retval;
> +
> + file = readfile_open(dfd, filename, &op);
> + if (IS_ERR(file))
> + return PTR_ERR(file);
> +
> + retval = vfs_read(file, buffer, bufsize, &pos);
> +
> + filp_close(file, NULL);
> +
> + return retval;
Manpage says: "doing the sequence of open() and then read() and then
close()", which is exactly what it does.
But then it goes on to say: "If the file is larger than the value
provided in count then only count number of bytes will be copied into
buf", which is only half true, it should be: "If the file is larger
than the value provided in count then at most count number of bytes
will be copied into buf", which is not a lot of information.
And "If the size of file is smaller than the value provided in count
then the whole file will be copied into buf", which is simply a lie;
for example seq_file will happily return a smaller-than-PAGE_SIZE
chunk if at least one record fits in there. You'll have a very hard
time explaining that in the man page. So I think there are two
possible ways forward:
1) just leave the first explanation (it's an open + read + close
equivalent) and leave out the rest
2) add a loop around the vfs_read() in the code.
I'd strongly prefer #2 because with the non-looping read it's
impossible to detect whether the file was completely read or not, and
that's just going to lead to surprises and bugs in userspace code.
Thanks,
Miklos