Re: [PATCH v7 1/8] fs: introduce kernel_pread_file* support
From: Mimi Zohar
Date: Mon Jun 08 2020 - 09:04:24 EST
On Sat, 2020-06-06 at 08:52 -0700, Matthew Wilcox wrote:
> On Fri, Jun 05, 2020 at 10:04:51PM -0700, Scott Branden wrote:
> > -int kernel_read_file(struct file *file, void **buf, loff_t *size,
> > - loff_t max_size, enum kernel_read_file_id id)
> > -{
> > - loff_t i_size, pos;
> > +int kernel_pread_file(struct file *file, void **buf, loff_t *size,
> > + loff_t pos, loff_t max_size,
> > + enum kernel_pread_opt opt,
> > + enum kernel_read_file_id id)
> > +{
> > + loff_t alloc_size;
> > + loff_t buf_pos;
> > + loff_t read_end;
> > + loff_t i_size;
> > ssize_t bytes = 0;
> > int ret;
> >
>
> Look, it's not your fault, but this is a great example of how we end
> up with atrocious interfaces. Someone comes along and implements a
> simple DWIM interface that solves their problem. Then somebody else
> adds a slight variant that solves their problem, and so on and so on,
> and we end up with this bonkers API where the arguments literally change
> meaning depending on other arguments.
>
> > @@ -950,21 +955,31 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
> > ret = -EINVAL;
> > goto out;
> > }
> > - if (i_size > SIZE_MAX || (max_size > 0 && i_size > max_size)) {
> > +
> > + /* Default read to end of file */
> > + read_end = i_size;
> > +
> > + /* Allow reading partial portion of file */
> > + if ((opt == KERNEL_PREAD_PART) &&
> > + (i_size > (pos + max_size)))
> > + read_end = pos + max_size;
> > +
> > + alloc_size = read_end - pos;
> > + if (i_size > SIZE_MAX || (max_size > 0 && alloc_size > max_size)) {
> > ret = -EFBIG;
> > goto out;
>
> ... like that.
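The size checks in the hunk above can be modeled outside the kernel to show how max_size changes meaning with opt. This is a minimal userspace sketch, not kernel code. model_alloc_size() and its partial flag (standing in for opt == KERNEL_PREAD_PART) are hypothetical, and it assumes the checks preceding this hunk (not shown) leave 0 <= pos <= i_size.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the v7 size checks. 'partial' stands in
 * for opt == KERNEL_PREAD_PART. Returns the number of bytes that would be
 * allocated, or -EFBIG.
 */
static int64_t model_alloc_size(int64_t i_size, int64_t pos,
				int64_t max_size, bool partial)
{
	/* Assumption: earlier checks guarantee a sane pos */
	assert(i_size > 0 && pos >= 0 && pos <= i_size);

	/* Default read to end of file */
	int64_t read_end = i_size;

	/* Partial mode: max_size acts as a read length */
	if (partial && i_size > pos + max_size)
		read_end = pos + max_size;

	int64_t alloc_size = read_end - pos;

	/*
	 * Otherwise a positive max_size caps everything from pos to EOF.
	 * (In partial mode alloc_size can never exceed max_size here.)
	 */
	if ((uint64_t)i_size > SIZE_MAX ||
	    (max_size > 0 && alloc_size > max_size))
		return -EFBIG;

	return alloc_size;
}
```

With these checks, max_size == 0 means "no limit" when reading to the end of the file, but yields a zero-byte read in partial mode.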
>
> I think what we actually want is:
>
> ssize_t vmap_file_range(struct file *, loff_t start, loff_t end, void **bufp);
> void vunmap_file_range(struct file *, void *buf);
>
> If end > i_size, limit the allocation to i_size. Returns the number
> of bytes allocated, or a negative errno. Writes the pointer allocated
> to *bufp. Internally, it should use the page cache to read in the pages
> (taking appropriate reference counts). Then it maps them using vmap()
> instead of copying them to a private vmalloc() array.
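The proposed semantics can be approximated in userspace, where mmap() of a file likewise maps page-cache pages instead of copying them. This is a minimal sketch under stated assumptions, not the proposed kernel API: map_file_range() and unmap_file_range() are hypothetical names, a range starting at or past the clamped end is treated as -EINVAL, and the unmap helper takes a length because munmap() needs one (vunmap() does not).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userspace analog of vmap_file_range(): map [start, end)
 * of fd read-only, shared with the page cache, instead of copying it.
 * Returns the number of bytes mapped or a negative errno; on success
 * *bufp points at the byte at offset 'start'.
 */
static ssize_t map_file_range(int fd, off_t start, off_t end, void **bufp)
{
	struct stat st;

	assert(bufp != NULL);
	if (fstat(fd, &st) < 0)
		return -errno;

	/* If end > i_size, limit the mapping to i_size */
	if (end > st.st_size)
		end = st.st_size;

	/* Assumption: an empty or inverted range is an error */
	if (start < 0 || start >= end)
		return -EINVAL;

	/* mmap() offsets must be page aligned */
	long page = sysconf(_SC_PAGESIZE);
	off_t aligned = start & ~((off_t)page - 1);
	size_t delta = (size_t)(start - aligned);
	size_t len = (size_t)(end - start);

	void *base = mmap(NULL, len + delta, PROT_READ, MAP_SHARED, fd, aligned);
	if (base == MAP_FAILED)
		return -errno;

	*bufp = (char *)base + delta;
	return (ssize_t)len;
}

/* Hypothetical counterpart of vunmap_file_range(). */
static void unmap_file_range(void *buf, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	uintptr_t addr = (uintptr_t)buf;
	uintptr_t base = addr & ~((uintptr_t)page - 1);

	munmap((void *)base, len + (addr - base));
}
```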
>
> kernel_read_file() can be converted to use this API. The users will
> need to be changed to call kernel_read_end(struct file *file, void *buf)
> instead of vfree() so it can call allow_write_access() for them.
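Why vfree() alone would not be enough can be sketched with a single-threaded model of the inode write counter. In the kernel, deny_write_access() and allow_write_access() operate on inode->i_writecount (negative while writes are denied, positive while the file is open for writing). The model_* functions below are hypothetical, non-atomic stand-ins, with model_read_begin() and model_read_end() playing the roles of a mapping kernel_read_file() and the proposed kernel_read_end().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct inode's i_writecount. */
struct model_inode {
	int i_writecount; /* > 0: open for write, < 0: writes denied */
};

/* Like get_write_access(): refused while writes are denied. */
static int model_get_write(struct model_inode *inode)
{
	if (inode->i_writecount < 0)
		return -ETXTBSY;
	inode->i_writecount++;
	return 0;
}

/* Drop a writer taken with model_get_write(). */
static void model_put_write(struct model_inode *inode)
{
	assert(inode->i_writecount > 0);
	inode->i_writecount--;
}

/* Like deny_write_access(): refused while a writer exists. */
static int model_deny_write(struct model_inode *inode)
{
	if (inode->i_writecount > 0)
		return -ETXTBSY;
	inode->i_writecount--;
	return 0;
}

/* Like allow_write_access(): undoes one model_deny_write(). */
static void model_allow_write(struct model_inode *inode)
{
	assert(inode->i_writecount < 0);
	inode->i_writecount++;
}

/* Mapping read: writes stay denied while the caller holds the buffer. */
static int model_read_begin(struct model_inode *inode)
{
	return model_deny_write(inode);
}

/* The kernel_read_end() role: release the buffer, then re-allow writes. */
static void model_read_end(struct model_inode *inode)
{
	/* (a real implementation would also unmap the buffer here) */
	model_allow_write(inode);
}
```

A plain vfree() has no way to reach the inode, so the counter would stay negative and the file would remain unwritable.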
>
> vmap_file_range() has a lot of potential uses. I'm surprised we don't
> have it already, to be honest.

Prior to kernel_read_file(), the same or very similar code existed in
multiple places in the kernel. The kernel_read_file() API
consolidated the existing code, adding the pre and post security hooks.
With this new design of not using a private vmalloc() buffer, will the
file data be accessible prior to the post security hooks? From an IMA
perspective, the hooks are used for measuring and/or verifying the
integrity of the file.
Mimi
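A related way to see why the private buffer matters to IMA: a private copy is a snapshot, while a shared mapping of the page cache keeps tracking the file. The minimal userspace sketch below (pread() into a malloc() buffer versus mmap(MAP_SHARED), relying on Linux's coherent page cache) uses hypothetical names struct views, take_views() and drop_views(). It shows that bytes checked through a mapping can change after the check unless writes stay denied for the mapping's lifetime.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Two views of the same file contents. */
struct views {
	char *copy;  /* private copy, like kernel_read_file()'s vmalloc() buffer */
	char *map;   /* shared page-cache mapping, like the vmap() proposal */
	size_t len;
};

/* Take both views of the first len bytes of fd. Returns 0 or -1. */
static int take_views(int fd, size_t len, struct views *v)
{
	v->len = len;
	v->copy = malloc(len);
	if (!v->copy)
		return -1;

	if (pread(fd, v->copy, len, 0) != (ssize_t)len) {
		free(v->copy);
		return -1;
	}

	v->map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (v->map == MAP_FAILED) {
		free(v->copy);
		return -1;
	}
	return 0;
}

/* Release both views. */
static void drop_views(struct views *v)
{
	munmap(v->map, v->len);
	free(v->copy);
}
```

After a later write to the file, the copy still holds the bytes that were measured, while the mapping shows the new contents.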