Re: [PATCH v5] fs: clear file privilege bits when mmap writing

From: Kees Cook
Date: Thu Dec 10 2015 - 13:18:47 EST


On Thu, Dec 10, 2015 at 10:16 AM, Willy Tarreau <w@xxxxxx> wrote:
> On Thu, Dec 10, 2015 at 10:05:50AM -0800, Kees Cook wrote:
>> On Wed, Dec 9, 2015 at 11:06 PM, Willy Tarreau <w@xxxxxx> wrote:
>> > Hi Kees,
>> >
>> > Why not add a new file flag instead?
>> >
>> > Something like this (editing your patch by hand to illustrate):
>> >
>> > diff --git a/fs/file_table.c b/fs/file_table.c
>> > index ad17e05ebf95..3a7eee76ea90 100644
>> > --- a/fs/file_table.c
>> > +++ b/fs/file_table.c
>> > @@ -191,6 +191,17 @@ static void __fput(struct file *file)
>> >
>> >  	might_sleep();
>> >
>> > +	/*
>> > +	 * XXX: While avoiding mmap_sem, we've already been written to.
>> > +	 * We must ignore the return value, since we can't reject the
>> > +	 * write.
>> > +	 */
>> > +	if (unlikely(file->f_flags & FL_DROP_PRIVS)) {
>> > +		mutex_lock(&inode->i_mutex);
>> > +		file_remove_privs(file);
>> > +		mutex_unlock(&inode->i_mutex);
>> > +	}
>> > +
>> >  	fsnotify_close(file);
>> >  	/*
>> >  	 * The function eventpoll_release() should be the first called
>> > diff --git a/include/linux/fs.h b/include/linux/fs.h
>> > index 3aa514254161..409bd7047e7e 100644
>> > --- a/include/linux/fs.h
>> > +++ b/include/linux/fs.h
>> > @@ -913,3 +913,4 @@
>> > #define FL_OFDLCK 1024 /* lock is "owned" by struct file */
>> > #define FL_LAYOUT 2048 /* outstanding pNFS layout */
>> > +#define FL_DROP_PRIVS 4096 /* lest something weird decides that 2 is OK */
>> >
>> > diff --git a/mm/memory.c b/mm/memory.c
>> > index c387430f06c3..08a77e0cf65f 100644
>> > --- a/mm/memory.c
>> > +++ b/mm/memory.c
>> > @@ -2036,6 +2036,7 @@ static inline int wp_page_reuse(struct mm_struct *mm,
>> >
>> >  		if (!page_mkwrite)
>> >  			file_update_time(vma->vm_file);
>> > +		vma->vm_file->f_flags |= FL_DROP_PRIVS;
>> >  	}
>> >
>> >  	return VM_FAULT_WRITE;
>> >
>> > Willy
>> >
>>
>> Is f_flags safe to write like this without holding a lock?
>
> Unfortunately I have no idea. I've seen places where it's written without
> taking a lock such as in blkdev_open() and I don't think that this one is
> called with a lock held.
>
> The comment in fs.h says that spinlock f_lock is here to protect f_flags
> (among others) and that it must not be taken from IRQ context. Thus I'd
> think we "just" have to take it to remain safe. That would be just one
> spinlock per first write via mmap() to a file, I don't know if that's
> reasonable or not :-/

Al, what's the best way forward here? I created a separate flag
variable so that it is effectively write-only on the write path, with
the only read happening at the final fput.
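
For illustration only, here is a rough userspace model of that
"set on write, read only at final fput" idea. It is not the patch code:
struct model_file, needs_priv_drop, and the model_* helpers are
hypothetical names, C11 atomics stand in for whatever the kernel would
actually use, and privs_dropped is just a counter standing in for the
call to file_remove_privs().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct file; not the kernel layout. */
struct model_file {
	unsigned int f_flags;         /* shared word; in the kernel, f_lock guards it */
	atomic_bool  needs_priv_drop; /* separate flag: only ever set to true */
	atomic_int   refcount;        /* models the file reference count */
	int          privs_dropped;   /* counts stand-in file_remove_privs() calls */
};

static void model_file_init(struct model_file *f)
{
	assert(f != NULL);
	f->f_flags = 0;
	atomic_init(&f->needs_priv_drop, false);
	atomic_init(&f->refcount, 1);
	f->privs_dropped = 0;
}

/* Take an extra reference, as a second holder of the file would. */
static void model_get(struct model_file *f)
{
	atomic_fetch_add(&f->refcount, 1);
}

/*
 * Write-fault path: a plain store of "true". There is no read-modify-write
 * of the shared f_flags word, so concurrent writers cannot lose each
 * other's updates.
 */
static void model_mark_written(struct model_file *f)
{
	atomic_store_explicit(&f->needs_priv_drop, true, memory_order_release);
}

/*
 * Drop a reference. Only the final put reads the flag, and at that point
 * no other user of the file remains. Returns true on the final put.
 */
static bool model_put(struct model_file *f)
{
	if (atomic_fetch_sub(&f->refcount, 1) != 1)
		return false;
	if (atomic_load_explicit(&f->needs_priv_drop, memory_order_acquire))
		f->privs_dropped++;   /* stand-in for file_remove_privs() */
	return true;
}
```

The point of the model is only that a flag which is never cleared and is
read once, after the last reference is gone, does not need the
lock-protected update that OR-ing a bit into f_flags would.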

-Kees

--
Kees Cook
Chrome OS & Brillo Security