Re: [PATCH 1/2] vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files

From: Kirill Smelkov
Date: Sat Apr 13 2019 - 14:59:19 EST


On Sat, Apr 13, 2019 at 10:27:00AM -0700, Linus Torvalds wrote:
> On Sat, Apr 13, 2019 at 9:55 AM Kirill Smelkov <kirr@xxxxxxxxxx> wrote:
> >
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -371,7 +371,7 @@ int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t
> > inode = file_inode(file);
> > if (unlikely((ssize_t) count < 0))
> > return retval;
> > - pos = *ppos;
> > + pos = (ppos ? *ppos : 0);
> > if (unlikely(pos < 0)) {
> > if (!unsigned_offsets(file))
> > return retval;
>
> This part looks silly. We should just avoid all the position overflow
> games when we don't have a position at all (ie streaming). You can't
> overflow what you don't use.
>
> Similarly, you can't use ranged mandatory locking on a stream, so the
> mandatory locking thing seems dependent on pos too.
>
> So I think that with a NULL ppos being possible, we should just change
> the code to just do all of that conditionally on having a position,
> rather than saying that the position of a stream is always 0.

Linus, thanks for feedback. After checking a bit of history of
rw_verify_area and seeing what it does I agree with you. I was initially
confused because rw_verify_area calls security_file_permission() at its
tail and so I thought we cannot skip rw_verify_area for !ppos case.
After studying a bit I see now that there are several things going on
and indeed moving ranged locks checking into under `if ppos!=NULL` makes
sense. This code is new to me; apologize for not getting it right
initially.

> That said, this whole "let's make it possible to not have a position
> at all" is a big change, and there's no way I'll apply these before
> the 5.2 merge window.

Ok.

> And I'd really like to have people (Al?) look at this and go "yeah,
> makes sense". I do think that moving to a model where we wither have a
> (properly locked) file position or no pos pointer at all is the right
> model (ie I'd really like to get rid of the mixed case), but there
> might be some practical problem that makes it impractical.
>
> Because the *real* problem with the mixed case is not "insane people
> who do bad things might get odd results". No, the real problem with
> the mixed case is that it could be a security issue (ie: one process
> intentionally changes the file position just as another process is
> going a 'read' and then avoids some limit test because the limit test
> was done using the old 'pos' value but the actual IO was done using
> the new one).
>
> So I suspect that we will have to either
>
> - get rid of the mixed case entirely (and do only properly locked
> f_pos accesses or pass is a NULL f_pos)
>
> - continue to support the mixed case, but also continue to support
> the nasty temporary 'pos' value with that file_pos_{read,write}()
> thing.
>
> IOW, I would not be ok with passing in a shared - and unlocked -
> &file->f_pos value to random drivers that *might* do odd things when a
> race happens.

I see. That's why I splitted the whole change into 2: the first patch
only adds ppos=NULL semantic for FMODE_STREAM and does not change to
passing &file->f_pos directly. There are signs that people can be
starting to use stream_open even now - before we do mass conversion.
That's why it is better to fix ppos semantic for stream case early.
You said it won't happen before next merge window - ok, but ppos=NULL
change is separate and does not depend on passing &file->f_pos directly.
So let's please apply ppos=NULL patch early when merge window begins and
independently of what happens with whether we'll switch to direct usage
of &file->f_pos.

Looking forward to get review from others too.

And I appreciate if people could help at least somehow with "getting rid
of mixed case entirely" (i.e. always lock f_pos_lock on !FMODE_STREAM),
because this transition starts to diverge from my particular use-case
too far. To me it makes sense to do that transition as follows:

- convert nonseekable_open -> stream_open via stream_open.cocci;
- audit other nonseekable_open calls and convert left users that truly
don't depend on position to stream_open;
- extend stream_open.cocci to analyze alloc_file_pseudo as well (this
will cover pipes and sockets), or maybe convert pipes and sockets to
FMODE_STREAM manually;
- extend stream_open.cocci to analyze file_operations that use no_llseek
or noop_llseek, but do not use nonseekable_open or alloc_file_pseudo.
This might find files that have stream semantic but are opened
differently;
- extend stream_open.cocci to analyze file_operations whose .read/.write
do not use ppos at all (independently of how file was opened);
- ...
- after that remove FMODE_ATOMIC_POS and always take f_pos_lock if
!FMODE_STREAM;
- gather bug reports for deadlocked read/write and convert missed cases
to FMODE_STREAM, probably extending stream_open.cocci along the road
to catch similar cases.

i.e. always take f_pos_lock unless a file is explicitly marked as being
stream, and try to find and cover all files that are streams.

Updated patch for ppos=NULL for stream case with rw_verify_area change
fixed is below.

Kirill

---- 8<----