Re: Commit 0be0ee71 ("fs: properly and reliably lock f_pos in fdget_pos()") breaking userspace

From: Kirill Smelkov
Date: Tue Nov 26 2019 - 05:15:23 EST


On Mon, Nov 25, 2019 at 07:39:34PM -0800, Linus Torvalds wrote:
> On Mon, Nov 25, 2019 at 7:21 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Of course, this may fix the f_pos locking issue, but replace it with a
> > "oops, the character device driver tried to look at *pos anyway", and
> > that will give you a nice OOPS instead.
>
> Confirmed. At least the x86 firmware update code uses
> "simple_read_from_buffer()", which does use the file position, but
> doesn't actually allow llseek().
>
> So no, "it's a character device no llseek" does not mean that it acts
> as a pure streaming device with no file position, and we'd actually
> have to mark individual drivers (either by adding 'stream_open()' in
> their open routines, or adding the extra field to 'struct
> file_operations' that I mentioned).
>
> I think I'll have to revert that trial commit. I'll give it another
> day in case somebody has a better idea, but it looks like it's too
> early to do that nice cleanup as things are now.
>
> Linus

Linus, thanks for heads up and for pushing FMODE_STREAM conversion
forward.

I still believe the most practical approach to do it is via automation -
via extending and teaching stream_open.cocci to find other places that
actually don't use ppos / ki->ki_pos at all and at least warn that
stream_open() should be used for that file. Else it will be a lot of
pain.

Of course we can manually convert the core cases like pipes and sockets.
But those things are easy to do. On the other hand by manually
converting them, we lesser the extent to where the automation is applied
and tested, and that leaves us with just many small corner cases that
will be left in potentially racy/deadlocked state until someone actually
hit that problem and cares to debug it to understand what is going on.
Those will be many different places and so likely many different people,
and since debugging concurrency issues is not easy, it will likely last for
many years. Of course things like KCSAN helps a lot, but, if I
understand correctly, they do not provide full coverage for the whole
kernel, especially they are likely not providing coverage for leaf
kernel bits in drivers.

Of course, I might be missing something, and there is a way to e.g.
combine automation+pain approaches. Then that would be good to make
combined progress.

I still keep in mind extending stream_open.cocci myself, but for now I
cannot delve into it before finishing my transactional filesystem...

Kirill


P.S. even though I was put into cc there, somehow I did not received any
notification email for commits d8e464ecc17b (vfs: mark pipes and sockets as
stream-like file descriptors) and 0be0ee71816b (vfs: properly and
reliably lock f_pos in fdget_pos()).


P.P.S completely untested, but looks sane:

---- 8< ----