Re: [PATCH v3 8/9] x86: use __uaccess_begin_nospec and ASM_IFENCE in get_user paths

From: Al Viro
Date: Thu Jan 18 2018 - 13:12:37 EST


On Thu, Jan 18, 2018 at 08:49:31AM -0800, Linus Torvalds wrote:
> On Thu, Jan 18, 2018 at 8:38 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> >
> > > But there are about ~100 set_fs() calls in generic code, and some of
> > > those really are pretty fundamental. Doing things like "kernel_read()"
> > > without set_fs() is basically impossible.
> >
> > Not if we move to iov_iter or iov_iter-like behavior for all reads
> > and writes.
>
> Not going to happen. Really. We have how many tens of thousands of
> drivers again, all doing "copy_to_user()".

The real PITA is not even that (we could provide helpers making
conversion from ->read() to ->read_iter() easy for char devices,
etc.). It's the semantics of readv(2). Consider e.g. readv()
from /dev/rtc, with iovec array consisting of 10 segments, each
int-sized. Right now we'll get rtc_dev_read() called in a loop,
once for each segment. Single read() into 40-byte buffer will
fill one long and bugger off. Converting it to ->read_iter()
will mean more than just "use copy_to_iter() instead of put_user()" -
that would be trivial. But to preserve the current behaviour
we would need something like
	total = 0;
	while (iov_iter_count(to)) {
		count = iov_iter_single_seg_count(to);
		/* current body of rtc_dev_read(), with
		 * put_user() replaced with copy_to_iter()
		 */
		....
		if (res < 0) {
			if (!total)
				total = res;
			break;
		}
		total += res;
		if (res != count)
			break;
	}
	return total;
in that thing. And similar boilerplates would be needed in
a whole lot of drivers. Sure, they are individually trivial,
but they would add up to shitloads of code to get wrong.

These are basically all ->read() instances that ignore *ppos
and, unlike pipes, do not attempt to fill as much of the
buffer as possible. We do have quite a few of such.

Some ->read() instances can be easily converted to ->read_iter()
and will, in fact, be better off that way. We had patches of
that sort and I'm certain that we still have such places left.
Ditto for ->write() and ->write_iter(). But those are not
even close to being the majority. Sorry.

We could, in principle, do something like

dev_rtc_read_iter(iocb, to)
{
	return loop_read_iter(iocb, to, modified_dev_rtc_read);
}
with modified_dev_rtc_read() being the result of minimal
conversion (put_user() and copy_to_user() replaced with use
of copy_to_iter()). It would be less boilerplate that way,
but I really don't see big benefits from doing that.
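
To make the readv(2) semantics above concrete, here is a rough userspace
model of such a per-segment loop. It is a sketch only: every name in it
(seg_read_fn, model_dev_read, model_loop_read, model_calls) is made up for
illustration and is not a kernel API, the "device" is a simplified stand-in
rather than the real rtc_dev_read(), and -1 stands in for a negative errno.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */
#include <sys/uio.h>	/* struct iovec */

/* Hypothetical per-buffer read routine, standing in for a ->read() body. */
typedef ssize_t (*seg_read_fn)(void *buf, size_t count);

static int model_calls;	/* how many times the "device" was called */

/*
 * Simplified model of a device that hands out one value per call and
 * ignores any position: an unsigned int if the buffer cannot hold an
 * unsigned long, an unsigned long otherwise.
 */
static ssize_t model_dev_read(void *buf, size_t count)
{
	unsigned long data = 0x1234;

	model_calls++;
	if (count < sizeof(unsigned int))
		return -1;	/* stands in for -EINVAL */
	if (count < sizeof(unsigned long)) {
		unsigned int small = (unsigned int)data;

		memcpy(buf, &small, sizeof(small));
		return sizeof(small);
	}
	memcpy(buf, &data, sizeof(data));
	return sizeof(data);
}

/* One call per segment; stop on an error or on a short read. */
static ssize_t model_loop_read(const struct iovec *iov, int nr, seg_read_fn fn)
{
	ssize_t total = 0;

	for (int i = 0; i < nr; i++) {
		size_t count = iov[i].iov_len;
		ssize_t res = fn(iov[i].iov_base, count);

		if (res < 0) {
			if (!total)
				total = res;	/* report error only if nothing was read */
			break;
		}
		total += res;
		if ((size_t)res != count)
			break;		/* short read ends the whole call */
	}
	return total;
}
```

With ten int-sized segments the model device is called ten times and every
segment gets filled; with a single 40-byte segment it is called once, fills
one long and stops. That difference is what a straight ->read_iter()
conversion would have to preserve.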

On the write side things are just as unpleasant - we have
a lot of ->write() instances that parse the beginning of the
buffer, ignore the rest and report that everything got written.
writev() on those will parse each iovec segment, ignoring the
junk at the end of each. Again, that loop needs to go somewhere.
And we do have a bunch of "parse the buffer and do some action
once" ->write() instances - in char devices, debugfs, etc.
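
The write-side behaviour can be modelled in userspace the same way. Again
this is only a sketch under stated assumptions: model_dev_write,
model_loop_write, model_actions and model_last_value are hypothetical names,
the "parse" step is just a leading decimal number, and -1 stands in for a
negative errno.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */
#include <sys/uio.h>	/* struct iovec */

static long model_last_value;	/* last value the "device" acted on */
static int model_actions;	/* how many times it acted */

/*
 * Model of a ->write() that parses the beginning of the buffer, ignores
 * whatever follows, and claims that the whole buffer was consumed.
 */
static ssize_t model_dev_write(const void *buf, size_t count)
{
	char tmp[32];
	size_t n = count < sizeof(tmp) - 1 ? count : sizeof(tmp) - 1;
	char *end;
	long v;

	memcpy(tmp, buf, n);
	tmp[n] = '\0';
	v = strtol(tmp, &end, 10);
	if (end == tmp)
		return -1;	/* nothing parseable at the start */
	model_last_value = v;
	model_actions++;
	return (ssize_t)count;	/* trailing junk reported as written */
}

/* One parse per segment, as writev() on such a device behaves today. */
static ssize_t model_loop_write(const struct iovec *iov, int nr)
{
	ssize_t total = 0;

	for (int i = 0; i < nr; i++) {
		size_t count = iov[i].iov_len;
		ssize_t res = model_dev_write(iov[i].iov_base, count);

		if (res < 0) {
			if (!total)
				total = res;
			break;
		}
		total += res;
		if ((size_t)res != count)
			break;
	}
	return total;
}
```

Feeding it the segments "1 junk", "2 junk" and "3" triggers three actions,
while the same 13 bytes passed as one buffer trigger a single action on the
value 1; both calls report 13 bytes written.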