Re: [git pull] uaccess-related bits of vfs.git
From: Al Viro
Date: Sat May 13 2017 - 12:46:23 EST
On Sat, May 13, 2017 at 02:05:27PM +0200, Adam Borowski wrote:
> As someone from the peanuts gallery, I took a look for __put_user() in my
> usual haunt, drivers/tty/vt/
>
> * use 1: con_[gs]et_trans_*():
> Copies a linear array of 256 bytes/shorts, one by one.
> The obvious patch has 9 insertions(+), 22 deletions(-).
>
> * use 2: con_[gs]et_unimap():
> Ditto, up to 65535*2 shorts, also in a nice linear array.
>
> * use 3: tioclinux():
> Does a __put into a place that was checked only for read. This got me
> frightened as it initially looked like something that can allow an user to
> write where they shouldn't. Fortunately, it turns out the first argument
> to access_ok() is ignored on every single architecture -- why does it even
> exist then? I imagined it's there for some odd arch that allows writes
> when in privileged mode, but unlike what the docs say, VERIFY_WRITE is
> exactly same as VERIFY_READ.
It's a remnant of old kludge that never properly worked in the first place.
access_ok() should have been called userland_range() - that's all it
checks and that's all it *can* check.
As it is, each of those __get_user() can bloody well yield -EFAULT. Despite
having "passed" access_ok(). Again, the only thing access_ok() checks is
that (on architecture with shared address space for userland and kernel)
the addresses given are on the userland side. That's _it_ - they can
be unmapped, mmapped to broken floppy, whatever; you'll find out when
you try to actually copy bytes from from it.
What the kludge used to attempt was "let's check that we are not trying
to copy into read-only mmapped area - 80386 MMU is fucked in head and won't
fault on such stores in ring 0". It had always been racy. Look:
thread A: going to copy something to user-supplied address. Do access_ok().
thread A: looks like it's mapped writable, let's go ahead and copy
thread A: do something blocking before actually doing __put_user() or
__copy_to_user() or whatever it's going to be.
thread B: munmap(), mmap() something read-only here.
thread A: get to actual __put_user()/__copy_to_user().
80386: hey, it's ring 0, no need to look at the write protect bit in page
tables. What can go wrong, anyway?
You can't move any non-static checks to access_ok(). On any architecture.
Anything that could change between access_ok() and actual copying can't
be checked in access_ok(). As it is, access_ok() has actively misleading
calling conventions:
* the name implies that having passed access_ok() you don't have
to worry about EFAULT
* 'direction' argument of that thing reinforces that impression
*and* has to be produced by the caller. Most simply pass a constant,
which immediately gets dropped (as an aside, take a look at 4b4554f6d - it's
amusing), but in some cases it's calculated elsewhere and carefully passed
through several levels of call chain. Only to be discarded by access_ok(),
of course...
> Ie, every use in this sample is wrong. I suspect the rest of the kernel
> should be similar.
Looking through vt...
* con_set_trans_old(): copy_from_user() + loop for doing or with
UNI_DIRECT_BASE. Almost certainly will be faster that way - on *any*
architecture.
* con_get_trans_old(): copy_to_user() would be an obvious optimization.
* con_set_trans_new(): copy_from_user().
* con_get_trans_new(): copy_to_user().
* con_set_unimap(): memdup_user() instead of the entire kmalloc_array +
loop copying the sucker member by member. With ushort ct you don't need
overflow-related logics of kmalloc_array() anyway...
* con_get_unimap(): copy_to_user() + put_user() (for uct)
* set_selection(): just copy_from_user() into local struct tiocl_selection.
* tioclinux(): use put_user(). Yes, you will repeat the same check twice.
Once per ioctl(), that involves enough work to make that "recalculate
access_ok() once for no reason" non-issue on any architecture.
* vt_ioctl(): turn that ushort ll,cc,vlin,clin,vcol,ccol; into
struct vt_consize size; or something like that and use copy_from_user()
And AFAICS you can lose each and every access_ok() call in there.