[PATCHSET] sanitizing compat nanosleep and other timer-related syscalls

From: Al Viro
Date: Wed Jun 07 2017 - 04:41:20 EST


The series is on top of tip.git#timers/core; the first half
massages {clock_,}nanosleep(), the rest deals with other
timer-related compat syscalls.

As it is, nanosleep() has rather convoluted logic for
copying the timespec to userland. It can happen in the syscall
itself, or in the restart callback triggered on restart. Naturally,
there is quite a bit of shared code between those; after all,
restart callbacks mimic what would've been a plain syscall
restart, if not for the need to recalculate timeouts. However,
copying the timespec to userland is *not* a part of shared code -
it's duplicated in sys_nanosleep() and hrtimer_nanosleep_restart(),
and similarly for clock_nanosleep(). Moving that copyout into
hrtimer_nanosleep() and its ilk simplifies life.

What's more, that lets us deal with another bit of
nastiness - the compat side of nanosleep(2) has to play very sick
games. It calls sys_nanosleep() under set_fs(KERNEL_DS) and
passes it a pointer to an on-stack (native) timespec. Then it
converts that to a 32-bit timespec and copies it to userland;
so far, so good, but if we are going to hit a restart, we can't
leave the restart callback (and arguments for it) as-is -
after all, the pointer to "userland" timespec actually points
to kernel stack frame, long gone by the time we get to restart.
So we flip the restart callback to one of our own and stash
the real userland pointer for it. When that callback is finally
called, it plays with restart args again and calls the native
callback under KERNEL_DS, followed by the same dance as in
the compat syscall itself. For clock_nanosleep() it's even
more convoluted...

All that mess goes away if we teach hrtimer_nanosleep()
to handle both the native and compat copyout. All we need is
to turn the (userland pointer to native timespec, userland pointer
to compat timespec) pair in restart_block into a tagged union
and add a helper used by hrtimer_nanosleep() (and clock_...
counterparts thereof), doing the actual copyout. Massage to
get there is longer than I would like, but the code is convoluted
enough to make doing that in a single step too scary.
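
To make the tagged-union idea concrete, here is a minimal userspace sketch.
It is not the code from the series: the type and function names
(nanosleep_restart, rmt_type, copyout_remaining) are hypothetical, the
structs are stand-ins for the kernel ones, and memcpy() stands in for
copy_to_user().

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the native and 32-bit compat layouts. */
struct native_timespec   { int64_t tv_sec; int64_t tv_nsec; };
struct compat_timespec32 { int32_t tv_sec; int32_t tv_nsec; };

/* Tag saying which member of the union below is valid. */
enum rmt_type { RMT_NONE = 0, RMT_NATIVE, RMT_COMPAT };

/* Sketch of the nanosleep part of a restart block: one tag, one pointer. */
struct nanosleep_restart {
	enum rmt_type type;
	union {
		struct native_timespec   *rmtp;        /* valid if RMT_NATIVE */
		struct compat_timespec32 *compat_rmtp; /* valid if RMT_COMPAT */
	};
};

/*
 * Single copyout helper shared by the syscall and the restart path.
 * The remaining time is always computed in the native format; only the
 * final store depends on the tag.
 */
static int copyout_remaining(const struct nanosleep_restart *r,
			     const struct native_timespec *rem)
{
	switch (r->type) {
	case RMT_NATIVE:
		memcpy(r->rmtp, rem, sizeof(*rem)); /* copy_to_user() stand-in */
		return 0;
	case RMT_COMPAT: {
		struct compat_timespec32 c = {
			.tv_sec  = (int32_t)rem->tv_sec,
			.tv_nsec = (int32_t)rem->tv_nsec,
		};
		memcpy(r->compat_rmtp, &c, sizeof(c));
		return 0;
	}
	case RMT_NONE:
		return 0; /* caller did not ask for the remaining time */
	}
	return -EINVAL;
}
```

Since the destination pointer and its flavour travel together in the
restart data, the restart callback needs neither a pointer swap nor
KERNEL_DS; it just calls the same helper again.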

The second half is a plain and simple "move compat syscall
towards the native one, get rid of set_fs() by doing what the
native one would with different copyin/copyout" stuff; that avoids
double copying and set_fs() games in cases where we used to play
those and lets us make the guts static in cases when we didn't.
Either way, compat syscalls are better off next to the native
ones.
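
The shape of that second-half conversion can be sketched in userspace as
follows. Again this is only an illustration under stated assumptions: the
names (do_set_clock, native_settime, compat_settime) are made up, the
structs are stand-ins, and memcpy() plays the role of copy_from_user().

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the native and 32-bit compat layouts. */
struct k_timespec  { int64_t tv_sec; int64_t tv_nsec; };
struct compat_ts32 { int32_t tv_sec; int32_t tv_nsec; };

/* Shared guts: only ever sees a kernel-side copy, so it can be static. */
static int do_set_clock(const struct k_timespec *ts, struct k_timespec *clock)
{
	if (ts->tv_nsec < 0 || ts->tv_nsec >= 1000000000LL)
		return -EINVAL;
	*clock = *ts;
	return 0;
}

/* Native entry point: copy in the native layout, call the guts. */
static int native_settime(const struct k_timespec *uts,
			  struct k_timespec *clock)
{
	struct k_timespec kts;

	memcpy(&kts, uts, sizeof(kts)); /* copy_from_user() stand-in */
	return do_set_clock(&kts, clock);
}

/*
 * Compat entry point, living next to the native one: the only difference
 * is the copyin, which widens the 32-bit fields. No second copy through
 * a fake "userland" buffer and no address-limit switching.
 */
static int compat_settime(const struct compat_ts32 *uts,
			  struct k_timespec *clock)
{
	struct compat_ts32 c;
	struct k_timespec kts;

	memcpy(&c, uts, sizeof(c)); /* copy_from_user() stand-in */
	kts.tv_sec  = c.tv_sec;
	kts.tv_nsec = c.tv_nsec;
	return do_set_clock(&kts, clock);
}
```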

Please review. The patches will go in followups to this
mail; for those who prefer to use a git tree, it is visible in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git#timers-compat
(the first half - in #timers-nanosleep in the same repo).

Al Viro (16):
move copyout of timespec into do_cpu_nanosleep()
move copyout and freeze handling into alarmtimer_do_nsleep()
hrtimer_nanosleep(): pass rmtp in restart_block
move copyout to do_nanosleep()
clock_nanosleep(): stash rmtp into restart_block
nanosleep/clock_nanosleep: teach to do compat copyouts
{clock_,}nanosleep(2): merge timespec copyout logics into a new helper
kill ->nsleep_restart()
move adjtimex-related compat syscalls to native counterparts
take compat timer_settime(2) to native one
take compat timer_gettime(2) to native one
move compat itimer syscalls to native ones
clock_gettime/clock_settime/clock_getres: move to native syscalls
timer_create(): move compat to native, get rid of set_fs()
time()/stime(): move compat to native
gettimeofday()/settimeofday(): move compat to native