Re: [PATCH] fs: use KERNEL_DS instead of get_ds()

From: Christoph Hellwig
Date: Fri Mar 08 2019 - 11:20:22 EST


On Fri, Mar 08, 2019 at 02:23:31PM +0000, Al Viro wrote:
> You do realize that nested pairs of that sort are not all there is?
> Even leaving m68k aside (there the same registers that select
> userland or kernel for that kind of access can be used e.g. for
> writeback control, or to switch to accessing sun3 MMU tables, etc.)

Yes. And the whole point is to keep these uses clear and separate.

> there are
> * temporary switches to USER_DS in things like unaligned
> access handlers, etc., where the kernel is doing emulation of possibly
> userland insns; similar for oops code dumping, etc.
> * use_mm()/unuse_mm() should probably switch to USER_DS and
> back, rather than doing that in callers.
> * switch to USER_DS (and no, it's *not* "USER_DS unless we started
> with KERNEL_DS" - nested counter is no-go here) for perf callbacks.
> * regular non-paired switches to USER_DS: do_exit() and
> flush_old_exec().

And that is probably close to the full list of callers that want
to explicitly enable access to the user address space, and thus
mark the thread as a user thread (and occasionally clear that, e.g. in
unuse_mm).

Unless I'm completely missing something, our general rule of thumb
should be:

- threads are started with uaccess kernel turned on (count = 1)
- if we execute in userspace we switch to user uaccess (count = 0)
- same for use_mm style threads that want user access
- every existing ad-hoc override in kernel code increments the refcount
and drops the reference when done
- force uaccess cases like do_exit or the validation check on
return to userspace force it back to 0.

Initially each 1 -> 0 transition (decrement or force) will do
set_fs(USER_DS), each 0 -> 1 transition will do set_fs(KERNEL_DS).
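
The counting rules above could be modelled in userspace roughly as
follows. This is only an illustrative sketch, not kernel code: struct
thread, sim_set_fs() and the uaccess_*() helpers are hypothetical names
made up for the example, and the set_fs_calls field exists only to show
that nested overrides do not touch the address limit.

```c
#include <assert.h>

/* Hypothetical stand-ins for the real segment values. */
enum seg { USER_DS, KERNEL_DS };

/* Hypothetical per-thread state; not the kernel's thread_info. */
struct thread {
	int uaccess_count;	/* 0: user access, > 0: kernel override */
	enum seg addr_limit;	/* what set_fs() would have installed */
	int set_fs_calls;	/* instrumentation for the example only */
};

/* Stand-in for set_fs(): record the new limit and count the call. */
static void sim_set_fs(struct thread *t, enum seg s)
{
	t->addr_limit = s;
	t->set_fs_calls++;
}

/* Threads start with kernel uaccess turned on (count = 1). */
static void thread_init(struct thread *t)
{
	t->uaccess_count = 1;
	t->addr_limit = KERNEL_DS;
	t->set_fs_calls = 0;
}

/* Kernel override: only the 0 -> 1 transition switches to KERNEL_DS. */
static void uaccess_kernel_begin(struct thread *t)
{
	if (t->uaccess_count++ == 0)
		sim_set_fs(t, KERNEL_DS);
}

/* Drop the override: only the 1 -> 0 transition switches to USER_DS. */
static void uaccess_kernel_end(struct thread *t)
{
	assert(t->uaccess_count > 0);
	if (--t->uaccess_count == 0)
		sim_set_fs(t, USER_DS);
}

/* Forced case (do_exit, return to userspace): back to 0 regardless. */
static void uaccess_force_user(struct thread *t)
{
	if (t->uaccess_count != 0) {
		t->uaccess_count = 0;
		sim_set_fs(t, USER_DS);
	}
}
```

A nested uaccess_kernel_begin()/uaccess_kernel_end() pair inside an
outer override leaves addr_limit alone in this model; only the outermost
pair, or a forced reset, changes it.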

Then later architectures can kill the set_fs API, and potentially
optimize things by getting rid of the addr_limit field in its current
form.