Hi Christophe,
Unfortunately this would collide messily with "uaccess: remove
segment_eq" in linux-next, so I'll ask you to do a respin based on that,
some comments below.
Christophe Leroy <christophe.leroy@xxxxxx> writes:
On powerpc, we only have USER_DS and KERNEL_DS
Today, this is managed as an 'unsigned long' data space limit
which is used to compare the passed address with, plus a bit
in the thread_info flags that is set whenever modifying the limit
to enable the verification in addr_limit_user_check()
The limit is either the last address of user space when USER_DS is
set, and the last address of address space when KERNEL_DS is set.
In both cases, the limit is a compiletime constant.
get_fs() returns the limit, which is part of thread_info struct
set_fs() updates the limit then set the TI_FSCHECK flag.
addr_limit_user_check() check the flag, and if it is set it checks
the limit is the user limit, then unsets the TI_FSCHECK flag.
In addition, when the flag is set the syscall exit work is involved.
This exit work is heavy compared to normal syscall exit as it goes
through normal exception exit instead of the fast syscall exit.
Rename this TI_FSCHECK flag to TIF_KERNEL_DS flag which tells whether
KERNEL_DS or USER_DS is set. Get mm_segment_t be redifined as a bool
struct that is either false (for USER_DS) or true (for KERNEL_DS).
When TIF_KERNEL_DS is set, the limit is ~0UL. Otherwise it is
TASK_SIZE_USER (resp TASK_SIZE_USER64 on PPC64). When KERNEL_DS is
set, there is no range to check. Define TI_FSCHECK as an alias to
TIF_KERNEL_DS.
I'd rather avoid the "DS" name any more than we have to. Maybe it means
"data space" but that's not a very common term.
The generic helper these days is called uaccess_kernel(), which returns
true when uaccess routines are allowed to access the kernel.
So calling it TIF_UACCESS_KERNEL would work I think?
The bool could be called uaccess_kernel.
And END_OF_USER_DS could be USER_ADDR_MAX.
On exit, involve exit work when the bit is set, i.e. when KERNEL_DS
is set. addr_limit_user_check() will clear the bit and kill the
user process.
I guess this is safe. The check was added to make sure we never return
to userspace with KERNEL_DS set, but using the actual TIF flag to
determine the address limit should be equally safe, and avoid the
overhead of the check in the good case.