Re: [RFC][CFT][PATCHSET v1] uaccess unification

From: Vineet Gupta
Date: Wed Mar 29 2017 - 17:16:32 EST


On 03/29/2017 01:29 PM, Al Viro wrote:
> On Wed, Mar 29, 2017 at 01:08:12PM -0700, Vineet Gupta wrote:
>
>> Hi Al,
>>
>> Thx for taking this up. It seems ARC was missing INLINE_COPY* switch likely due to
>> existing 2 variants (inline/out-of-line) we already have.
>> I've added a patch for that (attached too) - boot tested the series on ARC.
>
> BTW, I wonder if inlining all of the copy_{to,from}_user() is actually a win.

Just to be clear, your series was doing this for everyone.

> It's probably arch-dependent and it would be nice if somebody compared
> performance with and without inlining those... ARC, in particular, has
> __arc_copy_{to,from}_user() inlining a whole lot, even in case of non-constant
> size and your patch, AFAICS, will inline all of it in *all* cases.

Yes we do inline all of it: the non-constant case is actually simpler, it is a
simple byte loop.

" mov.f lp_count, %0 \n"
" lpnz 3f \n"
" ldb.ab %1, [%3, 1] \n"
"1: stb.ab %1, [%2, 1] \n"
" sub %0, %0, 1 \n"

Doing it out of line (3 args) will be 4 instructions anyways.

For constant size, there's laddered copy for blocks of 16 bytes + stragglers 1-15.
We do "manual" constant propagation there to compile time optimize away the
straggler part. But yes all of this is emitted inline.


> It might
> end up being a win, but that's not apriori obvious... Do you have any
> profiling results in that area?

Unfortunately not at the moment. The reason for adding out-of-line variant was not
so much as performance but to improve the footprint for -Os case (some customer I
think).