Re: [PATCH] lib/strscpy: remove word-at-a-time optimization.

From: Rasmus Villemoes
Date: Wed Jan 24 2018 - 03:54:19 EST


On 2018-01-09 17:47, Andrey Ryabinin wrote:
> Attached user space program I used to see the difference.
> Usage:
> gcc -O2 -o strscpy strscpy_test.c
> ./strscpy {b|w} src_str_len count
>
> src_str_len - length of source string in between 1-4096
> count - how many strscpy() to execute.
>
> Also I've noticed something strange. I'm not sure why, but certain
> src_len values (e.g. 30) drive the branch predictor crazy, causing worse than usual results
> for byte-at-a-time copy:

I see something similar, but at the 30->31 transition, and the
branch-misses remain at 1-3% for higher values, until 42 where it drops
back to 0%. Anyway, I highly doubt we do a lot of string copies of
strings longer than 32 bytes.

$ perf stat ./strscpy_test b 30 10000000

Performance counter stats for './strscpy_test b 30 10000000':

     156,777082      task-clock (msec)         #    0,999 CPUs utilized
              0      context-switches          #    0,000 K/sec
              0      cpu-migrations            #    0,000 K/sec
             48      page-faults               #    0,306 K/sec
    584.646.177      cycles                    #    3,729 GHz
<not supported>      stalled-cycles-frontend
<not supported>      stalled-cycles-backend
  2.580.599.614      instructions              #    4,41  insns per cycle
    660.114.283      branches                  # 4210,528 M/sec
          4.891      branch-misses             #    0,00% of all branches

0,156970910 seconds time elapsed

$ perf stat ./strscpy_test b 31 10000000

Performance counter stats for './strscpy_test b 31 10000000':

     258,533250      task-clock (msec)         #    0,999 CPUs utilized
              0      context-switches          #    0,000 K/sec
              0      cpu-migrations            #    0,000 K/sec
             50      page-faults               #    0,193 K/sec
    965.505.138      cycles                    #    3,735 GHz
<not supported>      stalled-cycles-frontend
<not supported>      stalled-cycles-backend
  2.660.773.463      instructions              #    2,76  insns per cycle
    680.141.051      branches                  # 2630,768 M/sec
     19.150.367      branch-misses             #    2,82% of all branches

0,258725192 seconds time elapsed
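
For anyone reading without the attachment: the 'b' mode presumably exercises
a plain byte-at-a-time loop along the lines below. This is only a sketch
modelled on the byte-wise tail loop of the kernel's strscpy(), not the code
from the attached strscpy_test.c, and the name strscpy_bytewise is made up
for illustration. Each copied byte costs one data-dependent branch (the NUL
check) plus the loop-bound check, so the branch count grows with the source
length.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical byte-at-a-time copy with strscpy()-like semantics:
 * copies up to count bytes including the terminating NUL, always
 * NUL-terminates dest when count > 0, and returns the number of
 * characters copied (not counting the NUL) or -E2BIG on truncation.
 */
static ssize_t strscpy_bytewise(char *dest, const char *src, size_t count)
{
	size_t res = 0;

	if (count == 0)
		return -E2BIG;

	while (count) {
		char c = src[res];

		dest[res] = c;
		if (c == '\0')		/* one branch per byte copied */
			return (ssize_t)res;
		res++;
		count--;
	}

	/* Ran out of room: terminate in place and report truncation. */
	dest[res - 1] = '\0';
	return -E2BIG;
}
```

With 10000000 calls per run, the counters above work out to roughly 58
cycles per call at length 30 and roughly 97 at length 31, with about 1.9
mispredicted branches per call in the latter case.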


Rasmus