* Waiman Long <waiman.long@xxxxxx> wrote:

> > Thanks, that's interesting! Mind posting the microbenchmark?
>
> I have attached the tool that I used for testing.

Btw., we could also do something like this in user-space, in tools/perf/bench/ - we
have no 'perf bench locking' subcommand yet.
We already build and measure simple x86 kernel methods there such as memset() and
memcpy():

  triton:~/tip> perf bench mem memcpy -r all
  # Running 'mem/memcpy' benchmark:

  Routine default (Default memcpy() provided by glibc)
  # Copying 1MB Bytes ...
        1.385195 GB/Sec
        4.982462 GB/Sec (with prefault)

  Routine x86-64-unrolled (unrolled memcpy() in arch/x86/lib/memcpy_64.S)
  # Copying 1MB Bytes ...
        1.627604 GB/Sec
        5.336407 GB/Sec (with prefault)

  Routine x86-64-movsq (movsq-based memcpy() in arch/x86/lib/memcpy_64.S)
  # Copying 1MB Bytes ...
        2.132233 GB/Sec
        4.264465 GB/Sec (with prefault)

  Routine x86-64-movsb (movsb-based memcpy() in arch/x86/lib/memcpy_64.S)
  # Copying 1MB Bytes ...
        1.490935 GB/Sec
        7.128193 GB/Sec (with prefault)

Locking primitives would certainly be more complex to build in user-space - but we
could shuffle things around in kernel headers as well to make it easier to test in
user-space.
That's how we can build lockdep in user-space for example, see tools/lib/lockdep.
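
To make the idea a bit more concrete, here is a rough sketch of the kind of loop a user-space locking benchmark could time. It is only an illustration under assumptions: the lock is a plain C11 test-and-set spinlock, not the kernel's spinlock code, and every name in it (bench_lock, bench_spinlock(), MAX_THREADS) is made up for this example. No such 'perf bench' subcommand exists.

```c
/*
 * Sketch only: a user-space test-and-set spinlock microbenchmark.
 * All names are hypothetical; this is NOT the kernel's spinlock
 * implementation and NOT an existing perf bench subcommand.
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define MAX_THREADS 64

static atomic_flag bench_lock = ATOMIC_FLAG_INIT;
static unsigned long counter;		/* protected by bench_lock */
static unsigned long iterations;	/* per thread, set before threads start */

static void spin_lock(void)
{
	/* Spin until the flag was previously clear, i.e. we took the lock. */
	while (atomic_flag_test_and_set_explicit(&bench_lock,
						 memory_order_acquire))
		;
}

static void spin_unlock(void)
{
	atomic_flag_clear_explicit(&bench_lock, memory_order_release);
}

static void *worker(void *arg)
{
	(void)arg;
	for (unsigned long i = 0; i < iterations; i++) {
		spin_lock();
		counter++;	/* tiny critical section */
		spin_unlock();
	}
	return NULL;
}

/*
 * Run nr_threads workers, each doing nr_iter lock/unlock pairs.
 * Returns the final counter value and stores the elapsed wall-clock
 * time (thread creation and join included) in *secs.
 */
static unsigned long bench_spinlock(int nr_threads, unsigned long nr_iter,
				    double *secs)
{
	pthread_t tid[MAX_THREADS];
	struct timespec t0, t1;

	assert(nr_threads > 0 && nr_threads <= MAX_THREADS);
	counter = 0;
	iterations = nr_iter;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < nr_threads; i++) {
		int ret = pthread_create(&tid[i], NULL, worker, NULL);

		assert(ret == 0);
		(void)ret;
	}
	for (int i = 0; i < nr_threads; i++)
		pthread_join(tid[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*secs = (double)(t1.tv_sec - t0.tv_sec) +
		(double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	return counter;
}
```

A caller would do something like bench_spinlock(4, 1000000, &secs) and report (threads * iterations) / secs as lock operations per second; build with -pthread. If the lock is correct, the returned counter equals threads * iterations, which doubles as a sanity check.
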
Just a thought.
Thanks,
Ingo