interesting. Could you try a few things? Firstly, could you add some minimal delays to the lock/unlock path, of at least 1 usec? E.g. "synchro-test.ko load=1 interval=1". [but you could try longer delays too, 10 usecs is still realistic.]
secondly, could you try the VFS creat+unlink test via the test-mutex.c code below, with something like:
./test-mutex V 16 10
thirdly, could you run 'vmstat 1' during the tests, and post those lines too? Here i'm curious about two things: the average runqueue length (whether we have overscheduling), and CPU utilization and idle time left (how efficiently cycles are preserved in contention). [btw., does ppc have an idle=poll equivalent mode of idling?]
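if it helps when comparing runs, here is a rough sketch of how the two vmstat numbers i'm after could be averaged from a captured log. The helper name vmstat_summary and the log file name are made up for illustration. It assumes the usual procps layout, where a header line names the "r" (runqueue) and "id" (idle) columns, and it drops the first sample, since that one reports averages since boot:

```shell
# Hypothetical helper: average runqueue length ("r") and idle percentage
# ("id") from "vmstat 1" output read on stdin.
vmstat_summary() {
    awk '
        # column-name header line: remember where "r" and "id" are
        $1 == "r" {
            for (i = 1; i <= NF; i++) {
                if ($i == "r")  rc = i
                if ($i == "id") ic = i
            }
            next
        }
        # data lines: first field is numeric, header already seen
        rc && ic && $1 ~ /^[0-9]+$/ {
            # the very first sample is the since-boot average, skip it
            if (!skipped) { skipped = 1; next }
            r += $rc; id += $ic; n++
        }
        END {
            if (n)
                printf "samples=%d avg_r=%.2f avg_idle=%.2f\n", n, r / n, id / n
        }
    '
}
```

e.g. run "vmstat 1 > vmstat.log" in another terminal while the test is running, then "vmstat_summary < vmstat.log". The raw lines are still the most useful thing to post, the averages are only a convenience.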
also, there seems to be some fluctuation in the numbers - could you do a few more runs of each test to see how stable the numbers are?