As Peter said, I haven't reviewed his changes yet. The patch I am working on has an optimization similar to PeterZ's small-NR_CPUS change, except that I use a single atomic short integer write to switch the bits instead of two byte writes. However, this code seems to have a problem working with the lockref code, and I hit a panic in fs/dcache.c, so I am investigating that issue.

* Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> Hi Waiman,
>
> I promised you this series a number of days ago; sorry for the delay,
> I've been somewhat unwell :/
>
> That said, these few patches start with a (hopefully) simple and
> correct form of the queue spinlock, and then gradually build upon
> it, explaining each optimization as we go.
>
> Having these optimizations as separate patches helps twofold;
> firstly it makes one aware of which exact optimizations were done,
> and secondly it allows one to prove or disprove any one step;
> seeing how they should be mostly identity transforms.
>
> The resulting code is near to what you posted I think; however it
> has one atomic op less in the pending wait-acquire case for NR_CPUS
> != huge. It also doesn't do lock stealing; it's still perfectly fair
> afaict.
>
> Have I missed any tricks from your code?

Waiman, you indicated in the other thread that these look good to you,
right? If so then I can queue them up so that they form a base for
further work.
It would be nice to have per patch performance measurements though ...
this split-up structure really enables that rather nicely.
Thanks,
Ingo
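
For readers following along, here is a rough sketch of the "single short integer write" idea mentioned at the top of this mail: when the locked byte and the pending bit share one 16-bit field, the pending waiter can clear pending and set locked with one store instead of two byte writes. Everything here is an assumption made for illustration (the demo_qspinlock layout, the DEMO_* constants, the helper names, and the use of C11 atomics in user space); it is not taken from either patch series, and it only models the NR_CPUS != huge case where the tail fits in the other 16 bits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bits 0-7 = locked byte, bit 8 = pending. */
#define DEMO_LOCKED_VAL   ((uint16_t)0x0001)
#define DEMO_LOCKED_MASK  ((uint16_t)0x00ff)
#define DEMO_PENDING_VAL  ((uint16_t)0x0100)

struct demo_qspinlock {
    _Atomic uint16_t locked_pending; /* locked byte + pending bit */
    _Atomic uint16_t tail;           /* queue tail, unused in this sketch */
};

static void demo_init(struct demo_qspinlock *lock)
{
    atomic_init(&lock->locked_pending, 0);
    atomic_init(&lock->tail, 0);
}

/* Uncontended fast path; ignores the queue tail for brevity. */
static bool demo_trylock(struct demo_qspinlock *lock)
{
    uint16_t expected = 0;

    return atomic_compare_exchange_strong_explicit(
        &lock->locked_pending, &expected, DEMO_LOCKED_VAL,
        memory_order_acquire, memory_order_relaxed);
}

/* First contender announces itself by setting the pending bit. */
static void demo_set_pending(struct demo_qspinlock *lock)
{
    atomic_fetch_or_explicit(&lock->locked_pending, DEMO_PENDING_VAL,
                             memory_order_acquire);
}

/* The owner releases by clearing only the locked byte. */
static void demo_unlock(struct demo_qspinlock *lock)
{
    atomic_fetch_and_explicit(&lock->locked_pending,
                              (uint16_t)~DEMO_LOCKED_MASK,
                              memory_order_release);
}

/*
 * Pending waiter takes over once the locked byte has dropped to 0.
 * In this state only the pending waiter writes to the low 16 bits
 * (later contenders touch the tail only), so one plain store replaces
 * "clear pending byte" + "set locked byte".
 */
static void demo_clear_pending_set_locked(struct demo_qspinlock *lock)
{
    assert(atomic_load_explicit(&lock->locked_pending,
                                memory_order_relaxed) == DEMO_PENDING_VAL);
    atomic_store_explicit(&lock->locked_pending, DEMO_LOCKED_VAL,
                          memory_order_relaxed);
}
```

The sequence trylock, set_pending, unlock, clear_pending_set_locked walks the 16-bit field through 0x0001, 0x0101, 0x0100 and back to 0x0001. A real implementation would also have to spin with acquire semantics until the locked byte clears before doing the handover; that wait loop is left out here.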