[...]
Cycles added for no good reason is the point.
Hi Santosh,
Yes, I agree about the accuracy of s_send_lock_queue_raced. But the main point is that the existing code also counts some share of the cases where it is _not_ raced.
So, in the critical path, my patch adds one test_bit(), which hits the local CPU cache, if not raced. If raced, some other thread is in control, so I would not think the added cycles would make any big difference.
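
To illustrate the cost argument above, here is a minimal userspace sketch in C11, not the actual patch. It assumes the extra check is a read-only test of an "in transmit" flag before an event is counted as raced. The names conn_path, IN_XMIT_BIT, flag_is_set(), try_acquire(), release() and should_count_raced() are all hypothetical stand-ins, and the atomics only model the kernel's test_bit() and test_and_set_bit().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit, loosely modelled on an "in transmit" marker. */
#define IN_XMIT_BIT 0UL

/* Hypothetical stand-in for a connection path with a flags word. */
struct conn_path {
    atomic_ulong flags;
};

/* Read-only check of one bit: userspace stand-in for test_bit().
 * A plain load does not take the cache line exclusive, which is why
 * it is cheap when no other CPU is modifying the flags word. */
static inline bool flag_is_set(struct conn_path *cp, unsigned long bit)
{
    unsigned long v = atomic_load_explicit(&cp->flags, memory_order_relaxed);
    return (v >> bit) & 1UL;
}

/* Atomic read-modify-write: stand-in for test_and_set_bit().
 * Returns true if the caller obtained the flag. */
static inline bool try_acquire(struct conn_path *cp)
{
    unsigned long old = atomic_fetch_or(&cp->flags, 1UL << IN_XMIT_BIT);
    return ((old >> IN_XMIT_BIT) & 1UL) == 0;
}

/* Clear the flag again. */
static inline void release(struct conn_path *cp)
{
    atomic_fetch_and(&cp->flags, ~(1UL << IN_XMIT_BIT));
}

/* Assumed tightening: count the event as raced only when the queue is
 * non-empty AND some thread currently holds the flag. */
static inline bool should_count_raced(struct conn_path *cp, bool queue_nonempty)
{
    if (!queue_nonempty)
        return false;
    return flag_is_set(cp, IN_XMIT_BIT);
}
```

In this model the uncontended path pays only for one relaxed load. When the flag is held, another thread already owns the transmit path, so the extra read is unlikely to be the dominant cost.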
I can send a v2 where the race tightening is removed if you like.

Yes please.