That is why I decided to measure the real latency, from call_rcu() to the invocation of the final callback. This includes the wait for the current grace period to complete, the delay until the softirq is scheduled, etc.
Attached is a hack that I use right now for myself.
Btw, on my 4-CPU system the average latency from call_rcu() to the RCU callback is 4-5 milliseconds (CONFIG_HZ_1000).
Hmmm... I would expect that if you have some CPUs in dyntick idle mode.
But if I run treercu on a CONFIG_HZ_250 8-CPU Power box, I see 2.5
jiffies per grace period if CPUs are kept out of dyntick idle mode, and
4 jiffies per grace period if CPUs are allowed to enter dyntick idle mode.
Alternatively, if you were testing with multiple concurrent
synchronize_rcu() invocations, you can also see longer grace-period
latencies due to the fact that a new synchronize_rcu() must wait for an
earlier grace period to complete before starting a new one.