Re: CONFIG_NO_HZ_FULL + CONFIG_PREEMPT_RT_FULL = nogo
From: Mike Galbraith
Date: Thu Nov 07 2013 - 08:07:41 EST
On Thu, 2013-11-07 at 12:21 +0100, Thomas Gleixner wrote:
> Mike,
>
> On Thu, 7 Nov 2013, Mike Galbraith wrote:
>
> > On Thu, 2013-11-07 at 04:26 +0100, Mike Galbraith wrote:
> > > On Wed, 2013-11-06 at 18:49 +0100, Thomas Gleixner wrote:
> >
> > > > I bet you are trying to work around some of the side effects of the
> > > > occasional tick which is still necessary despite full nohz, right?
> > >
> > > Nope, I wanted to check out cost of nohz_full for rt, and found that it
> > > doesn't work at all instead, looked, and found that the sole running
> > > task has just awakened ksoftirqd when it wants to shut the tick down, so
> > > that shutdown never happens.
> >
> > Like so in virgin 3.10-rt. Box is x3550 M3 booted nowatchdog
> > rcu_nocbs=1-3 nohz_full=1-3, and CPUs1-3 are completely isolated via
> > cpusets as well.
>
> well, that very same problem is in mainline if you add "threadirqs" to
> the command line. But we can be smart about this. The untested patch
> below should address that issue. If that works on mainline we can
> adapt it for RT (needs a trylock(&base->lock) there).

Oops, in haste I wedged it straight into 3.10-rt as is. The first pert
attempt was a bit weird, but it eventually worked.

rtbox:/sys/kernel/debug/tracing # !cgexec
cgexec -g cpuset:rtcpus taskset -c 3 pert 5
2400.01 MHZ CPU
perturbation threshold 0.018 usecs.
pert/s: 807 >8.52us: 2 min: 0.04 max: 10.80 avg: 5.56 sum/s: 4485us overhead: 0.45%
pert/s: 707 >8.54us: 4 min: 2.85 max: 11.78 avg: 5.63 sum/s: 3981us overhead: 0.40%
pert/s: 807 >8.51us: 2 min: 0.04 max: 10.86 avg: 5.58 sum/s: 4502us overhead: 0.45%
pert/s: 707 >8.48us: 3 min: 0.04 max: 10.82 avg: 5.59 sum/s: 3959us overhead: 0.40%
pert/s: 630 >8.73us: 5 min: 0.04 max: 16.65 avg: 5.29 sum/s: 3335us overhead: 0.33%
pert/s: 152 >9.50us: 4 min: 0.04 max: 32.58 avg: 0.37 sum/s: 56us overhead: 0.01%
pert/s: 28 >9.74us: 3 min: 0.04 max: 22.31 avg: 1.41 sum/s: 40us overhead: 0.00%
pert/s: 8 >10.02us: 4 min: 1.75 max: 20.56 avg: 4.54 sum/s: 36us overhead: 0.00%
pert/s: 7 >10.23us: 3 min: 1.82 max: 19.94 avg: 4.33 sum/s: 34us overhead: 0.00%
pert/s: 9 >10.45us: 5 min: 0.04 max: 20.79 avg: 4.11 sum/s: 38us overhead: 0.00%
pert/s: 31 >10.57us: 5 min: 0.04 max: 22.13 avg: 1.22 sum/s: 38us overhead: 0.00%
pert/s: 10 >10.77us: 5 min: 0.04 max: 21.40 avg: 3.68 sum/s: 38us overhead: 0.00%
^C
rtbox:/sys/kernel/debug/tracing # cgexec -g cpuset:rtcpus taskset -c 3 pert 5
2400.02 MHZ CPU
perturbation threshold 0.018 usecs.
pert/s: 8 >14.06us: 2 min: 1.70 max: 19.66 avg: 4.24 sum/s: 35us overhead: 0.00%
pert/s: 8 >13.97us: 3 min: 1.80 max: 21.81 avg: 4.48 sum/s: 37us overhead: 0.00%
pert/s: 8 >13.77us: 2 min: 1.77 max: 19.64 avg: 4.35 sum/s: 35us overhead: 0.00%
pert/s: 9 >13.72us: 3 min: 0.04 max: 22.03 avg: 4.35 sum/s: 39us overhead: 0.00%
pert/s: 8 >13.55us: 2 min: 1.75 max: 19.88 avg: 4.16 sum/s: 35us overhead: 0.00%
pert/s: 8 >13.43us: 3 min: 0.04 max: 20.55 avg: 4.21 sum/s: 36us overhead: 0.00%
pert/s: 8 >13.28us: 2 min: 1.74 max: 19.53 avg: 4.34 sum/s: 35us overhead: 0.00%
pert/s: 8 >13.22us: 3 min: 1.76 max: 20.96 avg: 4.35 sum/s: 37us overhead: 0.00%
pert/s: 8 >13.10us: 2 min: 1.72 max: 19.64 avg: 4.38 sum/s: 36us overhead: 0.00%
^C
rtbox:/sys/kernel/debug/tracing # cgexec -g cpuset:rtcpus taskset -c 3 pert 5
2400.03 MHZ CPU
perturbation threshold 0.018 usecs.
pert/s: 9 >14.55us: 2 min: 0.04 max: 20.93 avg: 4.11 sum/s: 37us overhead: 0.00%
pert/s: 8 >14.36us: 3 min: 1.72 max: 20.75 avg: 4.42 sum/s: 36us overhead: 0.00%
pert/s: 8 >14.14us: 2 min: 1.74 max: 20.02 avg: 4.28 sum/s: 35us overhead: 0.00%
pert/s: 8 >13.98us: 3 min: 1.77 max: 20.54 avg: 4.51 sum/s: 36us overhead: 0.00%
pert/s: 8 >13.76us: 2 min: 1.72 max: 19.57 avg: 4.17 sum/s: 35us overhead: 0.00%
pert/s: 8 >13.63us: 3 min: 1.79 max: 20.42 avg: 4.38 sum/s: 36us overhead: 0.00%
pert/s: 9 >13.51us: 2 min: 0.04 max: 20.78 avg: 4.09 sum/s: 37us overhead: 0.00%
> What worries me more is this one:
>
> pert-5229 [003] d..h1.. 684.482618: softirq_raise: vec=9 [action=RCU]
>
> The CPU has no callbacks as you shoved them over to cpu 0, so why is
> the RCU softirq raised?

Dunno, but it's repeatable. Workqueues are perturbation sources too:
update_vmstat, drain_caches (or such; I didn't save all the traces).
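
To put numbers on "repeatable", something like the following could tally
softirq_raise events per action on one CPU from saved trace text. This is
only a sketch: it assumes the lines look like the softirq_raise line quoted
above, the function name count_softirq_raises is made up for illustration,
and task names containing spaces are not handled.

```python
import re
from collections import Counter

# Matches an ftrace softirq_raise line in the format quoted in this thread:
# "pert-5229 [003] d..h1.. 684.482618: softirq_raise: vec=9 [action=RCU]"
SOFTIRQ_RAISE = re.compile(
    r"(?P<comm>\S+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>[\d.]+):\s+softirq_raise:\s+vec=(?P<vec>\d+)\s+\[action=(?P<action>\w+)\]"
)


def count_softirq_raises(lines, cpu):
    """Count softirq_raise events per action name for the given CPU number."""
    counts = Counter()
    for line in lines:
        m = SOFTIRQ_RAISE.search(line)
        if m is None:
            continue  # not a softirq_raise event, or an unexpected format
        if int(m.group("cpu")) != cpu:
            continue  # event belongs to a different CPU
        counts[m.group("action")] += 1
    return counts


if __name__ == "__main__":
    trace = [
        "pert-5229 [003] d..h1.. 684.482618: softirq_raise: vec=9 [action=RCU]",
    ]
    for action, n in count_softirq_raises(trace, cpu=3).items():
        print(action, n)
```
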
-Mike