Re: Unusually high system CPU usage with recent kernels

From: Tibor Billes
Date: Sun Aug 25 2013 - 15:50:31 EST


From: Paul E. McKenney Sent: 08/24/13 11:03 PM
> On Sat, Aug 24, 2013 at 09:59:45PM +0200, Tibor Billes wrote:
> > From: Paul E. McKenney Sent: 08/22/13 12:09 AM
> > > On Wed, Aug 21, 2013 at 11:05:51PM +0200, Tibor Billes wrote:
> > > > > From: Paul E. McKenney Sent: 08/21/13 09:12 PM
> > > > > On Wed, Aug 21, 2013 at 08:14:46PM +0200, Tibor Billes wrote:
> > > > > > > From: Paul E. McKenney Sent: 08/20/13 11:43 PM
> > > > > > > On Tue, Aug 20, 2013 at 10:52:26PM +0200, Tibor Billes wrote:
> > > > > > > > > From: Paul E. McKenney Sent: 08/20/13 04:53 PM
> > > > > > > > > On Tue, Aug 20, 2013 at 08:01:28AM +0200, Tibor Billes wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > I was using the 3.9.7 stable release and tried to upgrade to the 3.10.x series.
> > > > > > > > > > The 3.10.x series was showing unusually high (>75%) system CPU usage in some
> > > > > > > > > > situations, making things really slow. The latest stable I tried is 3.10.7.
> > > > > > > > > > I also tried 3.11-rc5, they both show this behaviour. This behaviour doesn't
> > > > > > > > > > show up when the system is idling, only when doing some CPU intensive work,
> > > > > > > > > > like compiling with multiple threads. Compiling with only one thread seems not
> > > > > > > > > > to trigger this behaviour.
> > > > > > > > > >
> > > > > > > > > > To be more precise, I did a `perf record -a` while compiling a large C++ program
> > > > > > > > > > with scons using 4 threads; the result is appended at the end of this email.
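> > > > > > > > > >
> > > > > > > > > > Roughly, the capture looked like this (just a sketch; the build
> > > > > > > > > > command is specific to my project):
> > > > > > > > > >
> > > > > > > > > >   perf record -a &    # system-wide profiling in the background
> > > > > > > > > >   scons -j4           # the build under test
> > > > > > > > > >   kill -INT %1        # stop profiling once the build is done
> > > > > > > > > >   perf report --stdio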
> > > > > > > > >
> > > > > > > > > New one on me! You are running a mainstream system (x86_64), so I am
> > > > > > > > > surprised no one else noticed.
> > > > > > > > >
> > > > > > > > > Could you please send along your .config file?
> > > > > > > >
> > > > > > > > Here it is
> > > > > > >
> > > > > > > Interesting. I don't see RCU stuff all that high on the list, but
> > > > > > > the items I do see lead me to suspect RCU_FAST_NO_HZ, which has some
> > > > > > > relevance to the otherwise inexplicable group of commits you located
> > > > > > > with your bisection. Could you please rerun with CONFIG_RCU_FAST_NO_HZ=n?
> > > > > > >
> > > > > > > If that helps, there are some things I could try.
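> > > > > > >
> > > > > > > Roughly, assuming you build from the source tree, something like this
> > > > > > > should flip it off:
> > > > > > >
> > > > > > >   scripts/config --disable RCU_FAST_NO_HZ
> > > > > > >   make oldconfig && make -j4
> > > > > > >
> > > > > > > then install and boot the new kernel as usual.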
> > > > > >
> > > > > > It did help. I didn't notice anything unusual when running with CONFIG_RCU_FAST_NO_HZ=n.
> > > > >
> > > > > Interesting. Thank you for trying this -- and we at least have a
> > > > > short-term workaround for this problem. I will put a patch together
> > > > > for further investigation.
> > > >
> > > > I don't specifically need this config option so I'm fine without it in
> > > > the long term, but I guess it's not supposed to behave like that.
> > >
> > > OK, good, we have a long-term workaround for your specific case,
> > > even better. ;-)
> > >
> > > But yes, there are situations where RCU_FAST_NO_HZ needs to work
> > > a bit better. I hope you will bear with me through a bit more
> > > testing...
> > >
> > > > > In the meantime, could you please tell me how you were measuring
> > > > > performance for your kernel builds? Wall-clock time required to complete
> > > > > one build? Number of builds completed per unit time? Something else?
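> > > > >
> > > > > Even something as simple as the total wall-clock time for one clean
> > > > > build would be plenty, e.g.:
> > > > >
> > > > >   rm -r build && time scons -j4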
> > > >
> > > > Actually, I wasn't all this sophisticated. I have a system monitor
> > > > applet on my top panel (using MATE, Linux Mint), four little graphs,
> > > > one of which shows CPU usage. Different colors indicate different kind
> > > > of CPU usage. Blue shows user space usage, red shows system usage, and
> > > > two more for nice and iowait. During a normal compile it's almost
> > > > completely filled with blue user space CPU usage, only the top few
> > > > pixels show some iowait and system usage. With CONFIG_RCU_FAST_NO_HZ
> > > > set, about 3/4 of the graph was red system CPU usage, the rest was
> > > > blue. So I just looked for a pile of red on my graphs when I tested
> > > > different kernel builds. But the compile speed was also so horrible that
> > > > I couldn't wait for the build to finish. Even the UI became unresponsive.
> > >
> > > We have been having problems with CPU accounting, but this one looks
> > > quite real.
> > >
> > > > Now I did some measuring. In the normal case a compile finished in 36
> > > > seconds, compiled 315 object files. Here are some output lines from
> > > > dstat -tasm --vm during compile:
> > > > ----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----swap--- ------memory-usage----- -----virtual-memory----
> > > >      time     |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  free| used  buff  cach  free|majpf minpf alloc  free
> > > > 21-08 21:48:05| 91   8   2   0   0   0|   0  5852k|   0     0 |   0     0 |1413  1772 |   0  7934M| 581M 58.0M  602M 6553M|   0    71k   46k   54k
> > > > 21-08 21:48:06| 93   6   1   0   0   0|   0  2064k| 137B  131B|   0     0 |1356  1650 |   0  7934M| 649M 58.0M  604M 6483M|   0    72k   47k   28k
> > > > 21-08 21:48:07| 86  11   4   0   0   0|   0  5872k|   0     0 |   0     0 |2000  2991 |   0  7934M| 577M 58.0M  627M 6531M|   0    99k   67k   79k
> > > > 21-08 21:48:08| 87   9   3   0   0   0|   0  2840k|   0     0 |   0     0 |2558  4164 |   0  7934M| 597M 58.0M  632M 6507M|   0    96k   57k   51k
> > > > 21-08 21:48:09| 93   7   1   0   0   0|   0  3032k|   0     0 |   0     0 |1329  1512 |   0  7934M| 641M 58.0M  626M 6469M|   0    61k   48k   39k
> > > > 21-08 21:48:10| 93   6   0   0   0   0|   0  4984k|   0     0 |   0     0 |1160  1146 |   0  7934M| 572M 58.0M  628M 6536M|   0    50k   40k   57k
> > > > 21-08 21:48:11| 86   9   6   0   0   0|   0  2520k|   0     0 |   0     0 |2947  4760 |   0  7934M| 605M 58.0M  631M 6500M|   0   103k   55k   45k
> > > > 21-08 21:48:12| 90   8   2   0   0   0|   0  2840k|   0     0 |   0     0 |2674  4179 |   0  7934M| 671M 58.0M  635M 6431M|   0    84k   59k   42k
> > > > 21-08 21:48:13| 90   9   1   0   0   0|   0  4656k|   0     0 |   0     0 |1223  1410 |   0  7934M| 643M 58.0M  638M 6455M|   0    90k   62k   68k
> > > > 21-08 21:48:14| 91   8   1   0   0   0|   0  3572k|   0     0 |   0     0 |1432  1828 |   0  7934M| 647M 58.0M  641M 6447M|   0    81k   59k   57k
> > > > 21-08 21:48:15| 91   8   1   0   0   0|   0  5116k| 116B    0 |   0     0 |1194  1295 |   0  7934M| 605M 58.0M  644M 6487M|   0    69k   54k   64k
> > > > 21-08 21:48:16| 87  10   3   0   0   0|   0  5140k|   0     0 |   0     0 |1761  2586 |   0  7934M| 584M 58.0M  650M 6502M|   0   105k   64k   68k
> > > >
> > > > The abnormal case compiled only 182 object files in six and a half minutes,
> > > > then I stopped it. The same dstat output for this case:
> > > > ----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----swap--- ------memory-usage----- -----virtual-memory----
> > > >      time     |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  free| used  buff  cach  free|majpf minpf alloc  free
> > > > 21-08 22:10:49| 27  62   0   0  11   0|   0     0 | 210B    0 |   0     0 |1414  3137k|   0  7934M| 531M 57.6M  595M 6611M|   0  1628  1250   322
> > > > 21-08 22:10:50| 25  60   4   0  11   0|   0    88k| 126B    0 |   0     0 |1337  3110k|   0  7934M| 531M 57.6M  595M 6611M|   0    91   128   115
> > > > 21-08 22:10:51| 26  63   0   0  11   0|   0   184k| 294B    0 |   0     0 |1411  3147k|   0  7934M| 531M 57.6M  595M 6611M|   0  1485   814   815
> > > > 21-08 22:10:52| 26  63   0   0  11   0|   0     0 | 437B  239B|   0     0 |1355  3160k|   0  7934M| 531M 57.6M  595M 6611M|   0    24    94    97
> > > > 21-08 22:10:53| 26  63   0   0  11   0|   0     0 | 168B    0 |   0     0 |1397  3155k|   0  7934M| 531M 57.6M  595M 6611M|   0   479   285   273
> > > > 21-08 22:10:54| 26  63   0   0  11   0|   0  4096B| 396B  324B|   0     0 |1346  3154k|   0  7934M| 531M 57.6M  595M 6611M|   0    27   145   145
> > > > 21-08 22:10:55| 26  63   0   0  11   0|   0    60k|   0     0 |   0     0 |1353  3148k|   0  7934M| 531M 57.6M  595M 6610M|   0    93   117    36
> > > > 21-08 22:10:56| 26  63   0   0  11   0|   0     0 |   0     0 |   0     0 |1341  3172k|   0  7934M| 531M 57.6M  595M 6610M|   0   158    87    74
> > > > 21-08 22:10:57| 26  62   1   0  11   0|   0     0 |  42B   60B|   0     0 |1332  3162k|   0  7934M| 531M 57.6M  595M 6610M|   0    56    82    78
> > > > 21-08 22:10:58| 26  63   0   0  11   0|   0     0 |   0     0 |   0     0 |1334  3178k|   0  7934M| 531M 57.6M  595M 6610M|   0    26    56    56
> > > > 21-08 22:10:59| 26  63   0   0  11   0|   0     0 |   0     0 |   0     0 |1336  3179k|   0  7934M| 531M 57.6M  595M 6610M|   0     3    33    32
> > > > 21-08 22:11:00| 26  63   0   0  11   0|   0    24k|  90B  108B|   0     0 |1347  3172k|   0  7934M| 531M 57.6M  595M 6610M|   0    41    73    71
> > > >
> > > > I have four logical cores, so 25% corresponds to one core. I don't know
> > > > whether the ~26% user CPU usage has anything to do with this fact or is
> > > > just a coincidence. The rest is ~63% system and ~11% hardware interrupt.
> > > > Do these numbers support what you suspect?
> > >
> > > The massive increase in context switches does come as a bit of a surprise!
> > > It does rule out my initial suspicion of lock contention, but then again
> > > the fact that you have only four CPUs made that pretty unlikely to begin
> > > with.
> > >
> > > 2.4k average context switches in the good case for the full run vs. 3,156k
> > > for about half of a run in the bad case. That is an increase of more
> > > than three orders of magnitude!
> > >
> > > Yow!!!
> > >
> > > Page faults are actually -higher- in the good case. You have about 6.5GB
> > > free in both cases, so you are not running out of memory. Lots more disk
> > > writes in the good case, perhaps consistent with its getting more done.
> > > Networking is negligible in both cases.
> > >
> > > Lots of hardware interrupts in the bad case as well. Would you be willing
> > > to take a look at /proc/interrupts before and after to see which one you
> > > are getting hit with? (Or whatever interrupt tracking tool you prefer.)
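> > >
> > > For example (the file names below are just an illustration):
> > >
> > >   cat /proc/interrupts > irq-before.txt
> > >   scons -j4
> > >   cat /proc/interrupts > irq-after.txt
> > >   diff irq-before.txt irq-after.txt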
> >
> > Here are the results.
> >
> > Good case before:
> >            CPU0       CPU1       CPU2       CPU3
> >   0:         17          0          0          0   IO-APIC-edge      timer
> >   1:        356          1         68          4   IO-APIC-edge      i8042
> >   8:          0          0          1          0   IO-APIC-edge      rtc0
> >   9:        330         14        449         71   IO-APIC-fasteoi   acpi
> >  12:         10        108        269       2696   IO-APIC-edge      i8042
> >  16:         36         10        111          2   IO-APIC-fasteoi   ehci_hcd:usb1
> >  17:         20          3         25          4   IO-APIC-fasteoi   mmc0
> >  21:          3          0         34          0   IO-APIC-fasteoi   ehci_hcd:usb2
> >  40:          0          1         12         11   PCI-MSI-edge      mei_me
> >  41:      10617        173       9959        292   PCI-MSI-edge      ahci
> >  42:        862         11        186         26   PCI-MSI-edge      xhci_hcd
> >  43:        107         77         27        102   PCI-MSI-edge      i915
> >  44:       5322         20        434         22   PCI-MSI-edge      iwlwifi
> >  45:        180          0        183         86   PCI-MSI-edge      snd_hda_intel
> >  46:          0          3          0          0   PCI-MSI-edge      eth0
> > NMI:          1          0          0          0   Non-maskable interrupts
> > LOC:      16312      15177      10840       8995   Local timer interrupts
> > SPU:          0          0          0          0   Spurious interrupts
> > PMI:          1          0          0          0   Performance monitoring interrupts
> > IWI:       1160        523       1031        481   IRQ work interrupts
> > RTR:          3          0          0          0   APIC ICR read retries
> > RES:      14976      16135       9973      10784   Rescheduling interrupts
> > CAL:        482        457        151        370   Function call interrupts
> > TLB:         70        106        352        230   TLB shootdowns
> > TRM:          0          0          0          0   Thermal event interrupts
> > THR:          0          0          0          0   Threshold APIC interrupts
> > MCE:          0          0          0          0   Machine check exceptions
> > MCP:          2          2          2          2   Machine check polls
> > ERR:          0
> > MIS:          0
> >
> > Good case after:
> >            CPU0       CPU1       CPU2       CPU3
> >   0:         17          0          0          0   IO-APIC-edge      timer
> >   1:        367          1         81          4   IO-APIC-edge      i8042
> >   8:          0          0          1          0   IO-APIC-edge      rtc0
> >   9:        478         14        460         71   IO-APIC-fasteoi   acpi
> >  12:         10        108        269       2696   IO-APIC-edge      i8042
> >  16:         36         10        111          2   IO-APIC-fasteoi   ehci_hcd:usb1
> >  17:         20          3         25          4   IO-APIC-fasteoi   mmc0
> >  21:          3          0         34          0   IO-APIC-fasteoi   ehci_hcd:usb2
> >  40:          0          1         12         11   PCI-MSI-edge      mei_me
> >  41:      16888        173       9959        292   PCI-MSI-edge      ahci
> >  42:       1102         11        186         26   PCI-MSI-edge      xhci_hcd
> >  43:        107        132         27        136   PCI-MSI-edge      i915
> >  44:       6943         20        434         22   PCI-MSI-edge      iwlwifi
> >  45:        180          0        183         86   PCI-MSI-edge      snd_hda_intel
> >  46:          0          3          0          0   PCI-MSI-edge      eth0
> > NMI:          4          3          3          3   Non-maskable interrupts
> > LOC:      26845      24780      19025      17746   Local timer interrupts
> > SPU:          0          0          0          0   Spurious interrupts
> > PMI:          4          3          3          3   Performance monitoring interrupts
> > IWI:       1637        751       1287        695   IRQ work interrupts
> > RTR:          3          0          0          0   APIC ICR read retries
> > RES:      26511      26673      18791      20194   Rescheduling interrupts
> > CAL:        510        480        151        370   Function call interrupts
> > TLB:        361        292        575        461   TLB shootdowns
> > TRM:          0          0          0          0   Thermal event interrupts
> > THR:          0          0          0          0   Threshold APIC interrupts
> > MCE:          0          0          0          0   Machine check exceptions
> > MCP:          2          2          2          2   Machine check polls
> > ERR:          0
> > MIS:          0
> >
> > Bad case before:
> >            CPU0       CPU1       CPU2       CPU3
> >   0:         17          0          0          0   IO-APIC-edge      timer
> >   1:        172          3         78          3   IO-APIC-edge      i8042
> >   8:          0          1          0          0   IO-APIC-edge      rtc0
> >   9:       1200        148        395         81   IO-APIC-fasteoi   acpi
> >  12:       1625          2        348         10   IO-APIC-edge      i8042
> >  16:         26         23        115          0   IO-APIC-fasteoi   ehci_hcd:usb1
> >  17:         16          3         12         21   IO-APIC-fasteoi   mmc0
> >  21:          2          2         33          0   IO-APIC-fasteoi   ehci_hcd:usb2
> >  40:          0          0         14         10   PCI-MSI-edge      mei_me
> >  41:      15776        374       8497        687   PCI-MSI-edge      ahci
> >  42:       1297        829        115         24   PCI-MSI-edge      xhci_hcd
> >  43:        103        149          9        212   PCI-MSI-edge      i915
> >  44:      13151        101        511         91   PCI-MSI-edge      iwlwifi
> >  45:        153        159          0        122   PCI-MSI-edge      snd_hda_intel
> >  46:          0          1          1          0   PCI-MSI-edge      eth0
> > NMI:         32         31         31         31   Non-maskable interrupts
> > LOC:      82504      82732      74172      75985   Local timer interrupts
> > SPU:          0          0          0          0   Spurious interrupts
> > PMI:         32         31         31         31   Performance monitoring interrupts
> > IWI:      17816      16278      13833      13282   IRQ work interrupts
> > RTR:          3          0          0          0   APIC ICR read retries
> > RES:      18784      21084      13313      12946   Rescheduling interrupts
> > CAL:        393        422        306        356   Function call interrupts
> > TLB:        231        176        235        191   TLB shootdowns
> > TRM:          0          0          0          0   Thermal event interrupts
> > THR:          0          0          0          0   Threshold APIC interrupts
> > MCE:          0          0          0          0   Machine check exceptions
> > MCP:          3          3          3          3   Machine check polls
> > ERR:          0
> > MIS:          0
> >
> > Bad case after:
> >            CPU0       CPU1       CPU2       CPU3
> >   0:         17          0          0          0   IO-APIC-edge      timer
> >   1:        415          3         85          3   IO-APIC-edge      i8042
> >   8:          0          1          0          0   IO-APIC-edge      rtc0
> >   9:       1277        148        428         81   IO-APIC-fasteoi   acpi
> >  12:       1625          2        348         10   IO-APIC-edge      i8042
> >  16:         26         23        115          0   IO-APIC-fasteoi   ehci_hcd:usb1
> >  17:         16          3         12         21   IO-APIC-fasteoi   mmc0
> >  21:          2          2         33          0   IO-APIC-fasteoi   ehci_hcd:usb2
> >  40:          0          0         14         10   PCI-MSI-edge      mei_me
> >  41:      17814        374       8497        687   PCI-MSI-edge      ahci
> >  42:       1567        829        115         24   PCI-MSI-edge      xhci_hcd
> >  43:        103        177          9        242   PCI-MSI-edge      i915
> >  44:      14956        101        511         91   PCI-MSI-edge      iwlwifi
> >  45:        153        159          0        122   PCI-MSI-edge      snd_hda_intel
> >  46:          0          1          1          0   PCI-MSI-edge      eth0
> > NMI:         36         35         34         34   Non-maskable interrupts
> > LOC:      92429      92708      81714      84071   Local timer interrupts
> > SPU:          0          0          0          0   Spurious interrupts
> > PMI:         36         35         34         34   Performance monitoring interrupts
> > IWI:      22594      19658      17439      14257   IRQ work interrupts
> > RTR:          3          0          0          0   APIC ICR read retries
> > RES:      21491      24670      14618      14569   Rescheduling interrupts
> > CAL:        441        439        306        356   Function call interrupts
> > TLB:        232        181        274        465   TLB shootdowns
> > TRM:          0          0          0          0   Thermal event interrupts
> > THR:          0          0          0          0   Threshold APIC interrupts
> > MCE:          0          0          0          0   Machine check exceptions
> > MCP:          3          3          3          3   Machine check polls
> > ERR:          0
> > MIS:          0
>
> Lots more local timer interrupts, which is consistent with the higher
> time in interrupt handlers for the bad case.
>
> > > One hypothesis is that your workload and configuration are interacting
> > > with RCU_FAST_NO_HZ to force very large numbers of RCU grace periods.
> > > Could you please check for this by building with CONFIG_RCU_TRACE=y,
> > > mounting debugfs somewhere and dumping rcu/rcu_sched/rcugp before and
> > > after each run?
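> > >
> > > Something along these lines should do it (the mount is only needed if
> > > debugfs is not already mounted):
> > >
> > >   scripts/config --enable RCU_TRACE    # then rebuild and reboot
> > >   mount -t debugfs nodev /sys/kernel/debug
> > >   cat /sys/kernel/debug/rcu/rcu_sched/rcugp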
> >
> > Good case before:
> > completed=8756  gpnum=8757  age=0  max=21
> > after:
> > completed=14686  gpnum=14687  age=0  max=21
> >
> > Bad case before:
> > completed=22970  gpnum=22971  age=0  max=21
> > after:
> > completed=26110  gpnum=26111  age=0  max=21
>
> In the good case, (14686-8756)/40=148.25 grace periods per second, which
> is a fast but reasonable rate given your HZ=250. Not a large difference
> in the number of grace periods, but extrapolating for the longer runtime,
> maybe ten times as much. But not much change in grace-period rate per
> unit time.
>
> > The test scenario was the following in both cases (mixed English and pseudo-bash):
> > reboot, login, start terminal
> > cd project
> > rm -r build
> > cat /proc/interrupts >> somefile ; cat /sys/kernel/debug/rcu/rcu_sched/rcugp >> somefile
> > scons -j4
> > wait ~40 sec (good case finished, Ctrl-C in bad case)
> > cat /proc/interrupts >> somefile ; cat /sys/kernel/debug/rcu/rcu_sched/rcugp >> somefile
> >
> > I stopped the build in the bad case after about as much time as the good
> > case took to finish, so extra interrupts and RCU grace periods from a
> > longer runtime don't skew the results.
>
> That procedure works for me, thank you for laying it out carefully.
>
> I believe I see what is going on and how to fix it, though it may take
> me a bit to work things through and get a good patch.
>
> Thank you very much for your testing efforts!

I'm glad I can help. I've been using Linux for many years, and now I have a
chance to help the community and do something in return. I'm quite
enjoying this :)

Tibor