Re: Unusually high system CPU usage with recent kernels

From: Tibor Billes
Date: Tue Sep 03 2013 - 17:11:13 EST


> From: Paul E. McKenney Sent: 08/30/13 03:24 AM
> On Tue, Aug 27, 2013 at 10:05:42PM +0200, Tibor Billes wrote:
> > From: Paul E. McKenney Sent: 08/26/13 06:28 AM
> > > Here is a patch that is more likely to help. I am testing it in parallel,
> > > but figured I should send you a sneak preview.
> >
> > I tried it, but I don't see any difference in overall performance. The dstat
> > also shows the same as before.
> >
> > But I did notice something. Occasionally there is an increase in userspace
> > CPU usage, interrupts and context switches are dropping, and it really gets
> > more work done (scons printed commands more frequently). ÂI checked that
> > this behaviour is present without your patch, I just didn't notice this
> > before. Maybe you can make some sense out of it.
> >
> > ----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----swap--- ------memory-usage----- -----virtual-memory----
> >   time   |usr sys idl wai hiq siq| read Âwrit| recv Âsend| Âin  out | int  csw | used Âfree| used Âbuff Âcach Âfree|majpf minpf alloc Âfree
> > 27-08 20:51:53| 23 Â62 Â 5 Â 0 Â11 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1274 Â3102k| Â 0 Â7934M| 549M 56.0M Â491M 6698M| Â 0 Â Â28 Â 156 Â 159
> > 27-08 20:51:54| 24 Â64 Â 1 Â 0 Â11 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1317 Â3165k| Â 0 Â7934M| 549M 56.0M Â491M 6698M| Â 0 Â Â53 Â 189 Â 182
> > 27-08 20:51:55| 33 Â50  6  2  9  0| 192k 1832k|  0   0 |  0   0 |1371 Â2442k|  0 Â7934M| 544M 56.0M Â492M 6702M|  0  Â30k  17k  17k
> > 27-08 20:51:56| 24 Â64 Â 0 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1313 Â3220k| Â 0 Â7934M| 544M 56.0M Â492M 6701M| Â 0 Â Â21 Â 272 Â 232
> > 27-08 20:51:57| 24 Â64 Â 0 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1319 Â3226k| Â 0 Â7934M| 544M 56.0M Â492M 6701M| Â 0 Â Â 8 Â Â96 Â 112
> > 27-08 20:51:58| 25 Â63 Â 0 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1317 Â3224k| Â 0 Â7934M| 544M 56.0M Â492M 6701M| Â 0 Â Â12 Â 145 Â 141
> > 27-08 20:51:59| 24 Â64 Â 0 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1317 Â3223k| Â 0 Â7934M| 544M 56.0M Â492M 6701M| Â 0 Â Â54 Â 193 Â 191
> > 27-08 20:52:00| 25 Â63 Â 0 Â 0 Â12 Â 0| Â 0 Â Â24k| Â 0 Â Â 0 | Â 0 Â Â 0 |1336 Â3216k| Â 0 Â7934M| 544M 56.0M Â492M 6701M| Â 0 Â Â36 Â 161 Â 172
> > 27-08 20:52:01| 24 Â64 Â 0 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1313 Â3225k| Â 0 Â7934M| 544M 56.0M Â492M 6701M| Â 0 Â Â 9 Â 107 Â 107
> > 27-08 20:52:02| 24 Â64 Â 0 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1327 Â3224k| Â 0 Â7934M| 545M 56.0M Â492M 6701M| Â 0 Â Â13 Â 193 Â 200
> > 27-08 20:52:03| 24 Â64 Â 0 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1311 Â3226k| Â 0 Â7934M| 545M 56.0M Â492M 6701M| Â 0 Â Â13 Â 114 Â 114
> > 27-08 20:52:04| 25 Â63 Â 0 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1331 Â3223k| Â 0 Â7934M| 544M 56.0M Â492M 6701M| Â 0 Â Â41 Â 190 Â 178
> > 27-08 20:52:05| 24 Â64 Â 0 Â 0 Â12 Â 0| Â 0 Â8192B| Â 0 Â Â 0 | Â 0 Â Â 0 |1315 Â3222k| Â 0 Â7934M| 544M 56.0M Â492M 6701M| Â 0 Â Â30 Â 123 Â 122
> > 27-08 20:52:06| 24 Â64 Â 0 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1314 Â3223k| Â 0 Â7934M| 544M 56.0M Â492M 6701M| Â 0 Â Â16 Â 187 Â 195
> > 27-08 20:52:07| 25 Â63 Â 1 Â 0 Â12 Â 0|2212k Â192k| Â 0 Â Â 0 | Â 0 Â Â 0 |1637 Â3194k| Â 0 Â7934M| 544M 56.2M Â494M 6699M| Â 0 Â1363 Â2590 Â1947
> > 27-08 20:52:08| 17 Â33 Â18 Â26  6  0|3208k  Â0 |  0   0 |  0   0 |1351 Â1766k|  0 Â7934M| 561M 56.3M Â499M 6678M|  4  Â10k 7620 Â2055
> > 27-08 20:52:09| 47 Â21 Â16 Â13  4  0|4332k Â628k|  0   0 |  0   0 |1400 Â1081k|  0 Â7934M| 647M 56.3M Â504M 6587M| Â10  Â24k  25k 1151
> > 27-08 20:52:10| 36 Â34 Â13 Â11  6  0|2636k 2820k|  0   0 |  0   0 |1451 Â1737k|  0 Â7934M| 598M 56.3M Â507M 6632M|  5  Â19k  16k  28k
> > 27-08 20:52:11| 46 Â17 Â10 Â25 Â 3 Â 0|4288k Â536k| Â 0 Â Â 0 | Â 0 Â Â 0 |1386 Â 868k| Â 0 Â7934M| 613M 56.3M Â513M 6611M| Â24 Â Â13k 8908 Â3616
> > 27-08 20:52:12| 53 Â33  5  4  5  0|4740k 3992k|  0   0 |  0   0 |1773 Â1464k|  0 Â7934M| 562M 56.6M Â521M 6654M|  0  Â36k  29k  40k
> > 27-08 20:52:13| 60 Â34  0  1  6  0|4228k 1416k|  0   0 |  0   0 |1642 Â1670k|  0 Â7934M| 593M 56.6M Â526M 6618M|  0  Â36k  26k  17k
> > 27-08 20:52:14| 53 Â37 Â 1 Â 3 Â 7 Â 0|3008k 1976k| Â 0 Â Â 0 | Â 0 Â Â 0 |1513 Â1972k| Â 0 Â7934M| 541M 56.6M Â529M 6668M| Â 0 Â Â10k 9986 Â Â23k
> > 27-08 20:52:15| 55 Â34  1  4  6  0|3636k 1284k|  0   0 |  0   0 |1645 Â1688k|  0 Â7934M| 581M 56.6M Â535M 6622M|  0  Â43k  30k  18k
> > 27-08 20:52:16| 57 Â30  5  2  5  0|4404k 2320k|  0   0 |  0   0 |1715 Â1489k|  0 Â7934M| 570M 56.6M Â541M 6627M|  0  Â39k  24k  26k
> > 27-08 20:52:17| 50 Â35  3  7  6  0|2520k 1972k|  0   0 |  0   0 |1699 Â1598k|  0 Â7934M| 587M 56.6M Â554M 6596M|  0  Â65k  40k  32k
> > 27-08 20:52:18| 52 Â40  2  1  7  0|1556k 1732k|  0   0 |  0   0 |1582 Â1865k|  0 Â7934M| 551M 56.6M Â567M 6619M|  0  Â35k  26k  33k
> > 27-08 20:52:19| 26 Â62 Â 0 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1351 Â3240k| Â 0 Â7934M| 551M 56.6M Â568M 6619M| Â 0 Â Â86 Â 440 Â 214
> > 27-08 20:52:20| 26 Â63 Â 0 Â 0 Â11 Â 0| Â 0 Â 108k| Â 0 Â Â 0 | Â 0 Â Â 0 |1392 Â3162k| Â 0 Â7934M| 555M 56.6M Â560M 6623M| Â 0 Â1801 Â1490 Â2672
> > 27-08 20:52:21| 25 Â63 Â 0 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1332 Â3198k| Â 0 Â7934M| 555M 56.6M Â560M 6623M| Â 0 Â Â50 Â 245 Â 255
> > 27-08 20:52:22| 25 Â63 Â 1 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1350 Â3220k| Â 0 Â7934M| 556M 56.6M Â560M 6622M| Â 0 Â 755 Â 544 Â 286
> > 27-08 20:52:23| 27 Â62 Â 1 Â 0 Â11 Â 0| Â 0 Â 272k| Â 0 Â Â 0 | Â 0 Â Â 0 |1323 Â3092k| Â 0 Â7934M| 551M 56.6M Â558M 6628M| Â 0 Â 341 Â1464 Â3085
> > 27-08 20:52:24| 25 Â63 Â 0 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1334 Â3197k| Â 0 Â7934M| 551M 56.6M Â559M 6627M| Â 0 Â Â63 Â 514 Â 273
> > 27-08 20:52:25| 25 Â63 Â 0 Â 0 Â12 Â 0| Â 0 Â Â40k| Â 0 Â Â 0 | Â 0 Â Â 0 |1329 Â3243k| Â 0 Â7934M| 546M 56.6M Â558M 6633M| Â 0 Â 321 Â 160 Â1679
> > 27-08 20:52:26| 39 Â50  1  1  9  0| Â48k Â644k|  0   0 |  0   0 |1500 Â2556k|  0 Â7934M| 552M 56.6M Â560M 6625M|  0  Â30k  14k  12k
> > 27-08 20:52:27| 26 Â62 Â 1 Â 0 Â11 Â 0| Â 0 Â 192k| Â 0 Â Â 0 | Â 0 Â Â 0 |1380 Â3152k| Â 0 Â7934M| 553M 56.6M Â560M 6624M| Â 0 Â2370 Â 808 Â 718
> > 27-08 20:52:28| 23 Â55 Â12 Â 0 Â10 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1247 Â2993k| Â 0 Â7934M| 553M 56.6M Â561M 6624M| Â 0 Â1060 Â 428 Â 241
> > 27-08 20:52:29| 25 Â63 Â 1 Â 0 Â11 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1318 Â3142k| Â 0 Â7934M| 554M 56.6M Â561M 6623M| Â 0 Â 663 Â 442 Â 198
> > 27-08 20:52:30| 25 Â64 Â 0 Â 0 Â12 Â 0| Â 0 Â 100k| Â 0 Â Â 0 | Â 0 Â Â 0 |1316 Â3212k| Â 0 Â7934M| 554M 56.6M Â561M 6623M| Â 0 Â Â42 Â 187 Â 186
> > 27-08 20:52:31| 24 Â64 Â 0 Â 0 Â12 Â 0| Â 0 Â4096B| Â 0 Â Â 0 | Â 0 Â Â 0 |1309 Â3222k| Â 0 Â7934M| 554M 56.6M Â561M 6623M| Â 0 Â Â 9 Â Â85 Â Â85
> > 27-08 20:52:32| 24 Â64 Â 0 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1297 Â3219k| Â 0 Â7934M| 554M 56.6M Â561M 6623M| Â 0 Â Â23 Â Â95 Â Â84
> > 27-08 20:52:33| 25 Â63 Â 0 Â 0 Â11 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1339 Â3101k| Â 0 Â7934M| 555M 56.6M Â557M 6625M| Â 0 Â 923 Â1566 Â2176
> > 27-08 20:52:34| 25 Â63 Â 0 Â 0 Â11 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1333 Â3095k| Â 0 Â7934M| 555M 56.6M Â559M 6623M| Â 0 Â 114 Â 701 Â 195
> > 27-08 20:52:35| 26 Â63 Â 0 Â 0 Â11 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1342 Â3036k| Â 0 Â7934M| 557M 56.6M Â568M 6613M| Â 0 Â 957 Â3225 Â 516
> > 27-08 20:52:36| 26 Â60 Â 2 Â 0 Â11 Â 0| Â 0 Â Â28k| Â 0 Â Â 0 | Â 0 Â Â 0 |1340 Â3091k| Â 0 Â7934M| 555M 56.6M Â568M 6614M| Â 0 Â5304 Â5422 Â5609
> > 27-08 20:52:37| 25 Â64 Â 0 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1348 Â3260k| Â 0 Â7934M| 556M 56.6M Â565M 6617M| Â 0 Â Â30 Â 156 Â1073
> > 27-08 20:52:38| 24 Â64 Â 0 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1314 Â3211k| Â 0 Â7934M| 556M 56.6M Â549M 6633M| Â 0 Â Â11 Â 105 Â4285
> > 27-08 20:52:39| 26 Â63 Â 0 Â 0 Â11 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1353 Â3031k| Â 0 Â7934M| 558M 56.6M Â559M 6620M| Â 0 Â 847 Â3866 Â 357
> > 27-08 20:52:40| 26 Â63 Â 0 Â 0 Â12 Â 0| Â 0 Â Â 0 | Â 0 Â Â 0 | Â 0 Â Â 0 |1309 Â3135k| Â 0 Â7934M| 569M 56.6M Â566M 6602M| Â 0 Â3940 Â5727 Â1288
>
> Interesting. The number of context switches drops during the time
> that throughput improves. It would be good to find out what task(s)
> are doing all of the context switches. One way to find the pids of the
> top 20 context-switching tasks should be something like this:
>
> grep ctxt /proc/*/status | sort -k2nr | head -20
>
> You could use any number of methods to map back to the command.
> When generating my last patch, I was assuming that ksoftirqd would be
> the big offender. Of course, if it is something else, I should be
> taking a different approach.

I did some measuring of these context switches. I used the above command
line to get the number of context switches before compiling and about 40
seconds later.

The good case is:
file                    Âbefore Âafter  process name
/proc/10/status:voluntary_ctxt_switches: Â Â38108 Â 54327 Â [rcu_sched]
/proc/19/status:voluntary_ctxt_switches: Â Â31022 Â 38682 Â [ksoftirqd/2]
/proc/14/status:voluntary_ctxt_switches: Â Â12947 Â 16914 Â [ksoftirqd/1]
/proc/3/status:voluntary_ctxt_switches: Â Â 11061 Â 14885 Â [ksoftirqd/0]
/proc/194/status:voluntary_ctxt_switches: Â 10044 Â 11660 Â [kworker/2:1H]
/proc/1603/status:voluntary_ctxt_switches: Â7626 Â Â9593 Â Â/usr/bin/X
/proc/24/status:voluntary_ctxt_switches: Â Â6415 Â Â9571 Â Â[ksoftirqd/3]
/proc/20/status:voluntary_ctxt_switches: Â Â5317 Â Â8879 Â Â[kworker/2:0]

The bad case is:
file                      Âbefore   Âafter    process name
/proc/3/status:voluntary_ctxt_switches: Â Â Â Â 82411972 Â Â98542227 Â Â[ksoftirqd/0]
/proc/23/status:voluntary_ctxt_switches: Â Â Â Â79592040 Â Â94206823 Â Â[ksoftirqd/3]
/proc/13/status:voluntary_ctxt_switches: Â Â Â Â79053870 Â Â93654161 Â Â[ksoftirqd/1]
/proc/18/status:voluntary_ctxt_switches: Â Â Â Â78136449 Â Â92288688 Â Â[ksoftirqd/2]
/proc/2293/status:nonvoluntary_ctxt_switches: Â 29038881 Â Â29038881 Â Âmate-panel
/proc/2308/status:nonvoluntary_ctxt_switches: Â 26607661 Â Â26607661 Â Ânm-applet
/proc/2317/status:nonvoluntary_ctxt_switches: Â 15494474 Â Â15494474 Â Â/usr/lib/polkit-mate/polkit-mate-authentication-agent-1
/proc/2148/status:nonvoluntary_ctxt_switches: Â 13763674 Â Â13763674 Â Âx-session-manager
/proc/2985/status:nonvoluntary_ctxt_switches: Â Â Â Â Â Â Â 12062706 Â Âpython /usr/bin/scons -j4
/proc/2323/status:nonvoluntary_ctxt_switches: Â 11581510 Â Â11581510 Â Âmate-volume-control-applet
/proc/2353/status:nonvoluntary_ctxt_switches: Â 9213436 Â Â 9213436 Â Â mate-power-manager
/proc/2305/status:nonvoluntary_ctxt_switches: Â 8328471 Â Â 8328471 Â Â /usr/lib/matecomponent/matecomponent-activation-server
/proc/3041/status:nonvoluntary_ctxt_switches: Â Â Â Â Â Â Â 7638312 Â Â sh -c "/opt/arm-2011.03/bin/arm-none-eabi-gcc" [...]
/proc/3055/status:nonvoluntary_ctxt_switches: Â Â Â Â Â Â Â 4843977 Â Â sh -c "/opt/arm-2011.03/bin/arm-none-eabi-gcc" [...]
/proc/2253/status:nonvoluntary_ctxt_switches: Â 4449918 Â Â 4449918 Â Â mate-keyring-daemon

The processes for the tables were partly collected by hand so they may
not contain every relevant process. But what I see is that there are a
lot more context switches in the bad case, quite many of these are non
voluntary. The context switch increase is pretty much everywhere, not
just ksoftirqd. I suspect the Mate related processes did their context
switches during bootup/login, which is consistent with the bootup/login
procedure being significantly slower (just like the compilation).

Tibor
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/