Re: linux-next: Tree for Jun 28 [ BISECTED: rsyslog/imklog: High CPUusage ]
From: Sedat Dilek
Date: Sat Jun 29 2013 - 17:44:45 EST
On Fri, Jun 28, 2013 at 9:33 AM, Stephen Rothwell <sfr@xxxxxxxxxxxxxxxx> wrote:
> Hi all,
>
> Changes since 20130627:
>
> This tree produces the following warning when built for many (all?)
> configs (it has been fixed in the drm tree):
>
> drivers/video/Kconfig:42:error: recursive dependency detected!
> drivers/video/Kconfig:42: symbol FB is selected by DRM_KMS_HELPER
> drivers/gpu/drm/Kconfig:29: symbol DRM_KMS_HELPER is selected by DRM_OMAP
> drivers/gpu/drm/omapdrm/Kconfig:2: symbol DRM_OMAP depends on FB_OMAP2
> drivers/video/omap2/omapfb/Kconfig:1: symbol FB_OMAP2 depends on FB
>
> The arm-soc tree gained a conflict against the mfd tree.
>
> I applied a patch to the akpm tree to remove a lot of warnings.
>
[ I kept the CC list of the culprit commit (sorry if I forgot anyone) ]
Hi,
[ SYMPTOM ]
Last Friday I wanted to listen to some music-videos on my
next-20130628 kernel and noticed that my battery got sucked empty very
fast.
I could see with htop that 2 (of 4) rsyslogd processes ran on 100%.
NOTE: next-20130627 (Thursday's Linux-Next) did NOT show these symptoms.
[ "CSI dileks" CHECKS THE LOGS ]
While investigating the logs I saw this:
[ /var/log/syslog ]
Jun 29 15:25:34 fambox rsyslogd: [origin software="rsyslogd"
swVersion="5.8.6" x-pid="900" x-info="http://www.rsyslog.com"] exiting
on signal 15.
Jun 29 15:25:39 fambox kernel: imklog 5.8.6, log source = /proc/kmsg started.
Jun 29 15:25:39 fambox rsyslogd: [origin software="rsyslogd"
swVersion="5.8.6" x-pid="4689" x-info="http://www.rsyslog.com"] start
Jun 29 15:25:39 fambox rsyslogd: rsyslogd's groupid changed to 103
Jun 29 15:25:39 fambox rsyslogd: rsyslogd's userid changed to 101
Jun 29 15:25:39 fambox rsyslogd-2039: Could not open output pipe
'/dev/xconsole' [try http://www.rsyslog.com/e/2039 ]
Jun 29 15:25:39 fambox kernel:
Jun 29 15:26:09 fambox kernel: last message repeated 30176 times
Jun 29 15:27:09 fambox kernel: last message repeated 59603 times
Jun 29 15:28:09 fambox kernel: last message repeated 60081 times
Jun 29 15:29:09 fambox kernel: last message repeated 60148 times
[ 1ST TRYOUT ]
[1] recommends to use a more modern rsyslog-version, so I upgraded
from (5.8.6-1ubuntu8.4) to (7.4.1-0ubuntu1ppa1) from an unofficial
3rd-party software-archive [2] for Ubuntu/precise.
This did not help - still 100% CPU usage observed.
[ 2ND TRYOUT: WORKAROUND ]
Following [1], I temporarily disabled imklog rsyslog-module:
[ /etc/rsyslog.conf ]
...
-$ModLoad imklog # provides kernel logging support (previously done by rklogd)
+# XXX: Workaround: Do NOT load imklog as it causes 100% CPU workload
by flooding logs. -dileks
+#$ModLoad imklog # provides kernel logging support (previously done
by rklogd)
Yeah, this resulted in a "normal" system - no high CPU loads anymore
(in idle-mode).
[ CURIOSITY KILLED THE CAT ]
I cannot say if this here is related...
[ /var/log/syslog ]
fambox rsyslogd-2039: Could not open output pipe '/dev/xconsole' [try
http://www.rsyslog.com/e/2039 ]
...it looks like I have to create a '/dev/xconsole' manually.
root# mknod -m 640 /dev/xconsole p
With the "BROKEN" kernel this IIRC did not help, the good one did not
show xconsole-related stuff in the logs.
But I might remember wrong.
See also:
[ /etc/rsyslog.d/50-default.conf ]
...
# The named pipe /dev/xconsole is for the `xconsole' utility. To use it,
# you must invoke `xconsole' with the `-file' option:
#
# $ xconsole -file /dev/xconsole [...]
#
# NOTE: adjust the list below, or you'll go crazy if you have a reasonably
# busy site..
#
daemon.*;mail.*;\
news.err;\
*.=debug;*.=info;\
*.=notice;*.=warn |/dev/xconsole
- EOF -
Running manually 'xconsole -file /dev/xconsole' was not successful
(could not open... device) with the bad kernel.
Might be good to comment that xconsole block?
[ GIT-BISECT SESSION ]
As this all did not show me what caused the problem I started a
git-bisect session.
This revealed the following culprit commit:
commit bb1f30cb7d3ba21098f0ee7e0382160ba2599a43
"wait: introduce wait_event_common(wq, condition, state, timeout)"
This commit was 1st introduced with Friday's next-20130628!
NOTE: See also git-bisect-log.txt and git-bisect-visualize.txt for the results.
Reverting it successfully with two other related patches results in a
"normal" system again.
See attached patch lists:
Revert "fix warnings from ?: operator in wait.h"
Revert "wait: introduce prepare_to_wait_event()"
Revert "wait: introduce wait_event_common(wq, condition, state, timeout)"
[ GIVEAWAYS ]
For the sake of completeness I have attached my kernel-config.
Oleg, can you look at that?
If you need more inputs lemme know.
Thanks in advance.
Regards,
- Sedat -
[1] http://www.gossamer-threads.com/lists/rsyslog/users/8087
[2] https://launchpad.net/~tmortensen/+archive/rsyslogv7
git bisect start
# good: [fb1532869a3921bc217edb2f06964e9efd75a568] Merge branch 'akpm-current/current'
git bisect good fb1532869a3921bc217edb2f06964e9efd75a568
# bad: [d8fe403d09764591d25ffc728949b2f3a0020773] Merge branch 'akpm/master'
git bisect bad d8fe403d09764591d25ffc728949b2f3a0020773
# good: [be851d19f3a162b68836da8ab8772ae7f8eba79a] drivers/usb/gadget/amd5536udc.c: avoid calling dma_pool_create() with NULL dev
git bisect good be851d19f3a162b68836da8ab8772ae7f8eba79a
# good: [0d46acb92aaa6077b3dd859d1636d72615e2a636] drivers/rtc/rtc-generic.c: remove empty function
git bisect good 0d46acb92aaa6077b3dd859d1636d72615e2a636
# bad: [8b346fdf43f14501db193e9088b2e44d20430a52] ia64: remove setting for saved_max_pfn
git bisect bad 8b346fdf43f14501db193e9088b2e44d20430a52
# good: [f802d8cca28b3497dd74557eed476c531efb93f5] ptrace/x86: revert "hw_breakpoints: Fix racy access to ptrace breakpoints"
git bisect good f802d8cca28b3497dd74557eed476c531efb93f5
# good: [92f7d08c6972cd5f8d6fa8a1b2e0fb59af9675fe] coredump: kill cn_escape(), introduce cn_esc_printf()
git bisect good 92f7d08c6972cd5f8d6fa8a1b2e0fb59af9675fe
# good: [fa920e518e9ac4bf819d47e0c34460c1c5b8dc4c] kernel/fork.c:copy_process(): don't add the uninitialized child to thread/task/pid lists
git bisect good fa920e518e9ac4bf819d47e0c34460c1c5b8dc4c
# bad: [43de5170e6435da91a6dc9f0255fedb90f29ec8b] wait: introduce prepare_to_wait_event()
git bisect bad 43de5170e6435da91a6dc9f0255fedb90f29ec8b
# good: [930854cd19c6840167e8c993cbeab34c56de63d1] fs/exec.c: do_execve_common(): use current_user()
git bisect good 930854cd19c6840167e8c993cbeab34c56de63d1
# bad: [bb1f30cb7d3ba21098f0ee7e0382160ba2599a43] wait: introduce wait_event_common(wq, condition, state, timeout)
git bisect bad bb1f30cb7d3ba21098f0ee7e0382160ba2599a43
# good: [94dca9a282022d9e5c667f96142c6175cf2e598f] fs/exec.c:de_thread: mt-exec should update ->real_start_time
git bisect good 94dca9a282022d9e5c667f96142c6175cf2e598f
# first bad commit: [bb1f30cb7d3ba21098f0ee7e0382160ba2599a43] wait: introduce wait_event_common(wq, condition, state, timeout)
commit bb1f30cb7d3ba21098f0ee7e0382160ba2599a43
Author: Oleg Nesterov <oleg@xxxxxxxxxx>
Date: Fri Jun 28 09:53:57 2013 +1000
wait: introduce wait_event_common(wq, condition, state, timeout)
1. 4c663cfc "fix false timeouts when using wait_event_timeout()"
is not enough, wait(wq, true, 0) still returns zero.
__wait_event_timeout() was already fixed but we need the same
logic in wait_event_timeout() if the fast-path check succeeds.
2. wait_event_timeout/__wait_event_timeout interface do not match
wait_event(), you can't use __wait_event_timeout() instead of
wait_event_timeout() if you do not need the fast-path check.
Same for wait_event_interruptible/__wait_event_interruptible,
so this patch cleanups rtlx.c, ip_vs_sync.c, and af_irda.c:
- __wait_event_interruptible(wq, cond, ret);
+ ret = __wait_event_interruptible(wq, cond);
3. wait_event_* macros duplicate the same code.
This patch adds a single helper wait_event_common() which hopefully
does everything right. Compiler optimizes out the "dead" code when
we do not need signal_pending/schedule_timeout.
"size vmlinux" reports:
text data bss dec hex filename
- 4978601 2935080 10104832 18018513 112f0d1 vmlinux
+ 4977769 2930984 10104832 18013585 112dd91 vmlinux
but I think this depends on gcc/config.
In particular, wait_even_timeout(true, non_const_timeout) should
generate more code in the non-void context because the patch adds
the additional code to fix the 1st problem.
Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>
Reviewed-by: Tejun Heo <tj@xxxxxxxxxx>
Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
Cc: Imre Deak <imre.deak@xxxxxxxxx>
Cc: Lukas Czerner <lczerner@xxxxxxxxxx>
Cc: Samuel Ortiz <samuel@xxxxxxxxxx>
Cc: Wensong Zhang <wensong@xxxxxxxxxxxx>
Cc: Simon Horman <horms@xxxxxxxxxxxx>
Cc: Julian Anastasov <ja@xxxxxx>
Cc: Ralf Baechle <ralf@xxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Attachment:
3.10.0-rc7-next20130628-7-iniza-small.patch
Description: Binary data
Attachment:
config-3.10.0-rc7-next20130628-7-iniza-small
Description: Binary data