Re: linux-next: Tree for April 14 (Call-traces: RCU/ACPI/WQ related?)

From: Sedat Dilek
Date: Thu Apr 21 2011 - 05:07:45 EST


On Thu, Apr 21, 2011 at 7:08 AM, Paul E. McKenney
<paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Apr 14, 2011 at 03:44:11PM -0700, Paul E. McKenney wrote:
>> On Fri, Apr 15, 2011 at 12:19:34AM +0200, Sedat Dilek wrote:
>> > On Thu, Apr 14, 2011 at 12:19 PM, Sedat Dilek
>> > <sedat.dilek@xxxxxxxxxxxxxx> wrote:
>> > > On Thu, Apr 14, 2011 at 11:16 AM, Sedat Dilek
>> > > <sedat.dilek@xxxxxxxxxxxxxx> wrote:
>> > >> [ Adding CC to RCU maintainer (Hi Paul :-)) ]
>> > >>
>> > >> Helping me for now with (see also Documentation/RCU/stallwarn.txt):
>> > >>
>> > >> # cat /sys/module/rcutree/parameters/rcu_cpu_stall_suppress
>> > >> 0
>> > >>
>> > >> # echo "1" > /sys/module/rcutree/parameters/rcu_cpu_stall_suppress
>> > >>
>> > >> # cat /sys/module/rcutree/parameters/rcu_cpu_stall_suppress
>> > >> 1
>> > >>
>> > >> - Sedat -
>> > >>
>> > >
>> > > That workaround helped till a system-freeze when generating a tarball
>> > > from my current kernel-tree.
>> > > I switched back to my yesterday's linux-next kernel.
>> > >
>> > > - Sedat -
>> > >
>> >
>> > I isolated the culprit so far:
>> >
>> > commit 900507fc62d5ba0164c07878dbc36ac97866a858
>> > "rcu: move TREE_RCU from softirq to kthread"
>> >
>> > With this revert my system does not show the symptoms I have reported.
>>
>> Hmmm...  I never was able to reproduce this, but did find a workload
>> that slowed up the grace periods.  I fixed that (which turned out to
>> be a wakeup problem), but my hopes that it would also fix your problem
>> were clearly unfounded.  I have once again stopped exporting this commit
>> to -next.
>
> I have added some debug tracing, which is available at branch
> "sedat.2011.04.19a" in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
>
> Alternatively, if it is easier, the patch shown below can be used.  FWIW,
> this patch is against 2.6.39-rc3.
>
> Either way, if you get a chance to run your tests on this, could you
> please run the attached script (collectdebugfs.sh) and capture its output?
> Sample output is attached as well (collectdebugfs.sh.out):  the script
> should output something vaguely like the sample output every 15 seconds
> or so.
>
> The script assumes that debugfs is enabled (along with CONFIG_RCU_TRACE=y)
> and mounted as follows:
>
>         mount -t debugfs none /sys/kernel/debug/
>
> Or if you mount debugfs somewhere else, please set the script's DEBUGFS_MP
> variable accordingly.
>
>                                                         Thanx, Paul
>
> ------------------------------------------------------------------------
>

Welcome to operation "Kill that RCU brainbug" (Starship troopers part X)!

Of course I can help with testing.

Paul, did you see recent RCU-related fixes to fs between rc3 and rc4?

commit c1530019e311c91d14b24d8e74d233152d806e45
vfs: Fix absolute RCU path walk failures due to uninitialized seq number

fff3e5ade4455a4b42a19c95dd7a167a3cb7956a
fs: synchronize_rcu when unregister_filesystem success not failure

IIRC, Jens has pending block/plugging patches in his for-linus tree.
Especially this one (CONFIG_PREEMPT):

5f45c69589b7d2953584e6cd0b31e35dbe960ad0
cfq-iosched: read_lock() does not always imply rcu_read_lock()

Some questions about the test scenario:

Shall I test from the linux-2.6-rcu.git#sedat.2011.04.19a GIT tree?
I think that's the ideal solution.
Or shall I pull the sedat.2011.04.19a GIT branch into the "BROKEN"
linux-next (next-20110414)?

Again, with which RCU/HZ/PREEMPT kernel-config options shall I test?
This is from yesterday's linux-next build:

# egrep 'RCU|_HZ |PREEMPT' /boot/config-2.6.39-rc4-next20110420.4-686-small
# RCU Subsystem
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
CONFIG_RCU_TRACE=y
CONFIG_RCU_FANOUT=32
# CONFIG_RCU_FANOUT_EXACT is not set
CONFIG_RCU_FAST_NO_HZ=y
CONFIG_TREE_RCU_TRACE=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_CPU_STALL_TIMEOUT=60
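
Before running collectdebugfs.sh, the two prerequisites Paul mentions
(CONFIG_RCU_TRACE=y and a mounted debugfs) could be checked with something
like the sketch below. The helper names config_enabled and
debugfs_mounted_at are hypothetical and not part of Paul's script. The
config path assumes the usual /boot/config-$(uname -r) layout, and
DEBUGFS_MP is borrowed here only as a variable name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel config FILE sets OPTION to "y".
# -s keeps grep quiet if FILE does not exist.
config_enabled() {
    grep -qs "^$2=y\$" "$1"
}

# Hypothetical helper: succeed if a debugfs filesystem is mounted at DIR.
# The mounts table defaults to /proc/mounts; a trailing slash on DIR is
# dropped because /proc/mounts lists mount points without one.
debugfs_mounted_at() {
    mp=${1%/}
    awk -v mp="$mp" '$2 == mp && $3 == "debugfs" { found = 1 }
                     END { exit !found }' "${2:-/proc/mounts}"
}

# Assumed locations; adjust to the local setup.
CONFIG=/boot/config-$(uname -r)
DEBUGFS_MP=${DEBUGFS_MP:-/sys/kernel/debug}

config_enabled "$CONFIG" CONFIG_RCU_TRACE ||
    echo "CONFIG_RCU_TRACE=y not found in $CONFIG" >&2
debugfs_mounted_at "$DEBUGFS_MP" ||
    echo "debugfs does not appear to be mounted at $DEBUGFS_MP" >&2
```

Both helpers only read files, so they are safe to run on the test machine
before starting the actual debug run.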

Regards,
- Sedat -

P.S.: Is it intended that the repository has no master branch defined?

$ git clone git://git.us.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
Cloning into linux-2.6-rcu...
remote: Counting objects: 2012268, done.
remote: Compressing objects: 100% (323153/323153), done.
Receiving objects: 100% (2012268/2012268), 418.89 MiB | 341 KiB/s, done.
remote: Total 2012268 (delta 1675063), reused 2007602 (delta 1670549)
Resolving deltas: 100% (1675063/1675063), done.
warning: remote HEAD refers to nonexistent ref, unable to checkout.

$ ls -l linux-2.6-rcu/
total 32
drwxr-xr-x 3 sd sd 4096 Apr 21 10:26 .
drwxr-xr-x 39 sd sd 20480 Apr 21 10:26 ..
drwxr-xr-x 7 sd sd 4096 Apr 21 10:49 .git

$ du -s -h linux-2.6-rcu/
473M linux-2.6-rcu/

$ cd linux-2.6-rcu/

$ git pull
You asked me to pull without telling me which branch you
want to merge with, and 'branch.master.merge' in
your configuration file does not tell me, either. Please
specify which branch you want to use on the command line and
try again (e.g. 'git pull <repository> <refspec>').
See git-pull(1) for details.

If you often merge with the same branch, you may want to
use something like the following in your configuration file:

[branch "master"]
remote = <nickname>
merge = <remote-ref>

[remote "<nickname>"]
url = <url>
fetch = <refspec>

See git-config(1) for details.

$ git pull master
fatal: 'master' does not appear to be a git repository
fatal: The remote end hung up unexpectedly

$ git branch -r | grep sedat
origin/sedat.2011.04.19a

$ git checkout -b sedat.2011.04.19a origin/sedat.2011.04.19a
Checking out files: 100% (36702/36702), done.
Branch sedat.2011.04.19a set up to track remote branch
sedat.2011.04.19a from origin.
Switched to a new branch 'sedat.2011.04.19a'

$ ls -l
total 480
-rw-r--r-- 1 sd sd 18693 Apr 21 10:54 COPYING
-rw-r--r-- 1 sd sd 93908 Apr 21 10:54 CREDITS
drwxr-xr-x 91 sd sd 12288 Apr 21 10:54 Documentation
-rw-r--r-- 1 sd sd 2464 Apr 21 10:54 Kbuild
-rw-r--r-- 1 sd sd 252 Apr 21 10:54 Kconfig
-rw-r--r-- 1 sd sd 192586 Apr 21 10:54 MAINTAINERS
-rw-r--r-- 1 sd sd 52374 Apr 21 10:54 Makefile
-rw-r--r-- 1 sd sd 17525 Apr 21 10:54 README
-rw-r--r-- 1 sd sd 3371 Apr 21 10:54 REPORTING-BUGS
drwxr-xr-x 26 sd sd 4096 Apr 21 10:55 arch
drwxr-xr-x 2 sd sd 4096 Apr 21 10:55 block
drwxr-xr-x 3 sd sd 4096 Apr 21 10:55 crypto
drwxr-xr-x 92 sd sd 4096 Apr 21 10:55 drivers
drwxr-xr-x 37 sd sd 4096 Apr 21 10:55 firmware
drwxr-xr-x 71 sd sd 4096 Apr 21 10:55 fs
drwxr-xr-x 22 sd sd 4096 Apr 21 10:55 include
drwxr-xr-x 2 sd sd 4096 Apr 21 10:55 init
drwxr-xr-x 2 sd sd 4096 Apr 21 10:55 ipc
drwxr-xr-x 8 sd sd 4096 Apr 21 10:55 kernel
drwxr-xr-x 8 sd sd 4096 Apr 21 10:55 lib
drwxr-xr-x 2 sd sd 4096 Apr 21 10:55 mm
drwxr-xr-x 53 sd sd 4096 Apr 21 10:55 net
drwxr-xr-x 9 sd sd 4096 Apr 21 10:55 samples
drwxr-xr-x 13 sd sd 4096 Apr 21 10:55 scripts
drwxr-xr-x 8 sd sd 4096 Apr 21 10:55 security
drwxr-xr-x 22 sd sd 4096 Apr 21 10:55 sound
drwxr-xr-x 9 sd sd 4096 Apr 21 10:55 tools
drwxr-xr-x 2 sd sd 4096 Apr 21 10:55 usr
drwxr-xr-x 3 sd sd 4096 Apr 21 10:55 virt
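
For anyone else fetching the tree: the empty checkout after the clone above
comes from the remote HEAD pointing at a ref that does not exist. A possible
shortcut, sketched below, is to name the branch at clone time with git
clone's -b option, so that no separate "git checkout -b" step is needed.
The wrapper name clone_branch is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: clone URL into DIR and check out BRANCH directly,
# so a dangling remote HEAD does not leave an empty working tree.
clone_branch() {
    url=$1
    branch=$2
    dir=$3
    git clone -b "$branch" "$url" "$dir"
}

# Example invocation (not run here, it needs network access):
#   clone_branch \
#     git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git \
#     sedat.2011.04.19a linux-2.6-rcu
```

In an already cloned tree, "git pull origin sedat.2011.04.19a" names both
the remote and the branch, which is what the plain "git pull" and
"git pull master" attempts above were missing.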

- EOT -