Re: 2.6.39-rc5-git2 boot crashs

From: Linus Torvalds
Date: Tue May 03 2011 - 12:29:15 EST


2011/5/3 werner <w.landgraf@xxxxx>:
> Please see the enclosed config.
>
> IDE on, X86_EXTENDED_PLATFORM off (also X86_ELAN)
>
> Following the previous two suggestions, MTD is on (apparently it doesn't
> cause problems), but for MISC_FILESYSTEMS, which apparently causes the error
> message during boot and perhaps also the sync failure, I switched half on
> and the other half off, to narrow down the problem.

Ok, can you try the attached patch, to see if the logfs oops goes
away. Perhaps more importantly, does the sync problem also go away?

> No problem with unzip / zip / moving big files etc., so this problem
> comes from X86_EXTENDED_PLATFORM.

Ok, that is very interesting.

> Tell me what to try out now

So at this point you have two problems, and I really would like to
just doubly verify both of them. First off, the attached patch for the
logfs oops and (hopefully) the sync hanging issue.
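
For readers following the patch: judging from the diff at the end of
this message, it moves the two mutex_init() calls and the
INIT_LIST_HEAD() from logfs_read_sb() up into logfs_mount(), right after
the superblock info is allocated. A plausible reading (an assumption
here, not something this message states) is that a failing mount can
reach cleanup code that touches s_freeing_list before logfs_read_sb()
has run. The user-space sketch below illustrates that ordering hazard
with a simplified list; every demo_* name is hypothetical and none of
it is logfs code.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified circular doubly linked list, loosely modelled on the
 * kernel's struct list_head. Hypothetical demo code, not kernel code. */
struct demo_list {
	struct demo_list *next, *prev;
};

static void demo_list_init(struct demo_list *head)
{
	head->next = head;
	head->prev = head;
}

static int demo_list_empty(const struct demo_list *head)
{
	return head->next == head;
}

static void demo_list_add(struct demo_list *item, struct demo_list *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;	/* dereferences head->next */
	head->next = item;
}

struct demo_super {
	struct demo_list freeing_list;
};

struct demo_inode {
	struct demo_list link;
};

/* Initialise the list as part of allocation, so that every later path,
 * including early error paths, sees a valid (empty) list. */
static struct demo_super *demo_super_alloc(void)
{
	struct demo_super *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	demo_list_init(&s->freeing_list);
	return s;
}

/* Stands in for a teardown path that queues an inode for freeing. */
static void demo_queue_for_freeing(struct demo_super *s,
				   struct demo_inode *inode)
{
	/* Fails if the list head was never initialised. */
	assert(s->freeing_list.next && s->freeing_list.prev);
	demo_list_add(&inode->link, &s->freeing_list);
}
```

If the demo_list_init() call is dropped from demo_super_alloc(), the
zeroed list head holds NULL pointers, and demo_queue_for_freeing() trips
the assert (without the assert it would dereference NULL in
demo_list_add()).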

But secondly, I want you to double-check that whole CONFIG_X86_ELAN
thing - I'd like you to test two kernels that are otherwise totally
identical in their configurations, except that one has
CONFIG_X86_EXTENDED_PLATFORM and CONFIG_X86_ELAN on, and the other
does not. Just to make sure that with all the changes to the config
file, that is really the _only_ difference, and that yes, that's the
one that brings up the "crash at unzip" problem.

I'm adding Ingo Molnar, Thomas Gleixner and Peter Anvin to the cc, because
if this whole problem really is because of the x86 CPU configuration,
they may have better ideas than I do.

Ingo/Thomas/Peter: see the whole long and confused thread on lkml. But
it all boils down to Werner using a very full kernel config where not
only is almost everything compiled in (which showed the logfs problem
even though Werner didn't even have a logfs filesystem), but he also
had a very generic x86 kernel. Too generic.

He had CONFIG_X86_EXTENDED_PLATFORM and CONFIG_X86_ELAN on, and that
has apparently worked for him (and a lot of other people - he does a
distribution) up until 2.6.38. But as of 2.6.39-rc1 it causes some
really odd problems under IO (his test-case is "unzip", but that's
probably fairly random). The problem seems to show up as a bogus IO
list for SATA, causing a big WARN_ON() or oops and then a dead machine
due to IO problems.

I wonder what CONFIG_X86_ELAN has to do with anything, but from all
the config testing werner has done, it really looks like that's the
smoking gun here.

Why does M686 work, but X86_ELAN causes odd problems in 2.6.39-rc?
Allocator issues? Maybe related to the lockless slub paths?

So I obviously agree that X86_ELAN is a crazy choice for a generic
kernel, but it _used_ to work, and this is a regression.

Linus

 fs/logfs/super.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/logfs/super.c b/fs/logfs/super.c
index 33435e4b14d2..ce03a182c771 100644
--- a/fs/logfs/super.c
+++ b/fs/logfs/super.c
@@ -480,10 +480,6 @@ static int logfs_read_sb(struct super_block *sb, int read_only)
 			!read_only)
 		return -EIO;
 
-	mutex_init(&super->s_dirop_mutex);
-	mutex_init(&super->s_object_alias_mutex);
-	INIT_LIST_HEAD(&super->s_freeing_list);
-
 	ret = logfs_init_rw(sb);
 	if (ret)
 		return ret;
@@ -601,6 +597,10 @@ static struct dentry *logfs_mount(struct file_system_type *type, int flags,
 	if (!super)
 		return ERR_PTR(-ENOMEM);
 
+	mutex_init(&super->s_dirop_mutex);
+	mutex_init(&super->s_object_alias_mutex);
+	INIT_LIST_HEAD(&super->s_freeing_list);
+
 	if (!devname)
 		err = logfs_get_sb_bdev(super, type, devname);
 	else if (strncmp(devname, "mtd", 3))