Re: 2.6.30-rc8 Oops whilst booting

From: Chris Clayton
Date: Mon Jun 08 2009 - 06:58:25 EST

2009/6/8 Chris Clayton <chris2553@xxxxxxxxxxxxxx>:
> Hi Neil,
> Thanks for the reply.
> 2009/6/7 NeilBrown <neilb@xxxxxxx>:
>> On Mon, June 8, 2009 8:31 am, Jaswinder Singh Rajput wrote:
>>> On Sun, 2009-06-07 at 19:38 +0100, Chris Clayton wrote:
>> This message says that it found a vfat filesystem on 8:3x (I cannot see
>> what digit should be 'x').  That is probably sdc1 or sdc2. Maybe even
>> sdc6 or sdc7.
>> However the vfat filesystem didn't have /sbin/init.
>> This one says it couldn't find anything at 8,22, which I think
>> should be sdb6.
>> It also shows that you have an sdc6, but sdb only goes up to sdb3.
>> So it seems that your disk drives have changed name - not a wholly
>> unexpected event these days.
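The mapping from a device number such as 8,22 to a name such as sdb6 can be sketched in C. This is a minimal illustration, assuming the conventional sd numbering (major 8, 16 minors per disk, minor 0 of each group being the whole disk); the function name sd_name is made up for this example and is not a kernel API.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumption: conventional sd numbering on the first SCSI disk major. */
#define SD_MAJOR           8u
#define SD_MINORS_PER_DISK 16u

/*
 * Hypothetical helper: write the name for (major, minor) into buf,
 * e.g. "sdb6" for 8,22. Returns 0 on success, -1 if the pair is not
 * on major 8 or the minor is out of range.
 */
int sd_name(unsigned major, unsigned minor, char *buf, size_t len)
{
    if (major != SD_MAJOR || minor > 255u)
        return -1;

    unsigned disk = minor / SD_MINORS_PER_DISK; /* 0 = sda, 1 = sdb, ... */
    unsigned part = minor % SD_MINORS_PER_DISK; /* 0 = whole disk */

    if (part == 0)
        snprintf(buf, len, "sd%c", (char)('a' + disk));
    else
        snprintf(buf, len, "sd%c%u", (char)('a' + disk), part);
    return 0;
}
```

Under those assumptions, minor 22 is disk 1, partition 6, and minors 33 to 39 fall on the third disk.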
>> We now need answers to questions like:
>>  - what device do you expect the root filesystem to be on
>>  - how is the kernel being told this?  Maybe it is hard coded
>>    into your initrd.  Knowing which distro and what /etc/fstab
>>    says might help (though it wouldn't help me, I'm just about out
>>    of my depth at this point)
>> Maybe if you changed /etc/fstab to mount by uuid instead of hardcoding
>> e.g. /dev/sdb3, and then run "mkinitramfs" or whatever, it might work.
> Yes, I've just been looking at the photographs of the panics again and
> I've noticed that two of my discs are being detected in the "wrong
> order". There are three HDDs. The first, /dev/sda, is the master on
> the first IDE port and contains sda1..sda7. The second, normally
> /dev/sdb, is the slave on that port and contains sdb1..sdb6. The
> third, normally /dev/sdc, is attached to the first SATA port and
> contains sdc1..sdc3. The second photograph I posted shows that sdb and
> sdc have been reversed. The disc that is normally /dev/sdb does
> indeed have a FAT32 filesystem in its first partition.
> By the way, I should have said that in between the panics that the two
> photographs show, I copied the contents of /dev/sdc1, which I normally
> boot from, to /dev/sdb6, so that I minimised the risk to sdc1 in the
> reboot festival that bisecting would involve. I also, of course,
> changed the name of the root partition that is passed to the kernel by
> GRUB and amended /etc/fstab on /dev/sdb6. That's why the partitions
> shown in the photographs seem inconsistent. Sorry I forgot to mention
> that - I really shouldn't do these things late at night :-).
> As I indicate above, when booting the partition I have set up to do
> this bisecting, I expect the root filesystem to be on /dev/sdb6. As I
> also indicate, this information is passed to the kernel through GRUB's
> /boot/grub/menu.lst. The kernel is configured specifically for my
> system and the drivers needed to boot the system are built in to the
> kernel, so I don't use an initrd. IIRC, that's the way Slackware is
> installed today, except, of course, it's a big fat kernel with all
> drivers needed to boot any system built in. I could be wrong on that
> though, it's a while since I installed
> As to the distro, it used to be (the now defunct) Peanut Linux, which
> was derived from Slackware. However, it's years since I installed it
> and I have upgraded just about everything in user space and added many
> other things (udev, dbus...). I don't think that makes any difference
> here, though, because we don't get as far as user space. On a
> successful boot, the system is stable and runs trouble-free for
> several hours a day, every day.
> Hope this helps.
> I'm a good way through bisecting again and this time the system has to
> boot without a panic 100 times before I mark a kernel as good. I'll
> post the result later.

Finally got to the end of the bisection/reboot festival. I ended up here:

[chris:~/kernel/linux-2.6]$ git bisect good
d5a877e8dd409d8c702986d06485c374b705d340 is first bad commit
commit d5a877e8dd409d8c702986d06485c374b705d340
Author: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
Date: Sun May 24 13:03:43 2009 -0700

async: make sure independent async domains can't accidentally entangle

The problem occurs when async_synchronize_full_domain() is called when
the async_pending list is not empty. This will cause lowest_running()
to return the cookie of the first entry on the async_pending list, which
might be nothing at all to do with the domain being asked for and thus
cause the domain synchronization to wait for an unrelated domain. This
can cause a deadlock if domain synchronization is used from one domain
to wait for another.

Fix by running over the async_pending list to see if any pending items
actually belong to our domain (and return their cookies if they do).

Signed-off-by: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>

:040000 040000 fab1e0c06572605a7015061db4a7e0a77c04fa91
34252dbb7fed3942f5952c25639564bbd77357da M kernel
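The change described in the commit message can be illustrated with a small user-space model. This is only a sketch of the idea, not the kernel code: struct pending_entry, the integer domain ids, and both function names are made up for illustration, and the per-domain running list is left out entirely.

```c
#include <stddef.h>

typedef unsigned long long cookie_t;

/* Hypothetical model of one entry on a global pending list. */
struct pending_entry {
    cookie_t cookie;
    int      domain;   /* made-up integer id standing in for a domain */
};

/*
 * Behaviour the commit message describes as the problem: the first
 * pending cookie is returned regardless of which domain owns it.
 * 'none' is returned when nothing is pending.
 */
cookie_t lowest_pending_any(const struct pending_entry *pending, size_t n,
                            cookie_t none)
{
    return n ? pending[0].cookie : none;
}

/*
 * Behaviour the commit message describes as the fix: walk the pending
 * list and only report an entry that belongs to the requested domain.
 */
cookie_t lowest_pending_in_domain(const struct pending_entry *pending,
                                  size_t n, int domain, cookie_t none)
{
    for (size_t i = 0; i < n; i++)
        if (pending[i].domain == domain)
            return pending[i].cookie;
    return none;
}
```

In this model, a caller synchronizing on domain 2 while an entry for domain 1 sits at the head of the pending list gets domain 1's cookie from the first function and its own from the second, which is the entanglement the commit message talks about.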

I can't claim to know what the change actually means, but the change
seems to be a much better candidate than my previous bisection outcome
where I required only 20 "panicless" boots to regard the kernel as
good. As I said earlier today, this time I required 100 such boots.
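Why 100 boots is a safer threshold than 20 can be sketched with a little arithmetic. The function below assumes that each boot panics independently with some fixed probability p; both that assumption and any value chosen for p are hypothetical, since the real failure rate was not measured here.

```c
/*
 * Probability that n consecutive boots all succeed on a kernel that
 * panics with probability p per boot (assumed independent boots).
 * This is the chance of wrongly marking a bad kernel as good.
 */
double chance_all_boots_pass(double p, unsigned n)
{
    double q = 1.0;

    for (unsigned i = 0; i < n; i++)
        q *= 1.0 - p;
    return q;
}
```

With an assumed p of 0.1, 20 clean boots would still let a bad kernel through roughly 12% of the time, whereas 100 clean boots would do so far less than 0.01% of the time.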

I'll revert that change, give the new kernel the reboot treatment :-)
and report back later.


> Thanks
>> Good luck,
>> NeilBrown
> --
> No, Sir; there is nothing which has yet been contrived by man, by which
> so much happiness is produced as by a good tavern or inn - Doctor Samuel
> Johnson
