Re: 2.6.21-rc3-mm2

From: Con Kolivas
Date: Thu Mar 08 2007 - 09:53:24 EST


On Thursday 08 March 2007 15:19, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc3/2.
>6.21-rc3-mm2/
>
> - This is the same as 2.6.21-rc3-mm1, except Con's CPU scheduler changes
> were dropped.

As for benchmarking between them I haven't seen anyone post anything yet.
Test.kernel.org is having a field day with both of them ABORTing or WARNing
without one GOOD so far. Urgh what was the description you gave of -mm to
Gene?...I don't envy you :(

Apart from your big fat oops on that massive config, I've had people boot it
also on alpha and IA64. The bitmap error also manifested on alpha but not on
IA64 (neither oopsed).


So on qemu I can reproduce the oops you're getting with your config (make
oldconfig all default on top of your config), but I'm getting other wonderful
related problems too on rc3-mm2. On qemu -mm1 boots mostly without error and
then crashes nicely when I type 'ps' with a long pause for about twenty
seconds and then a combination of soft lockups, bitmap errors, and eventually
hits the BUG_ON I put in bitmap_error(). However, -mm2 also vomits on
typing 'ps'.

It pauses and then spits out (fun lines selected from ps output):

7 ? serial8250: too much work for irq4
00:00:00 watchdog/1
88 ? 00:00:0serial8250: too much work for irq4
0 cqueue/1
137 ? 00:00serial8250: too much work for irq4
:00 aio/0

Checking a few /proc files I see that "serial83250" info littered
throughout /proc/stat as well. -mm2 does not oops but the proc output is
variously corrupted.

Interestingly if I don't type 'ps' in the -mm1 qemu it runs fine with no sign
of a bug... In summary, here I can only reproduce your big fat oops by it
being triggered by some corruption elsewhere on this config related to /proc
breakage that I haven't managed to track down. I checked the broken-out
patches to see which touched /proc and it was oh, most of them. I tried on rc3
and had the same thing happen. I haven't tried rc3 without rsdl (your config
takes too darn long to build!).


The bitmap error that you're hitting on ppc I believe is a different issue. I
don't have hardware that does it at my end but one user has hit it also on
alpha and is trying to help me find the root cause. If we change it from
being a once only printk to every time, it seems to go away after 10 mins of
runtime never to return again. What could possibly change over that time for
the error never to return again is a mystery. This part definitely looks to
be my fault but a bug "getting better and going away" is a new one for me. I
have to think about some timer interaction although I don't use jiffies or
timers directly at all in my code. It also happens on 2.6.20 with RSDL.


Summary from what I've been able to find:
x86_32: ok
x86_64: ok
x86_64 fat config: scheduler code oops brought on by accessing /proc
IA64 ok: ok
Alpha: bitmap error, runs ok


I'm off to pass out for now so that I'm of some use to the rest of my
existence tomorrow.

--
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/