On Thu, Mar 01, 2007 at 09:30:56AM +0900, Michael Ellerman wrote:On Wed, 2007-02-28 at 10:13 +0000, David Woodhouse wrote:On Wed, 2007-02-28 at 07:43 +0100, Benjamin Herrenschmidt wrote:I wouldn't be that sure ... I've had problems in the past with PMU based
cpufreq... looks like flushing all caches and hard-resetting the
processor on the fly when there can be pending DMAs might be a source of
trouble... especially on CPUs that don't have working cache flush HW
assist.
I've seen it on a PowerMac3,1 (400MHz G4) where we don't have cpufreq.
I've also seen it on the latest 1.5GHz Mac Mini, and on my shinybook.
They all fall over with the latest kernel, although the shinybook only
does so immediately when booted with mem=512M. The shinybook does crash
later with new kernels though; I don't yet know why. It could be the
same thing, or it could be something different. That one seemed to
appear between Fedora's 2.6.19-1.2913 and 2.6.19-1.2914 kernels, where
we did nothing but turned CONFIG_SYSFS_DEPRECATED on.
I don't blame cpufreq. At various times I've been equally convinced that
it was due to CONFIG_KPROBES, and Linus' initrd-moving patch.
Is there any pattern to the way it dies? Or is it just randomly dieing
somewhere depending on which config options you have enabled?
This is starting to sound reminiscent of a bug I chased for a while last
year on Power5, but didn't find. It was "fixed" on some machines by
disabling CONFIG_KEXEC, and/or other random unrelated CONFIG options.
Unfortunately it magically stopped reproducing so I never caught it :/
Hmm. The crash came back after I booted into Mac OS X and back. It was however
a different crash, I believe it was coming from the USB modules (as it would
keep going when it happened, and get another crash, which tended to scroll away
too fast for me to capture) but I believe it was still getting down into the
slab code and actually dying there.
However, reverting the reversion of
8d610dd52dd1da696e199e4b4545f33a2a5de5c6 and instead applying
the following patch:
diff -ru linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c linux- source-2.6.20/arch/powerpc/mm/init_32.c
--- linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c 2007-02-05 05:44:54.000000000 +1100
+++ linux-source-2.6.20/arch/powerpc/mm/init_32.c 2007-03-10 11:03:56.000000000 +1100
@@ -244,7 +244,8 @@
void free_initrd_mem(unsigned long start, unsigned long end)
{
if (start < end)
- printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+ printk ("NOT Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+ return;
for (; start < end; start += PAGE_SIZE) {
ClearPageReserved(virt_to_page(start));
init_page_count(virt_to_page(start));
which if I recall correctly David Woodhouse posted to this thread,
seems to have fixed it.
I dunno if it's relevant, but my initrd.img is 13193315 bytes long,
(ie 99 bytes over 12884k) and the above logs:
"NOT Freeing initrd memory: 12888k freed"
which makes sense...
I of course completely failed to think to check this with the crashing
kernel, if it seems relevant I can roll back to it and get the numbers.