Re: 2.6.22-rc3 hibernate(?) fails totally - regression (xfs on raid6)

From: David Greaves
Date: Thu Jun 07 2007 - 11:20:24 EST


Mark Lord wrote:
Tejun Heo wrote:

Can you setup serial console and/or netconsole (not sure whether this
would work tho)?

Since he has good console output already, capturable by digicam,
I think a better approach might be to provide a patch with extra instrumentation..
You know.. progress messages and the like, so we can see at what step
things stop working. Or would that not help ?

David, does scrollback work on your dead console?

hmmmm, scrollback doesn't currently _do_ anything.

But the messages didn't scroll there, they just appear (as the memory is restored I assume). The same messages appear during the fail-to-suspend case too.

Linus said at one point:
> Ok, it wasn't a hidden oops. The DISABLE_CONSOLE_SUSPEND=y thing sometimes
> shows oopses that are otherwise hidden, but at other times it just causes
> more problems (hard hangs when trying to display something on a device
> that is suspended, or behind a bridge that got suspended).

> In your case, the screen output just shows normal resume output, and it
> apparently just hung for some unknown reason. It *may* be worth trying to
> do a SysRQ + 't' thing to see what tasks are running (or rather, not
> running), but since you won't be able to capture it, it's probably not
> going to be useful.

So I've since removed DISABLE_CONSOLE_SUSPEND=y
Should I put it back?

I was actually doing the netconsole anyway - but skge is currently a module - I've avoided making any changes to the config during all these tests but what the heck...

And wouldn't you know it.
Get netconsole working (ie new kernel with skge builtin) and I get the hang on suspend. Here's the netconsole output...

swsusp: Basic memory bitmaps created
Stopping tasks ... done.
Shrinking memory... done (0 pages freed)
Freed 0 kbytes in 0.03 seconds (0.00 MB/s)
Suspending console(s)


Given that moving something from module to builtin changes the behaviour I thought I'd bring these warnings up again (Andrew or Alan mentioned similar warnings being problems in another thread...)
Now, I have mentioned these before but there's been a lot going on so here you go:

MODPOST vmlinux
WARNING: arch/i386/kernel/built-in.o(.text+0x968f): Section mismatch: reference to .init.text: (between 'mtrr_bp_init' and 'mtrr_ap_init')
WARNING: arch/i386/kernel/built-in.o(.text+0x9781): Section mismatch: reference to .init.text: (between 'mtrr_bp_init' and 'mtrr_ap_init')
WARNING: arch/i386/kernel/built-in.o(.text+0x9786): Section mismatch: reference to .init.text: (between 'mtrr_bp_init' and 'mtrr_ap_init')
WARNING: arch/i386/kernel/built-in.o(.text+0xa25c): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: arch/i386/kernel/built-in.o(.text+0xa303): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: arch/i386/kernel/built-in.o(.text+0xa31b): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: arch/i386/kernel/built-in.o(.text+0xa344): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: arch/i386/kernel/built-in.o(.exit.text+0x19): Section mismatch: reference to .init.text: (between 'cache_remove_dev' and 'powernow_k6_exit')
WARNING: arch/i386/kernel/built-in.o(.data+0x2160): Section mismatch: reference to .init.text: (between 'thermal_throttle_cpu_notifier' and 'mce_work')
WARNING: kernel/built-in.o(.text+0x14502): Section mismatch: reference to .init.text: (between 'kthreadd' and 'init_waitqueue_head')


David
PS Gotta go - back in a couple of hours - let me know if there are any more tests to try.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/