On 2/2/10, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
On Tuesday 02 February 2010, Alan Jenkins wrote:
On 1/2/10, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
On Saturday 02 January 2010, Alan Jenkins wrote:
Hi,
I've been suffering from s2disk hangs again. This time, the hangs
were always before the hibernation image was written out.

They're still frustratingly random. I just started trying to work out
whether doubling PAGES_FOR_IO makes them go away, but they went away
on their own again.

I did manage to capture a backtrace with debug info though. Here it
is for 2.6.33-rc2. (It has also happened on rc1.) I was able to get
the line numbers using gdb, e.g. "info line *stop_machine_create+0x27",
having built the kernel with debug info.

[top of trace lost due to screen height]
? sync_page (filemap.c:183)
? wait_on_page_bit (filemap.c:506)
? wake_bit_function (wait.c:174)
? shrink_page_list (vmscan.c:696)
? __delayacct_blkio_end (delayacct.c:94)
? finish_wait (list.h:142)
? congestion_wait (backing-dev.c:761)
? shrink_inactive_list (vmscan.c:1193)
? scsi_request_fn (spinlock.h:306)
? blk_run_queue (blk-core.c:434)
? shrink_zone (vmscan.c:1484)
? do_try_to_free_pages (vmscan.c:1684)
? try_to_free_pages (vmscan.c:1848)
? isolate_pages_global (vmscan.c:980)
? __alloc_pages_nodemask (page_alloc.c:1702)
? __get_free_pages (page_alloc.c:1990)
? copy_process (fork.c:237)
? do_fork (fork.c:1443)
? rb_erase
? __switch_to
? kthread
? kernel_thread
? kthread
? kernel_thread_helper
? kthreadd
? kthreadd
? kernel_thread_helper
INFO: task s2disk:2174 blocked for more than 120 seconds

This looks like we have run out of memory while creating a new kernel
thread and we have blocked on I/O while trying to free some space (quite
obviously, because the I/O doesn't work at this point).

For context, the kernel thread being created here is the stop_machine
thread. It is created by disable_nonboot_cpus(), called from
hibernation_snapshot(). See e.g. this hung task backtrace:
http://picasaweb.google.com/lh/photo/BkKUwZCrQ2ceBIM9ZOh7Ow?feat=directlink

I think it should help if you increase PAGES_FOR_IO, then.

Ok, it's been happening again on 2.6.33-rc6. Unfortunately increasing
PAGES_FOR_IO doesn't help.

I've been using a test patch to make PAGES_FOR_IO tunable at run time.
I get the same hang if I increase it by a factor of 10, to 10240:
# cd /sys/module/kernel/parameters/
# ls
consoleblank initcall_debug PAGES_FOR_IO panic pause_on_oops
SPARE_PAGES
# echo 10240 > PAGES_FOR_IO
# echo 2560 > SPARE_PAGES
# cat SPARE_PAGES
2560
# cat PAGES_FOR_IO
10240

I also added a debug patch to try and understand the calculations with
PAGES_FOR_IO in hibernate_preallocate_memory(). I still don't really
understand them and there could easily be errors in my debug patch,
but the output is interesting.

Increasing PAGES_FOR_IO by almost 10000 has the expected effect of
decreasing "max_size" by the same amount. However, it doesn't appear
to increase the number of free pages at the critical moment.
PAGES_FOR_IO = 1024:
http://picasaweb.google.com/lh/photo/DYQGvB_4hvCvVuxZf2ibxg?feat=directlink
PAGES_FOR_IO = 10240:
http://picasaweb.google.com/lh/photo/AIkV_ZBwt22nzN-JdOJCWA?feat=directlink

You may remember that I was originally able to avoid the hang by
reverting commit 5f8dcc2. It doesn't revert cleanly any more.
However, I tried applying my test and debug patches on top of 5f8dcc2~1
(just before the commit that triggered the hang). That kernel
apparently left ~5000 pages free at hibernation time, vs. ~1200 when
testing the same scenario on 2.6.33-rc6. (As before, the number of
free pages remained the same if I increased PAGES_FOR_IO to 10240.)

I think the hang may be avoided by using this patch
http://patchwork.kernel.org/patch/74740/
but the hibernation will fail instead.
Can you please repeat your experiments with the patch below applied and
report back?
Rafael
It causes hibernation to succeed <grin>.