Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659

From: Srivatsa S. Bhat
Date: Thu Jan 26 2012 - 14:23:11 EST


On 01/26/2012 05:21 AM, Rafael J. Wysocki wrote:

> Hi,
>>
>> SNAPSHOT_CREATE_IMAGE has a check for data->ready such as:
>>
>>     if (data->mode != O_RDONLY || !data->frozen || data->ready) {
>>             error = -EPERM;
>>             break;
>>     }
>>
>> data->ready would be set to 1 only under SNAPSHOT_CREATE_IMAGE. However,
>> SNAPSHOT_FREE (invoked at the place shown above) will reset the value to 0.
>> This makes it possible for hibernation_snapshot() and hence
>> freeze_workqueues_begin() to be called a second time, which is unfortunate.
>
> Yes, I obviously forgot about that code path when I was working on the commit
> that introduced the problem. :-(
>
> Thanks a lot for the great analysis, it's really helpful!
>


Welcome :-) It was fun!
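
To make the sequence concrete, here is a minimal userspace sketch of it in C.
It is only a model under stated assumptions: struct snapshot_model and the
model_* helpers are hypothetical names, not kernel code, the O_RDONLY mode
check is left out, and a counter stands in for the point where
freeze_workqueues_begin() would be entered a second time.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's struct snapshot_data. */
struct snapshot_model {
	bool frozen;          /* user space has been frozen */
	bool ready;           /* an image has been created */
	bool kthreads_frozen; /* kernel threads frozen by the snapshot step */
	int double_freezes;   /* freeze attempted while already frozen */
};

/* Models SNAPSHOT_CREATE_IMAGE (mode check omitted). */
static int model_create_image(struct snapshot_model *m)
{
	if (!m->frozen || m->ready)
		return -EPERM;

	/* hibernation_snapshot() freezes kernel threads and workqueues. */
	if (m->kthreads_frozen)
		m->double_freezes++; /* second freeze without a thaw */
	m->kthreads_frozen = true;
	m->ready = true;
	return 0;
}

/* Models SNAPSHOT_FREE without any thawing: only the flag is reset. */
static void model_free(struct snapshot_model *m)
{
	m->ready = false;
}

/* Replays create, free, create and returns the double-freeze count. */
int replay_create_free_create(void)
{
	struct snapshot_model m = { .frozen = true };

	model_create_image(&m);
	model_free(&m);
	model_create_image(&m);
	return m.double_freezes;
}
```

Calling replay_create_free_create() from a small test program shows the
second create getting past the data->ready check while the kernel threads
are still frozen from the first one.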


>> And actually, the patch I posted in my previous mail is not really the right
>> long-term fix, though it might fix the particular issue that Jiri is facing.
>>
>> Because, allowing hibernation_snapshot() to get called a second time while
>> kernel threads are still frozen brings us to the same situation that commit
>> 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating memory)
>> tried to prevent! IOW, a call to hibernate_preallocate_memory() would be
>> done inside hibernation_snapshot() while kernel threads are frozen, which
>> is known to break XFS, to give one example mentioned in the changelog
>> of the above commit.
>
> That's exactly right.
>
>> So, the right way to fix this IMHO, would be to split up thaw_processes()
>> just like freezing phase:
>>
>> /* freezes or thaws user space processes */
>> freeze_processes() - thaw_processes()
>>
>> /* freezes or thaws kernel threads */
>> freeze_kernel_threads() - thaw_kernel_threads()
>>
>> We have to insert this thaw_kernel_threads() at appropriate places in such a
>> way as to not require another ioctl if possible... Then things would be
>> more symmetric (and hence easier to understand) and we can avoid getting
>> into strange situations as discussed here.
>>
>> But before we venture into that, it would be good to know if the patch posted
>> in the previous mail fixes the particular problem reported in this thread,
>> at least to see if there are other problems lurking that we aren't aware
>> of yet.
>
> Jiri has already said that the patch works.
>
> I think we could avoid the issue entirely by introducing thaw_kernel_threads
> and making SNAPSHOT_FREE call it. No other changes should be necessary.
>
> IOW, Jiri, does the patch below help?
>
> [BTW, the freeze_tasks()'s kerneldoc seems to be outdated. Tejun?]
>
> ---


This is exactly the kind of fix I was suggesting. Thanks Rafael!

I have a small request for a comment. Please see below.
I have a question too, but for that I'll have to reply to my earlier
thread so that I can comment on the userspace code.

> include/linux/freezer.h | 2 ++
> kernel/power/process.c | 19 +++++++++++++++++++
> kernel/power/user.c | 1 +
> 3 files changed, 22 insertions(+)
>
> Index: linux/include/linux/freezer.h
> ===================================================================
> --- linux.orig/include/linux/freezer.h
> +++ linux/include/linux/freezer.h
> @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt
> extern int freeze_processes(void);
> extern int freeze_kernel_threads(void);
> extern void thaw_processes(void);
> +extern void thaw_kernel_threads(void);
>
> static inline bool try_to_freeze(void)
> {
> @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c
> static inline int freeze_processes(void) { return -ENOSYS; }
> static inline int freeze_kernel_threads(void) { return -ENOSYS; }
> static inline void thaw_processes(void) {}
> +static inline void thaw_kernel_threads(void) {}
>
> static inline bool try_to_freeze(void) { return false; }
>
> Index: linux/kernel/power/process.c
> ===================================================================
> --- linux.orig/kernel/power/process.c
> +++ linux/kernel/power/process.c
> @@ -188,3 +188,22 @@ void thaw_processes(void)
> printk("done.\n");
> }
>
> +void thaw_kernel_threads(void)
> +{
> +	struct task_struct *g, *p;
> +
> +	pm_nosig_freezing = false;
> +	printk("Restarting kernel threads ... ");
> +
> +	thaw_workqueues();
> +
> +	read_lock(&tasklist_lock);
> +	do_each_thread(g, p) {
> +		if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
> +			__thaw_task(p);
> +	} while_each_thread(g, p);
> +	read_unlock(&tasklist_lock);
> +
> +	schedule();
> +	printk("done.\n");
> +}
> Index: linux/kernel/power/user.c
> ===================================================================
> --- linux.orig/kernel/power/user.c
> +++ linux/kernel/power/user.c
> @@ -274,6 +274,7 @@ static long snapshot_ioctl(struct file *
> swsusp_free();
> memset(&data->handle, 0, sizeof(struct snapshot_handle));
> data->ready = 0;


It would be nice to have a comment here explaining why we call
thaw_kernel_threads(). Such a comment would avoid confusion when people
look at SNAPSHOT_CREATE_IMAGE and SNAPSHOT_FREE and wonder why there is
thawing involved while the corresponding freezing is nowhere in sight.
The freezing is of course hidden inside hibernation_snapshot(), but that
might not be immediately apparent to everyone.
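
Something along these lines would do. The comment wording is only a
suggestion, and everything around it (the model_* state and the
thaw_kernel_threads_stub() helper) is a hypothetical stand-in so that the
fragment compiles on its own outside the kernel:

```c
#include <stdbool.h>

/* Stand-ins for kernel state; hypothetical, for illustration only. */
static bool model_kthreads_frozen;
static int model_ready;

/* Stand-in for the thaw_kernel_threads() added by the patch. */
static void thaw_kernel_threads_stub(void)
{
	model_kthreads_frozen = false;
}

/* Models what SNAPSHOT_CREATE_IMAGE leaves behind on success. */
void snapshot_create_model(void)
{
	model_kthreads_frozen = true;
	model_ready = 1;
}

/* Models the tail of the SNAPSHOT_FREE case, with the requested comment. */
void snapshot_free_model(void)
{
	model_ready = 0;
	/*
	 * Kernel threads were frozen inside hibernation_snapshot(), which
	 * SNAPSHOT_CREATE_IMAGE calls.  Thaw them here so that a later
	 * SNAPSHOT_CREATE_IMAGE preallocates memory with kernel threads
	 * running and does not try to freeze them a second time.
	 */
	thaw_kernel_threads_stub();
}

/* Accessor so a caller can check the modelled state. */
bool model_kthreads_are_frozen(void)
{
	return model_kthreads_frozen;
}
```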

> + thaw_kernel_threads();

> break;

>
> case SNAPSHOT_PREF_IMAGE_SIZE:


Regards,
Srivatsa S. Bhat
