Re: [RFC] Fix SMP brokenness for PF_FREEZE and make freezing usablefor other purposes

From: Kirill Korotaev
Date: Tue Jun 28 2005 - 02:37:20 EST


-static inline int freezing(struct task_struct *p)
-{
- return p->flags & PF_FREEZE;
-}
+#if defined(CONFIG_PM) || defined(CONFIG_MIGRATE)

<<<< why not to make a single option CONFIG_REFRIGERATOR? It looks to be a
more robust way, since there are multiple users of it.


Yes. That may be better. We can do that once the migration code is finished and when we know what kind of CONFIG_XXX the migration code really needs.


+#ifdef CONFIG_PM

<<<< is it intentionaly? or you just lost CONFIG_MIGRATE?
It is intentional. freeze_processes and thaw_processes are only needed for suspend. One only needs to freeze a couple of processes for process migration.
But PM and your migrate code can be not the only users of it.

<<<< I still think this refrigerator is racy with freeze_processes():
<<<< scenarios:
<<<< scenario 1
<<<<
task1 -> freeze_processes(): taskXXX ->refrigerator()
checked (task->flags & PF_FROZEN) == 0 cur->flags |= PF_FROZEN
clear TIF_FREEZE
<sleep on thaw>
set TIF_FREEZING
clear PF_FROZEN

<<<< so the task awakes with TIF_FREEZE flag set!!!


Hmm... If we wait to clear both flags until after the completion notification then we do not have the race right? But then we need to move the signal recalc since it tests for TIF_FREEZE too.
It is almost ok, but it is still not fine :)

look what happens if you call freeze/unfreeze in a loop:

refrigerator:
awakes

freezer:
check PF_FROZEN, it is still set, skips task and thinks it is finished freezing.

refrigerator:
clears PF_FROZEN and TIF_FREEZE and returns.

I think you can fix this by moving PF_FROZEN check and set in both places under siglock.

Kirill


<<<< scenario 2
<<<< look at error path in freeze_processes (on timeout), it is broken as
well. You need to wakeup tasks there...


Ok. How about this additional patch? This still requires that process freezing does not immediately occurr again after the completion handler. All of this is iffy due to not having a real lock protecting all these values and we may still need to add some barriers.

Index: linux-2.6.12/kernel/power/process.c
===================================================================
--- linux-2.6.12.orig/kernel/power/process.c 2005-06-28 06:34:52.000000000 +0000
+++ linux-2.6.12/kernel/power/process.c 2005-06-28 06:40:28.000000000 +0000
@@ -47,12 +47,13 @@ int freeze_processes(void)
unsigned long flags;
if (!freezeable(p))
continue;
- if ((p->flags & PF_FROZEN) ||
- (p->state == TASK_TRACED) ||
+ if ((p->state == TASK_TRACED) ||
(p->state == TASK_STOPPED))
continue;
set_thread_flag(TIF_FREEZE);
+ if (p->flags & PF_FROZEN)
+ continue;
spin_lock_irqsave(&p->sighand->siglock, flags);
signal_wake_up(p, 0);
spin_unlock_irqrestore(&p->sighand->siglock, flags);
@@ -63,6 +64,8 @@ int freeze_processes(void)
if (time_after(jiffies, start_time + TIMEOUT)) {
printk( "\n" );
printk(KERN_ERR " stopping tasks failed (%d tasks remaining)\n", todo );
+ complete_all(&thaw);
+ up(&freezer_sem);
return todo;
}
} while(todo);
Index: linux-2.6.12/kernel/sched.c
===================================================================
--- linux-2.6.12.orig/kernel/sched.c 2005-06-28 06:34:52.000000000 +0000
+++ linux-2.6.12/kernel/sched.c 2005-06-28 06:37:36.000000000 +0000
@@ -5210,13 +5210,13 @@ DECLARE_COMPLETION(thaw);
void refrigerator(void)
{
current->flags |= PF_FROZEN;
+ wait_for_completion(&thaw);
clear_thread_flag(TIF_FREEZE);
+ current->flags &= ~PF_FROZEN;
/* A fake signal 0 may have been sent. Recalculate sigpending */
spin_lock_irq(&current->sighand->siglock);
recalc_sigpending();
spin_unlock_irq(&current->sighand->siglock);
- wait_for_completion(&thaw);
- current->flags &= ~PF_FROZEN;
}
EXPORT_SYMBOL(refrigerator);
#endif



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/