Re: 2.6.22-rc1: Broken suspend on SMP with tifm

From: Rafael J. Wysocki
Date: Sun May 13 2007 - 16:45:32 EST


On Sunday, 13 May 2007 22:30, Oleg Nesterov wrote:
> On 05/14, Oleg Nesterov wrote:
> >
> > On 05/13, Rafael J. Wysocki wrote:
> > >
> > > The suspend/hibernation is broken on SMP due to:
> > >
> > > commit 3540af8ffddcdbc7573451ac0b5cd57a2eaf8af5
> > > tifm: replace per-adapter kthread with freezeable workqueue
> > >
> > > Well, it looks like freezable workqueues still deadlock with CPU hotplug
> > > when worker threads are frozen.
> >
> > Ugh. I thought we deprecated create_freezeable_workqueue(), exactly
> > because suspend was changed to call _cpu_down() after freeze().
> >
> > It is not that "looks like freezable workqueues still deadlock", it
> > is "of course, freezable workqueues deadlock" on CPU_DEAD.
> >
> > The ->freezeable is still here just because of incoming "cpu-hotplug
> > using freezer" rework.
> >
> > No?
> >
> > > --- linux-2.6.22-rc1.orig/kernel/workqueue.c
> > > +++ linux-2.6.22-rc1/kernel/workqueue.c
> > > @@ -799,9 +799,7 @@ static int __devinit workqueue_cpu_callb
> > > struct cpu_workqueue_struct *cwq;
> > > struct workqueue_struct *wq;
> > >
> > > - action &= ~CPU_TASKS_FROZEN;
> > > -
> > > - switch (action) {
> > > + switch (action & ~CPU_TASKS_FROZEN) {
> >
> > Confused. How can we see, say CPU_UP_PREPARE_FROZEN, if we cleared
> > CPU_TASKS_FROZEN bit?
>
> So, unless I missed something stupid, this patch is not 100% right.

Well, it isn't, but for a different reason (see [*] below).
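
For reference, the masking that Oleg is asking about can be modelled in
userspace. This is only a sketch, not the kernel code: the numeric values are
assumed to mirror include/linux/notifier.h and should be treated as
illustrative, and decode_action() and struct event_info are hypothetical names.
It shows that masking inside the switch expression leaves 'action' itself
untouched, so the frozen bit can still be tested afterwards, while a case label
such as CPU_UP_PREPARE_FROZEN could never match inside that switch.

```c
#include <stdbool.h>

/* Assumed values, for illustration only. */
#define CPU_UP_PREPARE        0x0003UL
#define CPU_DEAD              0x0007UL
#define CPU_TASKS_FROZEN      0x0010UL
#define CPU_UP_PREPARE_FROZEN (CPU_UP_PREPARE | CPU_TASKS_FROZEN)
#define CPU_DEAD_FROZEN       (CPU_DEAD | CPU_TASKS_FROZEN)

/* Hypothetical result type: the base event plus the "tasks frozen" flag. */
struct event_info {
	unsigned long base;
	bool frozen;
};

/* Hypothetical helper modelling the dispatch in a CPU hotplug callback. */
struct event_info decode_action(unsigned long action)
{
	struct event_info info = { 0UL, false };

	/* The switch sees the action with the frozen bit stripped ... */
	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_UP_PREPARE:
	case CPU_DEAD:
		info.base = action & ~CPU_TASKS_FROZEN;
		break;
	/* ... so a "case CPU_UP_PREPARE_FROZEN:" here would be unreachable. */
	}

	/* 'action' was not modified, so the bit is still available here. */
	info.frozen = (action & CPU_TASKS_FROZEN) != 0;
	return info;
}
```

With the older "action &= ~CPU_TASKS_FROZEN;" form, the final test in this
model would always see the bit cleared.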

> I think the better fix (at least for now) is
>
> - #define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1)
> + #define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1)
>
> Alex, do you really need a multithreaded wq?
>
> Rafael, what do you think?

That would be misleading if the driver needs the threads to be frozen.

I would prefer to revert the commit that caused the problem to appear, but it
doesn't revert cleanly and I hate to invalidate someone else's work because of
my own mistakes.

[*] Getting back to the patch, it seems to me that we should do something like
take_over_work() before thawing the frozen thread, because there may be a queue
to process and the device is suspended at that point.
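
A minimal userspace sketch of that idea, under stated assumptions: it is not
kernel code, the types and function names (struct work, struct cwq_model,
queue_model_work(), take_over_work_model()) are hypothetical, and locking is
left out entirely. It only illustrates moving whatever is still pending on the
dead CPU's queue onto a live CPU's queue, in order, so that the frozen worker
has nothing left to process when it is thawed.

```c
#include <stddef.h>

/* Hypothetical stand-in for a queued work item. */
struct work {
	int id;
	struct work *next;
};

/* Hypothetical stand-in for a per-CPU work list (FIFO). */
struct cwq_model {
	struct work *head;
	struct work *tail;
};

/* Append one item to the tail of a queue. */
void queue_model_work(struct cwq_model *q, struct work *w)
{
	w->next = NULL;
	if (q->tail)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
}

/*
 * Move every pending item from 'dead' to the tail of 'live', preserving
 * the original order. Returns the number of items moved.
 */
int take_over_work_model(struct cwq_model *dead, struct cwq_model *live)
{
	int moved = 0;
	struct work *w;

	while ((w = dead->head) != NULL) {
		dead->head = w->next;
		queue_model_work(live, w);
		moved++;
	}
	dead->tail = NULL;
	return moved;
}
```

In this model the dead CPU's queue is empty after the call, which is the state
we would want before its worker thread is thawed and stopped.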

Greetings,
Rafael