Re: [PATCH] Page writeback broken after resume: wb_timer lost

From: Pavel Machek
Date: Sat May 20 2006 - 18:53:02 EST


Hi!

> > I have noticed for some time that nr_dirty never drops but
> > increases except when VM pressure forces it down. This only
> > occurs after a resume, never on a freshly booted system.
> >
> > It seems the wb_timer is lost when the timer function is
> > trying to start a frozen pdflush thread, and this occurs
> > during suspend or resume.
> >
> > I have included a patch which work for me. Don't know if the
> > test also should include a check for freezing to be safe, ie
> > if ( !frozen(..) && !freezing(..) )

Yep, I have seen this too. Sync took *way* too long and I believe I
lost some data because of this problem.

> Maybe the code over in page-writeback.c should just rearm the timee within
> the timer handler rather than waiting for a pdflush thread to do it. I'll
> think about that.
>
> But the main questions is: what on earth is going on here? We've taken a
> kernel thread and we've done a wake_up_process() on it, but because it was
> in a frozen state it just never gets to run, even after the resume.
> Presumably it goes back into interruptible sleep after the resume. We took
> it off the list (in the expectation that it'd run again) so we've lost
> control of it.

I guess you should not try to wake up process while it is frozen. Such
wakeups are likely to get lost. Should we add some BUG_ON() somewhere?

...we have to eat some wakeups, because we fake some.

Or perhaps we should do WARN_ON(frozen(current)) just after schedule()
below?

> Pavel, Rafael: this amounts to a lost wakeup. What's the story?

Pavel

Refrigerator looks like this:

/* Refrigerator is place where frozen processes are stored :-). */
void refrigerator(void)
{
/* Hmm, should we be allowed to suspend when there are
realtime
processes around? */
long save;
save = current->state;
pr_debug("%s entered refrigerator\n", current->comm);
printk("=");

frozen_process(current);
spin_lock_irq(&current->sighand->siglock);
recalc_sigpending(); /* We sent fake signal, clean it up */
spin_unlock_irq(&current->sighand->siglock);

while (frozen(current)) {
current->state = TASK_UNINTERRUPTIBLE;
schedule();
}
pr_debug("%s left refrigerator\n", current->comm);
current->state = save;
}

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/