Re: [PATCHv4 17/17] writeback: lessen sync_supers wakeup count

From: Nick Piggin
Date: Mon May 31 2010 - 08:47:58 EST


On Mon, May 31, 2010 at 12:04:04PM +0300, Artem Bityutskiy wrote:
> > On Mon, May 31, 2010 at 11:25:52AM +0300, Artem Bityutskiy wrote:
> > > Hmm, but it looks like we cannot do that either. If we do
> > >
> > > set_current_state(TASK_INTERRUPTIBLE);
> > > if (supers_dirty)
> > > bdi_arm_supers_timer();
> > > schedule();
> > >
> > > and the kernel is preemptive, is it possible that we get preempted
> > > before we run 'bdi_arm_supers_timer()', but after we do
> > > 'set_current_state(TASK_INTERRUPTIBLE)'. And we will never wake up if
> > > the timer armed in mark_sb_dirty() went off.
> > >
> > > So it looks like this is the way to go:
> > >
> > > /*
> > > * Disable preemption for a while to make sure we are not
> > > * preempted before the timer is armed.
> > > */
> > > preempt_disable();
> > > set_current_state(TASK_INTERRUPTIBLE);
> > > if (supers_dirty)
> > > bdi_arm_supers_timer();
> > > preempt_enable();
> > > schedule();
> >
> > This should not be required because preempt is transparent to these
> > task sleep/schedule APIs.
> >
> > The preempt event will not clear TASK_INTERRUPTIBLE, and so the timer
> > wakeup will set it to TASK_RUNNING (whether or not it has called
> > schedule() yet and whether or not it is currently preempted).
>
> Nick, I'm sorry, but could you please elaborate:
>
> set_current_state(TASK_INTERRUPTIBLE);
> /*
> * XXX: what if we are preempted here. No timer is armed. Our state is
> * TASK_INTERRUPTIBLE, supers_dirty is 1, so no one will ever wake us
> * up. Thus, we'll sleep forever.
> */
> if (supers_dirty)
> bdi_arm_supers_timer();
> schedule();
>
> Not sure, but I did quick search and it looks like in preemptive kernel,
> an interrupt may happen in the XXX place above, then it will call
> 'preempt_schedule_irq()', which sill call 'schedule()'.

Yes, preempt does not participate in tsak sleeping exactly for reasons
such as this.

>From kernel/sched.c:schedule()

if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
if (unlikely(signal_pending_state(prev->state, prev)))
prev->state = TASK_RUNNING;
else
deactivate_task(rq, prev, DEQUEUE_SLEEP);
switch_count = &prev->nvcsw;
}

If the task is not running, then is only removed from the runqueue
(or reset to running in case of pending signal) IFF it has not been
scheduled from an involuntary kernel preemption.

So in the XXX region, the task will actually be allowed to run again
until it calls schedule().

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/