Re: [PATCH] trace: reset sleep/block start time on task switch

From: Peter Zijlstra
Date: Tue Jan 24 2012 - 09:27:49 EST


On Mon, 2012-01-23 at 15:02 -0800, Arun Sharma wrote:
> > +++ b/kernel/sched/fair.c
> > @@ -1191,6 +1191,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> > if (entity_is_task(se)) {
> > struct task_struct *tsk = task_of(se);
> >
> > + se->statistics.sleep_start = 0;
> > + se->statistics.block_start = 0;
> > +
>
> We might still need some additional logic to ignore sleep_start if the
> last context switch was a preemption. Test case Andrew Vagin posted on
> 12/21:
>
> nanosleep();
> s = time(NULL);
> while (time(NULL) - s < 4);
>
> During the busy wait while loop, sleep_start is non-zero and the first
> sample from sched_stat_sleeptime() and anyone else doing the (now -
> sleep_start) computation would get a bogus value until the next dequeue.

Bah, you're right. And yes, your proposal is too intrusive, but that can
be fixed. I actually did fix it, but then I noticed it's broken too: it
doesn't matter whether the schedule that schedules a task back in
preempted another task or not. What matters is whether the task we're
scheduling back in was itself preempted or recently woken, and we simply
don't know.
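
For reference, the test case quoted above can be written out as a small
standalone program. This is only a sketch: the helper name
sleep_then_spin() and its parameters are made up for illustration and are
not part of the original test.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/*
 * Sleep once (so the scheduler records a sleep), then busy-wait for
 * roughly spin_secs seconds without sleeping again. During the spin the
 * task can only leave the CPU by being preempted.
 * Hypothetical helper; returns 0 on success, -1 if nanosleep() fails.
 */
static int sleep_then_spin(long sleep_ns, int spin_secs)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = sleep_ns };
	time_t s;

	if (nanosleep(&ts, NULL) != 0)
		return -1;

	s = time(NULL);
	while (time(NULL) - s < spin_secs)
		;	/* busy wait, as in the quoted test case */

	return 0;
}
```

Calling sleep_then_spin(1000000L, 4) from main() gives the 4 second spin
from the original test.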

I'm tempted to revert 1ac9bc69 for now; userspace will simply have to
correlate trace_sched_switch() and trace_sched_stat_{sleep,blocked}(),
which shouldn't be too hard.
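
A rough sketch of what that userspace correlation could look like,
assuming the events have already been parsed into a simplified record.
The struct ev layout, the EV_* names, MAX_PID and correlate() are all
hypothetical, not an existing tool or trace format: remember the delay
reported by a stat event per pid, and hand it out on the next switch-in
of that pid.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PID 32768	/* assumption: small fixed pid space for the sketch */

enum ev_type { EV_STAT_SLEEP, EV_STAT_BLOCKED, EV_SWITCH };

/* Hypothetical, already-parsed trace record. */
struct ev {
	enum ev_type type;
	int pid;	/* woken task for stat events, incoming task for EV_SWITCH */
	uint64_t delay;	/* ns; only meaningful for stat events */
};

/* Delay reported since the last switch-in, indexed by pid. */
static uint64_t pending[MAX_PID];

/*
 * Feed events in trace order. Returns the sleep/block delay to attribute
 * to the task being switched in, or 0 if there is none (for example
 * because the task was merely preempted and never slept).
 */
static uint64_t correlate(const struct ev *e)
{
	uint64_t d;

	if (e->pid < 0 || e->pid >= MAX_PID)
		return 0;

	switch (e->type) {
	case EV_STAT_SLEEP:
	case EV_STAT_BLOCKED:
		pending[e->pid] += e->delay;
		return 0;
	case EV_SWITCH:
		d = pending[e->pid];
		pending[e->pid] = 0;	/* consume: later switch-ins see 0 */
		return d;
	}
	return 0;
}
```

Because the pending value is cleared on switch-in, a task that is
preempted and scheduled back in again reports no delay, which is the case
the in-kernel (now - sleep_start) computation gets wrong.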


