Re: [patch 3/21] x86, bts: wait until traced task has been scheduled out
From: Ingo Molnar
Date:  Wed Apr 01 2009 - 07:42:20 EST
* Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> On 03/31, Markus Metzger wrote:
> >
> > +static void wait_to_unschedule(struct task_struct *task)
> > +{
> > +	unsigned long nvcsw;
> > +	unsigned long nivcsw;
> > +
> > +	if (!task)
> > +		return;
> > +
> > +	if (task == current)
> > +		return;
> > +
> > +	nvcsw  = task->nvcsw;
> > +	nivcsw = task->nivcsw;
> > +	for (;;) {
> > +		if (!task_is_running(task))
> > +			break;
> > +		/*
> > +		 * The switch count is incremented before the actual
> > +		 * context switch. We thus wait for two switches to be
> > +		 * sure at least one completed.
> > +		 */
> > +		if ((task->nvcsw - nvcsw) > 1)
> > +			break;
> > +		if ((task->nivcsw - nivcsw) > 1)
> > +			break;
> > +
> > +		schedule();
> 
> schedule() is a nop here. We can wait unpredictably long...
> 
> Ingo, do you have any ideas to improve this helper?
hm, there's a similar-looking existing facility:
wait_task_inactive(). Have i missed some subtle detail that makes it
inappropriate for use here?
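The exit test in the quoted polling loop comes down to a pure check on the two context-switch counters. The following userspace sketch isolates that check. `switch_completed()` and `switch_completed_examples()` are hypothetical names, not kernel functions. The premise that the counters are bumped before the switch itself comes from the patch comment and is not verified here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the exit test in the quoted loop. Per the
 * patch comment, the switch counters are bumped before the context switch
 * itself, so a delta of 1 may mean a switch is still in flight; a delta of
 * 2 or more means at least one switch has completed. */
bool switch_completed(unsigned long snap_nvcsw, unsigned long snap_nivcsw,
		      unsigned long nvcsw, unsigned long nivcsw)
{
	/* Unsigned subtraction yields the right delta even if a counter wraps. */
	return (nvcsw - snap_nvcsw) > 1 || (nivcsw - snap_nivcsw) > 1;
}

/* Usage with a snapshot taken at nvcsw = 10, nivcsw = 5. */
void switch_completed_examples(void)
{
	assert(!switch_completed(10, 5, 11, 5));	/* one switch: may be in flight */
	assert(switch_completed(10, 5, 12, 5));		/* two voluntary switches */
	assert(switch_completed(10, 5, 10, 7));		/* two involuntary switches */
	assert(switch_completed(ULONG_MAX, 5, 1, 5));	/* wrapped: 1 - ULONG_MAX == 2 */
}
```

Because the counters are unsigned, the subtraction gives the correct delta across wraparound, as the last example shows.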
> Not that I really like it, but how about
> 
> 	int force_unschedule(struct task_struct *p)
> 	{
> 		struct rq *rq;
> 		unsigned long flags;
> 		int running;
> 
> 		rq = task_rq_lock(p, &flags);
> 		running = task_running(rq, p);
> 		task_rq_unlock(rq, &flags);
> 
> 		if (running)
> 			wake_up_process(rq->migration_thread);
> 
> 		return running;
> 	}
> 
> which should be used instead of task_is_running()?
Yes, wait_task_inactive() should be switched to a scheme like that.
It would fix, in a cleaner way, the kind of bug addressed by:
  53da1d9: fix ptrace slowness
> We can even do something like
> 
> 	void wait_to_unschedule(struct task_struct *task)
> 	{
> 		struct migration_req req;
> 		unsigned long flags;
> 		struct rq *rq;
> 		int running;
> 
> 		rq = task_rq_lock(task, &flags);
> 		running = task_running(rq, task);
> 		if (running) {
> 			req.task = task;
> 			/* make sure __migrate_task() will do nothing */
> 			req.dest_cpu = NR_CPUS + 1;
> 			init_completion(&req.done);
> 			list_add(&req.list, &rq->migration_queue);
> 		}
> 		task_rq_unlock(rq, &flags);
> 
> 		if (running) {
> 			wake_up_process(rq->migration_thread);
> 			wait_for_completion(&req.done);
> 		}
> 	}
> 
> This way we don't poll, and we need only one helper.
Looks even better. The migration thread would run complete(), right?
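Oleg's second sketch replaces polling with a completion that the migration thread signals. Below is a minimal userspace model of that handshake using POSIX threads. `struct model_completion` and the other `model_*` names are hypothetical stand-ins for the kernel's completion API and migration thread, not their implementations. The "migration thread" is simply started here rather than woken on a particular CPU, and -1 plays the role of the invalid `dest_cpu`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void model_init_completion(struct model_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void model_destroy_completion(struct model_completion *c)
{
	pthread_cond_destroy(&c->cond);
	pthread_mutex_destroy(&c->lock);
}

static void model_complete(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void model_wait_for_completion(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)	/* re-check: condvars allow spurious wakeups */
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical request, loosely mirroring struct migration_req. */
struct model_req {
	int dest_cpu;			/* deliberately invalid: "migrate nowhere" */
	struct model_completion done;
};

/* Plays the migration thread. Handling the request is a no-op because of
 * the invalid dest_cpu, after which it signals the waiter. */
static void *model_migration_thread(void *arg)
{
	struct model_req *req = arg;

	model_complete(&req->done);
	return NULL;
}

/* Waiter side: publish the request, start the "migration thread", then
 * sleep until it calls complete(). Returns 0 on success, -1 on error. */
int model_wait_via_migration_thread(void)
{
	struct model_req req = { .dest_cpu = -1 };
	pthread_t tid;

	model_init_completion(&req.done);
	if (pthread_create(&tid, NULL, model_migration_thread, &req) != 0) {
		model_destroy_completion(&req.done);
		return -1;
	}
	model_wait_for_completion(&req.done);	/* blocks; no polling */
	assert(req.done.done);			/* only returns after complete() */
	pthread_join(tid, NULL);	/* thread may still be inside model_complete() */
	model_destroy_completion(&req.done);
	return 0;
}
```

In this model the migration thread is the one calling complete(), and the waiter sleeps in the condition wait instead of spinning on schedule(). The pthread_join() matters because the request lives on the waiter's stack. That memory must not go away while the other thread may still be inside model_complete().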
A detail: i suspect this needs to be in a while() loop, for the case 
that the victim task raced with us and went to another CPU before we 
kicked it off via the migration thread.
This looks very useful to me. It could also be tested easily: revert
53da1d9 and you should see the performance of:
   time strace dd if=/dev/zero of=/dev/null bs=1024 count=1000000
plummet on an SMP box. Then, with your fix, it should go back up to
near full speed.
	Ingo