Re: posix-cpu-timers revamp

From: Frank Mayhar
Date: Wed Apr 09 2008 - 12:30:39 EST


On Tue, 2008-04-08 at 15:49 -0700, Roland McGrath wrote:
> My explanation about the constraints on exit_state was specifically about
> the context of update_process_times(), which is part of the path for a
> clock tick interrupting current.

Understood.

> > And, you guessed it, the invariant gets violated. Apparently the passed
> > task_struct isn't the same as "current" at this point.
>
> The scheduler code has gotten a lot more complex since I first implemented
> posix-cpu-timers, and I've never been any expert on the scheduler at all.
> But I'm moderately sure all those things are involved in context switch,
> where the task of interest is about to be on a CPU or just was on a CPU.
> I doubt those are places where the task in question could be
> simultaneously executing in exit_notify() on another CPU. But we'd need
> to ask the scheduler experts to be sure we know what we're talking about
> there.

This was my conclusion as well. Certainly the path through do_fork()
(elucidated below) doesn't allow the task in question to be executing at
all, much less on a different CPU, but all these routines taking
"struct task_struct *" parameters make me nervous, which is why I
inserted the invariant check in the first place.
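
(For the record, the check was just a guard at the top of each of the
accounting helpers, something like this; I'm paraphrasing, so the exact
form in my tree may differ:

	/* Sanity check: these helpers were assumed to run only on current. */
	WARN_ON(tsk != current);

That's the check the do_fork() path below trips over.)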

> > Found the exception. do_fork() violates the invariant when it's
> > cranking up a new process. Hmmm.
>
> I haven't figured out what actual code path this refers to.

do_fork=>wake_up_new_task=>task_new_fair=>
enqueue_task_fair=>enqueue_entity=>update_curr
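
For what it's worth, here's roughly where the accounting hook lands in
update_curr(); this is a trimmed sketch from my reading of
kernel/sched_fair.c with my hook spliced in where the runtime is
charged, not the exact code. The task handed to the accounting helper is
whatever cfs_rq->curr points at, which is not necessarily current:

	static void update_curr(struct cfs_rq *cfs_rq)
	{
		struct sched_entity *curr = cfs_rq->curr;
		u64 now = rq_of(cfs_rq)->clock;
		unsigned long delta_exec;

		if (unlikely(!curr))
			return;

		delta_exec = (unsigned long)(now - curr->exec_start);
		__update_curr(cfs_rq, curr, delta_exec);
		curr->exec_start = now;

		if (entity_is_task(curr)) {
			/* cfs_rq->curr's task, not necessarily current. */
			struct task_struct *curtask = task_of(curr);

			account_group_exec_runtime(curtask, delta_exec);
		}
	}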

> This sort of concern is among the reasons that checking ->signal was the
> course I found wise to suggest to begin with. We can figure out what the
> constraints on ->exit_state are in all the places by understanding every
> corner of the scheduler.

Well, as much as I would like to take the time to do that, I do have a
_real_ job here. :-)

> We can measure whether it winds up being in a
> cooler cache line than ->signal and a net loss to add the load, or has
> superior performance as you seem to think. Or we can just test the
> constraint that matters, whether the pointer we loaded was in fact null,
> and rely on RCU to make it not matter if there is a race after that load.
> It doesn't matter whether tsk is current or not; it only matters that we
> have the pointer and that we're using some CPU array slot or other that
> no one else is using simultaneously.
>
> static inline void account_group_exec_runtime(struct task_struct *tsk,
>                                               unsigned long long runtime)
> {
> 	struct signal_struct *sig;
> 	struct task_cputime *times;
>
> 	rcu_read_lock();
> 	sig = rcu_dereference(tsk->signal);
> 	if (likely(sig) && sig->cputime.totals) {
> 		times = per_cpu_ptr(sig->cputime.totals, get_cpu());
> 		times->sum_exec_runtime += runtime;
> 		put_cpu_no_resched();
> 	}
> 	rcu_read_unlock();
> }

Yeah, agreed. Of course, I was hoping (in vain, apparently) to avoid
this level of overhead here, and I suspect I'll have to do the same in
each of these routines. But I suppose it can't be helped.
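
Concretely, I expect each of those helpers to end up with the same
pattern. The user-time side, for instance, would become something like
this (untested sketch; the names follow my patch):

	static inline void account_group_user_time(struct task_struct *tsk,
						   cputime_t cputime)
	{
		struct signal_struct *sig;
		struct task_cputime *times;

		rcu_read_lock();
		/*
		 * tsk->signal may be cleared by exit_notify() on another
		 * CPU; RCU keeps the signal_struct itself valid across
		 * this window.
		 */
		sig = rcu_dereference(tsk->signal);
		if (likely(sig) && sig->cputime.totals) {
			times = per_cpu_ptr(sig->cputime.totals, get_cpu());
			times->utime = cputime_add(times->utime, cputime);
			put_cpu_no_resched();
		}
		rcu_read_unlock();
	}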

Even with a thorough understanding of the scheduler(s), code written on
the basis of that understanding still wouldn't (necessarily) be
protected from future changes that violate its assumptions.
--
Frank Mayhar <fmayhar@xxxxxxxxxx>
Google, Inc.
