Re: [RFC] process wide itimer cruft

From: Oleg Nesterov
Date: Tue Feb 03 2009 - 12:26:18 EST

Next message: Ingo Molnar: "[crash] af9005_usb_module_init(): BUG: unable to handle kernel paging request at ff100000"
Previous message: K.Prasad: "Re: [RFC Patch 1/10] Introducing generic hardware breakpoint handler interfaces"
In reply to: Peter Zijlstra: "[RFC] process wide itimer cruft"
Next in thread: Peter Zijlstra: "Re: [RFC] process wide itimer cruft"

On 02/03, Peter Zijlstra wrote:
>
> On Mon, 2009-02-02 at 09:53 +0100, Peter Zijlstra wrote:
>
> I'm punting the sum-all-threads work off to a workqueue,

I don't really understand how this works, but I didn't try to read
this part carefully. For example, when we call thread_group_cputime()
we don't really get the "group" statistics immediately? But this looks
very interesting anyway.

Unfortunately, I think we need some changes with ->signal first.

> The remaining option is to make signal struct itself rcu freed, but
> before I do that, I thought I'd run this code by some folks.

I think we should follow Ingo's suggestion: we should make ->signal
refcountable, we should never clear task->signal, and it should be
freed by __put_task_struct()'s path.

In fact I was going to write these patches last week, and will try
to do it this week. But we need another counter for that; we can't use
signal->count. And we should fix the users which check
tsk->signal != NULL to ensure the task was not released; this is easy.
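
To illustrate the idea, here is a minimal userspace sketch, not kernel
code. It assumes a second, hypothetical counter (called "refs" here)
that pins the structure's lifetime, while "count" keeps tracking the
threads which have not yet passed __exit_signal(). All type and
function names are made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model only; "refs" is a hypothetical second counter. */
struct signal_model {
	atomic_int count;	/* threads that have not exited yet */
	atomic_int refs;	/* lifetime references, one per task */
};

struct task_model {
	struct signal_model *signal;	/* never cleared in this model */
};

static int signals_freed;	/* lets the example observe the free */

static struct signal_model *signal_alloc(void)
{
	struct signal_model *sig = calloc(1, sizeof(*sig));

	if (!sig)
		abort();
	atomic_init(&sig->count, 0);
	atomic_init(&sig->refs, 0);
	return sig;
}

static struct task_model *task_create(struct signal_model *sig)
{
	struct task_model *t = malloc(sizeof(*t));

	if (!t)
		abort();
	atomic_fetch_add(&sig->count, 1);
	atomic_fetch_add(&sig->refs, 1);
	t->signal = sig;
	return t;
}

/* Models the exit path: returns true for the last exiting thread. */
static bool task_exit_signal(struct task_model *t)
{
	return atomic_fetch_sub(&t->signal->count, 1) == 1;
}

/* Models the final put of the task: only here can the signal go away. */
static void task_put(struct task_model *t)
{
	struct signal_model *sig = t->signal;
	int old = atomic_fetch_sub(&sig->refs, 1);

	assert(old > 0);
	if (old == 1) {
		free(sig);
		signals_freed++;
	}
	free(t);
}
```

In this model t->signal stays valid until the last task_put(), so a
reader holding a task reference would not need a NULL check.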

This bloats signal_struct a bit, but otoh with this change we can
move some fields (for example, ->group_leader) to signal_struct.
And we can do many simplifications. Just for example, __sched_setscheduler()
takes ->siglock just to read signal->rlim[].

> @@ -96,14 +105,16 @@ static void __exit_signal(struct task_struct *tsk)
> spin_lock(&sighand->siglock);
>
> posix_cpu_timers_exit(tsk);
> - if (atomic_dec_and_test(&sig->count))
> + if (!atomic_read(&sig->live)) {
> posix_cpu_timers_exit_group(tsk);

This doesn't look exactly right, but I don't see the "real" problems
with this change.

We can have a lot of threads which haven't even passed exit_notify(),
and another process can attach a cpu timer to us once we drop the
locks. OK, no real problems afaics, because each sub-thread will
in turn do posix_cpu_timers_exit_group() later.

But this looks a bit too early. It is better to continue to account
these threads, they can consume a lot of cpu. Anyway, this is a very
minor issue.

> - else {
> + sig->curr_target = NULL;

complete_signal() can crash if it hits ->curr_target = NULL, and
we are still "visible" to signals even if sig->live == 0.
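
A simplified userspace model of the concern, loosely following the
round-robin walk over the thread group. The types and pick_target()
are hypothetical, and the NULL guard is there only so the sketch can
run; it is not a proposed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; these are not the kernel's structures. */
struct thread_model {
	struct thread_model *next;	/* circular list of group threads */
	bool wants_signal;
};

struct group_model {
	struct thread_model *curr_target;
};

/* Returns the thread to wake, or NULL if nobody wants the signal. */
static struct thread_model *pick_target(struct group_model *g)
{
	struct thread_model *t = g->curr_target;

	/*
	 * Without this guard, the t->wants_signal test below would
	 * dereference NULL once curr_target has been cleared.
	 */
	if (t == NULL)
		return NULL;

	while (!t->wants_signal) {
		t = t->next;
		if (t == g->curr_target)
			return NULL;	/* walked the whole group */
	}
	g->curr_target = t;
	return t;
}
```

The walk starts from curr_target and assumes it is a valid thread, so
clearing it while signals can still be sent needs care.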

> + } else {
> /*
> * If there is any task waiting for the group exit
> * then notify it:
> */
> - if (sig->group_exit_task && atomic_read(&sig->count) == sig->notify_count)
> + if (sig->group_exit_task &&
> + atomic_read(&sig->live) == sig->notify_count)

This looks wrong. de_thread() can hang forever, because put_signal()
doesn't wake up ->group_exit_task.
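
The general shape of such a missed wakeup, as a single-threaded
userspace model. It assumes one counter that is dropped on two paths,
only one of which checks for the waiter; the struct and both helpers
are hypothetical and do not mirror the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; names are made up for the example. */
struct group_model {
	int count;		/* dropped on two different paths */
	int notify_count;	/* value the waiter is waiting for */
	bool waiter_present;
	bool waiter_woken;
};

/* Models a drop that also checks whether the waiter must be woken. */
static void drop_and_notify(struct group_model *g)
{
	g->count--;
	if (g->waiter_present && g->count == g->notify_count)
		g->waiter_woken = true;
}

/* Models a drop on a path which never looks at the waiter. */
static void drop_silently(struct group_model *g)
{
	g->count--;
}
```

If the drop which makes count equal to notify_count goes through
drop_silently(), the condition becomes true but nobody wakes the
waiter, and it sleeps forever.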

I think we really need another counter, at least for now.

Oleg.
