Re: [RFC][PATCH 09/10] taskstats: Fix exit CPU time accounting

From: Oleg Nesterov
Date: Mon Sep 27 2010 - 12:56:03 EST


On 09/27, Martin Schwidefsky wrote:
>
> On Sun, 26 Sep 2010 20:11:27 +0200
> Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> > No. Well yes, it is not accounted, but this is not because it is
> > kthread.
>
> We noticed that behavior with kernel threads but as you point out
> the problem is bigger than that.
>
> > It is very simple, currently linux accounts the exiting task's
> > utime and adds its to ->cutime _only_ if parent does do_wait().
> > If parent ignores SIGCHLD, the child reaps itself and it is not
> > accounted.
> >
> > I do not know why it was done this way, but I'm afraid we can't
> > change this historical behaviour.
>
> Why?

Please don't ask me ;) I was equally surprised when I studied this
code in the past.

> I would consider it to be a BUG() that the time is not accounted.
> Independent of the fact that a parent wants to see the SIGCHLD and
> the exit status of its child the process time of the child should be
> accounted, no?

I do not know. It doesn't look like a BUG(), I mean it looks as if
the code was intentionally written this way.

> And I'm not a particular fan of the "this has always
> been that way" reasoning.

Me too, but unfortunately it often happens, sometimes we can't improve
things just because we should not break existing programs.

Once again, don't get me wrong. Personally I agree, to me it makes
sense to move the "update parent's cxxx" code from wait_task_zombie()
to __exit_signal(), and account children unconditionally.

But I do not know who can approve this very much user-visible change.
Perhaps Roland.

> Got the part about self-reaping processes. But there is another issue:
> consider an exiting thread where the group leader is still active.
> The time for the thread will be added to the utime/stime fields in
> the signal structure.

(to clarify, s/group leader/last thread/)

> Taskstats will happily ignore that time while
> the group leader is still running.

Sorry. I didn't read the whole series and I forgot everything I knew
about taskstats. But I don't understand this "ignore" above, every
exiting thread calls taskstats_exit()->fill_pid()->bacct_add_tsk()
and reports ->ac_utime?

Never mind. You seem to want to update, say, cutime when a sub-thread
exits, before the whole process exits, right? Again, trivial to
implement but this is another user-visible change.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/