Re: [PATCH] taskstats: retain dead thread stats in TGID queries
From: Yiyang Chen
Date: Mon Mar 30 2026 - 13:57:42 EST
Hi Dr. Thomas,
> I can discern that this was a structurally simple (MPI) program that
> spawned one process per CPU core and probably had two extra threads per
> core for communication. It allocated 34 % more memory than it actually
> needed. This one program took so much of the job's resources that other
> processes don't really count. A bad HPC job has a long table of
> commands each contributing a little, down towards individual calls to
> 'cat' and the like. I want to see and present those cases.
>
> In another application, I collect statistics using accumulated CPU time
> and coremem per program binary to be able to tell which programs and
> (older) versions use how much of our cluster over the years.
>
> With a counter for total tasks over the group lifetime added to struct
> taskstats and the missing fields filled following your patch, I could
> get all this information with a lot less overhead via datasets only on
> tgid exit and would not have to count each task as it finishes. I
> always like less overhead for monitoring/accounting!
Thanks a lot for the detailed feedback and for sharing your use case!
> > Factor the per-task TGID accumulation into a helper and use it in both
> > fill_stats_for_tgid() and fill_tgid_exit(). This keeps the fields
> > retained for dead threads aligned with the fields already accounted for
> > live threads, and follows the existing taskstats TGID aggregation model,
> > which already accumulates delay accounting in fill_tgid_exit() and
> > combines it with a live-thread scan in fill_stats_for_tgid().
>
> Pardon my ignorance, as I do not have the time right now to dive back
> into kernel code: Should other fields of interest also be filled? Do we
> have all of them covered? Memory highwater marks are not per-task,
> right? But coremem, virtmem? I/O stats?
You're right that my current patch only covers ac_etime, ac_utime,
ac_stime, nvcsw, nivcsw and the delay accounting fields.
I focused on the fields that fill_stats_for_tgid() already accumulated
for live threads, to fix the inconsistency where a TGID query lost the
contribution of threads that had already exited.
The patch also makes TGID queries and TGID exit notifications report
the same set of fields, so dead threads are counted consistently in both.
Adding the other fields (coremem, virtmem, I/O stats) makes sense as a
follow-up patch. That will probably need a small refactoring so that
the code used for per-PID taskstats accounting can be reused.
> Also, in the end, I'd strongly prefer this patch to include a
> user-visible change in the API, like an increased TASKSTATS_VERSION.
> There are no new fields added, but the interpretation of the data is
> different now for tgid.
My current thinking is not to bump TASKSTATS_VERSION,
since the struct layout and fields are unchanged.
But if the maintainers think the semantic change should be versioned,
I'm happy to do that.
Thanks,
Yiyang Chen