Re: [PATCH] taskstats: retain dead thread stats in TGID queries

From: Dr. Thomas Orgis

Date: Sun Mar 29 2026 - 11:04:32 EST


On Fri, 27 Mar 2026 03:12:07 +0800,
Yiyang Chen <cyyzero16@xxxxxxxxx> wrote:

> However, fill_tgid_exit() only accumulates delay accounting into
> signal->stats. This means TGID queries lose the fields that
> fill_stats_for_tgid() adds for live threads once a thread exits,
> including ac_etime, ac_utime, ac_stime, nvcsw and nivcsw.

It seems you are properly tackling the problem that I, as an outsider, ran
into when I started using the taskstats interface. Quoting the sources of
my task/process accounting tool (from around 2018):

* I intended to only count processes (tgid stats), but that
* gives empty values for the ones I am interested in. There was
* a patch posted ages ago that would have added the accounting
* fields in the aggregation ... but did not make it, apparently.
* Linux kernel folks are interested in stuff like that delay
* accounting (not sure I know what this is about), while I want
* a reliable way to add up the compute/memory resources used
* by certain processes.

Thanks!

I was missing the fields interesting to me in the tgid stats: (CPU) times,
I/O, memory. I am not sure I will get around to testing your patch quickly,
due to personal time and brain-time constraints, but I want to express my
interest in it.

I ended up adding the AGROUP flag in taskstats version 12 to solve my
issue, allowing my tool to tell the exit of an individual thread apart
from the exit of a whole group. This means, though, that I have to fetch
all individual thread stats from the kernel and later sort them into
per-process aggregates.
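To illustrate that aggregation step, here is a minimal sketch (not my
actual tool; the struct below is a stand-in for the relevant struct
taskstats fields, and the AGROUP value is written out here on the
assumption that it matches <linux/acct.h>, which real code should
include instead):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the AGROUP bit in <linux/acct.h>;
 * include the real header in production code. */
#define AGROUP 0x20

/* Stand-in for the struct taskstats fields used here. */
struct exit_record {
	uint32_t ac_pid;
	uint8_t  ac_flag;              /* AGROUP set on the last task of the group */
	uint64_t ac_utime, ac_stime;   /* CPU times in microseconds */
};

/* Per-process aggregate built up from per-thread exit records. */
struct proc_agg {
	uint64_t utime, stime;
	unsigned tasks;
};

/* Fold one per-thread exit record into the process aggregate;
 * returns 1 when this record closed the whole thread group. */
int account_exit(struct proc_agg *agg, const struct exit_record *r)
{
	agg->utime += r->ac_utime;
	agg->stime += r->ac_stime;
	agg->tasks++;
	return (r->ac_flag & AGROUP) != 0;
}
```

The tool folds in one such record per thread exit and only finalizes the
per-process aggregate once a record with AGROUP set arrives.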

In the end I want to present such to users (percentages on sums for the
whole HPC batch job):

-------------------8<---------------
cpu mem io │ maxrss maxvm │ tasks procs │ command
% % % │ GiB GiB │ │
═══════════════════╪═══════════════╪══════════════╪════════
100.0 99.9 100.0 │ 2.6 3.5 │ 576 192 │ some_program

Summary:

Elapsed time: 38% (9.1 out of 24.0 h timelimit)
CPU: 100% (191.7 out of 192 physical CPU cores)
Max. main memory: 37% (273.9 out of 750.0 GiB min. available per node)
----------------->8-----------------

I can discern that this was a structurally simple (MPI) program that
spawned one process per CPU core and probably had two extra threads per
core for communication. It allocated 34 % more memory than it actually
needed. This one program took so much of the job's resources that other
processes don't really count. A bad HPC job has a long table of
commands each contributing a little, down towards individual calls to
'cat' and the like. I want to see and present those cases.

In another application, I collect statistics on accumulated CPU time and
coremem per program binary, to be able to tell which programs (and which
older versions) use how much of our cluster over the years.

With a counter for the total number of tasks over the group's lifetime
added to struct taskstats, and the missing fields filled in following your
patch, I could get all this information with a lot less overhead from the
datasets delivered on tgid exit alone, and would not have to account each
task individually as it finishes. I always like less overhead for
monitoring/accounting!

> Factor the per-task TGID accumulation into a helper and use it in both
> fill_stats_for_tgid() and fill_tgid_exit(). This keeps the fields
> retained for dead threads aligned with the fields already accounted for
> live threads, and follows the existing taskstats TGID aggregation model,
> which already accumulates delay accounting in fill_tgid_exit() and
> combines it with a live-thread scan in fill_stats_for_tgid().

Pardon my ignorance, as I do not have the time right now to dive back
into kernel code: Should other fields of interest also be filled? Do we
have all of them covered? Memory highwater marks are not per-task,
right? But coremem, virtmem? I/O stats?

Also, in the end, I'd strongly prefer this patch to include a
user-visible API change, such as a bumped TASKSTATS_VERSION. No new
fields are added, but the interpretation of the tgid data is different
now.


Alrighty then,

Thomas

--
Dr. Thomas Orgis
HPC @ Universität Hamburg