Re: [PATCH 1/2] taskstats: set version in TGID exit notifications

From: Yiyang Chen

Date: Tue Mar 31 2026 - 12:35:50 EST


On Tue, Mar 31, 2026 at 5:29 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, 30 Mar 2026 03:00:40 +0800 Yiyang Chen <cyyzero16@xxxxxxxxx> wrote:
>
> > delay accounting started populating taskstats records with a valid
> > version field via fill_pid() and fill_tgid().
> >
> > Later, commit ad4ecbcba728 ("[PATCH] delay accounting taskstats
> > interface send tgid once") changed the TGID exit path to send the
> > cached signal->stats aggregate directly instead of building the outgoing
> > record through fill_tgid(). Unlike fill_tgid(), fill_tgid_exit() only
> > accumulates accounting data and never initializes stats->version.
> >
> > As a result, TGID exit notifications can reach userspace with
> > version == 0 even though PID exit notifications and
> > TASKSTATS_CMD_GET replies carry a valid taskstats version.
> >
> > Set stats->version = TASKSTATS_VERSION after copying the cached TGID
> > aggregate into the outgoing netlink payload so all taskstats records are
> > self-describing again.
> >
> > Fixes: ad4ecbcba728 ("[PATCH] delay accounting taskstats interface send tgid once")
>
> Thanks, lol, 20 years ago.
>
> Can you explain how others can trigger this?  Some combination of
> steps which results in the bad output?

Yes. This is easy to reproduce with `tools/accounting/getdelays.c`.

I have a small follow-up patch for that tool which:
1. increases the receive buffer/message size so the combined PID+TGID exit
   notification is not dropped or truncated
2. prints `stats->version`

With that patch, the reproducer is:

Terminal 1:
./getdelays -d -v -l -m 0

Terminal 2:
taskset -c 0 python3 -c 'import threading,time; t=threading.Thread(target=time.sleep,args=(0.1,)); t.start(); t.join()'

That produces both PID and TGID exit notifications for the same process. The PID
exit record reports a valid taskstats version, while the TGID exit record reports
`version 0`.
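
To make the difference between the two paths concrete, here is a rough
userspace model of what happens. This is not the kernel code: `struct
model_stats`, `MODEL_VERSION` and the `model_*()` helpers are hypothetical
stand-ins I made up for illustration, and the real structures carry many more
fields. The only point is that the accumulation path never touches the version,
so the copied-out record keeps the zero from the cached aggregate unless the
version is stamped after the copy.

```c
#include <string.h>

/* Hypothetical stand-ins; the real definitions live in the kernel. */
#define MODEL_VERSION 1

struct model_stats {
	unsigned short version;
	unsigned long long cpu_delay_total;
};

/* Models the PID path: the record is built fresh and the version is set. */
static void model_fill_pid(struct model_stats *out, unsigned long long delay)
{
	memset(out, 0, sizeof(*out));
	out->version = MODEL_VERSION;
	out->cpu_delay_total = delay;
}

/* Models the TGID exit accumulation: adds data, never sets the version. */
static void model_accumulate_tgid_exit(struct model_stats *cached,
				       unsigned long long delay)
{
	cached->cpu_delay_total += delay;
}

/*
 * Models sending the cached aggregate. With fixed == 0 the copy goes out
 * as-is; with fixed != 0 the version is stamped after the copy, which is
 * what the patch proposes.
 */
static void model_send_tgid_exit(struct model_stats *out,
				 const struct model_stats *cached, int fixed)
{
	memcpy(out, cached, sizeof(*out));
	if (fixed)
		out->version = MODEL_VERSION;
}
```

In this model, a zero-initialized aggregate that only goes through
`model_accumulate_tgid_exit()` is sent with `version == 0` unless `fixed` is
set, while `model_fill_pid()` always produces a versioned record.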

>
> > Cc: stable@xxxxxxxxxxxxxxx
>
> Is there a chance of breaking existing userspace here?  Some existing
> userspace code which is expecting 0 here and will get surprised by this
> change?

In practice, userspace uses `taskstats.version` to decide which fields are
present in `struct taskstats`, i.e. as a schema/version discriminator. A zero
version does not describe a valid taskstats layout, so it is hard to see how
userspace could use `0` as a meaningful or useful distinction here.
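
As a sketch of the kind of consumer-side check I have in mind (this is not
taken from any existing tool): `struct record_header`, `FIELD_X_MIN_VERSION`
and `classify_record()` are hypothetical names, and the threshold value is
arbitrary. A reader that gates field access on the version has nothing useful
to do with a zero other than reject the record.

```c
#include <stddef.h>

/* Hypothetical stand-in for the leading part of a received record. */
struct record_header {
	unsigned short version;
};

/* Hypothetical: first version in which some field of interest exists. */
#define FIELD_X_MIN_VERSION 9

enum record_kind {
	RECORD_INVALID,          /* version 0 describes no known layout */
	RECORD_WITHOUT_FIELD_X,  /* valid, but predates the field */
	RECORD_WITH_FIELD_X      /* valid, field may be read */
};

/* Decide how a record may be interpreted based only on its version. */
static enum record_kind classify_record(const struct record_header *hdr)
{
	if (hdr == NULL || hdr->version == 0)
		return RECORD_INVALID;
	if (hdr->version < FIELD_X_MIN_VERSION)
		return RECORD_WITHOUT_FIELD_X;
	return RECORD_WITH_FIELD_X;
}
```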

So I do not think fixing this in mainline should break sensible userspace. It
just restores consistency of the taskstats version semantics across
`TASKSTATS_CMD_GET`, PID exit notifications, and TGID exit notifications.

To be honest, I'm also not sure whether this should be backported to stable.
But I think mainline should still fix it.

Thanks,
Yiyang