Re: [PATCH workqueue/for-6.10-fixes] workqueue: Refactor worker ID formatting and make wq_worker_comm() use full ID string

From: Linus Torvalds
Date: Thu May 23 2024 - 00:32:32 EST


On Wed, 22 May 2024 at 19:38, Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> Indeed, the 16-byte limit is hard-coded in certain BPF code:

It's worse than that.

We have code like this:

memcpy(__entry->comm, t->comm, TASK_COMM_LEN);

and it looks like this code not only has a fixed-size target buffer of
TASK_COMM_LEN, it also just uses "memcpy()" instead of "strscpy()",
knowing that the source has the NUL byte in it.

If it wasn't for that memcpy() pattern, I think this trivial patch
would "JustWork(tm)"

diff --git a/fs/exec.c b/fs/exec.c
index 2d7dd0e39034..5829912a2fa0 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1239,7 +1239,7 @@ char *__get_task_comm(char *buf, size_t
buf_size, struct task_struct *tsk)
{
task_lock(tsk);
/* Always NUL terminated and zero-padded */
- strscpy_pad(buf, tsk->comm, buf_size);
+ strscpy_pad(buf, tsk->real_comm, buf_size);
task_unlock(tsk);
return buf;
}
@@ -1254,7 +1254,7 @@ void __set_task_comm(struct task_struct *tsk,
const char *buf, bool exec)
{
task_lock(tsk);
trace_task_rename(tsk, buf);
- strscpy_pad(tsk->comm, buf, sizeof(tsk->comm));
+ strscpy_pad(tsk->real_comm, buf, sizeof(tsk->real_comm));
task_unlock(tsk);
perf_event_comm(tsk, exec);
}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 61591ac6eab6..948220958548 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -299,6 +299,7 @@ struct user_event_mm;
*/
enum {
TASK_COMM_LEN = 16,
+ REAL_TASK_COMM_LEN = 24,
};

extern void sched_tick(void);
@@ -1090,7 +1091,10 @@ struct task_struct {
* - access it with [gs]et_task_comm()
* - lock it with task_lock()
*/
- char comm[TASK_COMM_LEN];
+ union {
+ char comm[TASK_COMM_LEN];
+ char real_comm[REAL_TASK_COMM_LEN];
+ };

struct nameidata *nameidata;

and the old common pattern of just printing with '%s' and tsk->comm
would just continue to work:

pr_alert("BUG: Bad page state in process %s pfn:%05lx\n",
current->comm, page_to_pfn(page));

but will get a longer max string.

Of course, we have code like this in security/selinux/selinuxfs.c that
is literally written so that it won't work:

if (new_value) {
char comm[sizeof(current->comm)];

memcpy(comm, current->comm, sizeof(comm));
pr_err("SELinux: %s (%d) set checkreqprot to 1. This
is no longer supported.\n",
comm, current->pid);
}

which copies to a temporary buffer (which now does *NOT* have a
closing NUL character), and then prints from that. The intent is to at
least have a stable buffer, but it basically relies on the source of
the memcpy() being stable enough anyway.

That said, a simple grep like this:

git grep 'memcpy.*->comm\>'

more than likely finds all relevant cases. Not *that* many, and just
changing the 'memcpy()' to 'copy_comm()' should fix them all.

The "copy_comm()" would trivially look something like this:

memcpy(dst, src, TASK_COMM_LEN);
dst[TASK_COMM_LEN-1] = 0;

and the people who want that old TASK_COMM_LEN behavior will get it,
and the people who just print out ->comm as a string will magically
get the longer new "real comm".

And people who do "sizeof(->comm)" will continue to get the old value
because of the hacky union. FWIW.

Anybody want to polish up the above turd? It doesn't look all that
hard unless I'm missing something, but needs some testing and care.

Linus