Re: [PATCH workqueue/for-6.10-fixes] workqueue: Refactor worker ID formatting and make wq_worker_comm() use full ID string

From: Yafang Shao
Date: Thu May 23 2024 - 09:04:25 EST


On Thu, May 23, 2024 at 12:32 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, 22 May 2024 at 19:38, Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> >
> > Indeed, the 16-byte limit is hard-coded in certain BPF code:
>
> It's worse than that.
>
> We have code like this:
>
> memcpy(__entry->comm, t->comm, TASK_COMM_LEN);
>
> and it looks like this code not only has a fixed-size target buffer of
> TASK_COMM_LEN, it also just uses "memcpy()" instead of "strscpy()",
> knowing that the source has the NUL byte in it.
>
> If it wasn't for that memcpy() pattern, I think this trivial patch
> would "JustWork(tm)"
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 2d7dd0e39034..5829912a2fa0 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1239,7 +1239,7 @@ char *__get_task_comm(char *buf, size_t
> buf_size, struct task_struct *tsk)
> {
> task_lock(tsk);
> /* Always NUL terminated and zero-padded */
> - strscpy_pad(buf, tsk->comm, buf_size);
> + strscpy_pad(buf, tsk->real_comm, buf_size);
> task_unlock(tsk);
> return buf;
> }
> @@ -1254,7 +1254,7 @@ void __set_task_comm(struct task_struct *tsk,
> const char *buf, bool exec)
> {
> task_lock(tsk);
> trace_task_rename(tsk, buf);
> - strscpy_pad(tsk->comm, buf, sizeof(tsk->comm));
> + strscpy_pad(tsk->real_comm, buf, sizeof(tsk->real_comm));
> task_unlock(tsk);
> perf_event_comm(tsk, exec);
> }
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 61591ac6eab6..948220958548 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -299,6 +299,7 @@ struct user_event_mm;
> */
> enum {
> TASK_COMM_LEN = 16,
> + REAL_TASK_COMM_LEN = 24,
> };
>
> extern void sched_tick(void);
> @@ -1090,7 +1091,10 @@ struct task_struct {
> * - access it with [gs]et_task_comm()
> * - lock it with task_lock()
> */
> - char comm[TASK_COMM_LEN];
> + union {
> + char comm[TASK_COMM_LEN];
> + char real_comm[REAL_TASK_COMM_LEN];
> + };
>
> struct nameidata *nameidata;
>
> and the old common pattern of just printing with '%s' and tsk->comm
> would just continue to work:
>
> pr_alert("BUG: Bad page state in process %s pfn:%05lx\n",
> current->comm, page_to_pfn(page));
>
> but will get a longer max string.
>
> Of course, we have code like this in security/selinux/selinuxfs.c that
> is literally written so that it won't work:
>
> if (new_value) {
> char comm[sizeof(current->comm)];
>
> memcpy(comm, current->comm, sizeof(comm));
> pr_err("SELinux: %s (%d) set checkreqprot to 1. This
> is no longer supported.\n",
> comm, current->pid);
> }
>
> which copies to a temporary buffer (which now does *NOT* have a
> closing NUL character), and then prints from that. The intent is to at
> least have a stable buffer, but it basically relies on the source of
> the memcpy() being stable enough anyway.
>
> That said, a simple grep like this:
>
> git grep 'memcpy.*->comm\>'
>
> more than likely finds all relevant cases. Not *that* many, and just
> changing the 'memcpy()' to 'copy_comm()' should fix them all.
>
> The "copy_comm()" would trivially look something like this:
>
> memcpy(dst, src, TASK_COMM_LEN);
> dst[TASK_COMM_LEN-1] = 0;
>
> and the people who want that old TASK_COMM_LEN behavior will get it,
> and the people who just print out ->comm as a string will magically
> get the longer new "real comm".
>
> And people who do "sizeof(->comm)" will continue to get the old value
> because of the hacky union. FWIW.
>
> Anybody want to polish up the above turd? It doesn't look all that
> hard unless I'm missing something, but needs some testing and care.

If it's not urgent and no one else will handle it, I'll take care of
it. However, I might not be able to complete it quickly.

--
Regards
Yafang