Re: [GIT PULL] KVM changes for Linux 6.14
From: Paolo Bonzini
Date: Tue Feb 04 2025 - 11:05:33 EST
On Tue, Feb 4, 2025 at 3:19 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
>
> On Mon, Jan 27, 2025 at 04:15:01PM +0100, Paolo Bonzini wrote:
> > On Mon, Jan 27, 2025 at 3:10 PM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> > > On 01/26, Linus Torvalds wrote:
> > > > On Sun, 26 Jan 2025 at 10:54, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> > > > >
> > > > > I don't think we even need to detect the /proc/self/ or /proc/self-thread/
> > > > > case, next_tid() can just check same_thread_group,
> > > >
> > > > That was my thinking yes.
> > > >
> > > > If we exclude them from /proc/*/task entirely, I'd worry that it would
> > > > hide it from some management tool and be used for nefarious purposes
> > >
> > > Agreed,
> > >
> > > > (even if they then show up elsewhere that the tool wouldn't look at).
> > >
> > > Even if we move them from /proc/*/task to /proc ?
> >
> > Indeed---as long as they show up somewhere, it's not worse than it
> > used to be. The reason why I'd prefer them to stay in /proc/*/task is
> > that moving them away at least partly negates the benefits of the
> > "workers are children of the starter" model. For example it
> > complicates measuring their cost within the process that runs the VM.
> > Maybe it's more of a romantic thing than a real practical issue,
> > because in the real world resource accounting for VMs is done via
> > cgroups. But unlike the lazy creation in KVM, which is overall pretty
> > self-contained, I am afraid the ugliness in procfs would be much worse
> > compared to the benefit, if there's a benefit at all.
> >
> > > Perhaps, I honestly do not know what will/can confuse userspace more.
> >
> > At the very least, marking workers as "Kthread: 1" makes sense and
> > should not cause too much confusion. I wouldn't go beyond that unless
> > we get more reports of similar issues, and I'm not even sure how
> > common it is for userspace libraries to check for single-threadedness.
>
> Sorry, just saw this thread now.
>
> What if we did what Linus suggests and hide (odd) user workers from
> /proc/<pid>/task/* but also added /proc/<pid>/workers/*. The latter
> would only list the workers that got spawned by the kernel for that
> particular task? This would acknowledge their somewhat special status
> and allow userspace to still detect them as "Hey, I didn't actually
> spawn those, they got shoved into my workload by the kernel for me.".
Wouldn't the workers then disappear completely from ps, top or other
tools that look at /proc/$PID/task? That seems a bit too underhanded
towards userspace...
Paolo
> (Another (ugly) alternative would be to abuse prctl() and have workloads
> opt-in to hiding user workers from /proc/<pid>/task/.)