Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

From: Oleg Nesterov
Date: Mon May 29 2023 - 13:48:08 EST


Mike, sorry, I don't understand your email.

Just in case, let me remind I know nothing about drivers/vhost/

On 05/29, michael.christie@xxxxxxxxxx wrote:
>
> On 5/29/23 6:19 AM, Oleg Nesterov wrote:
> > On 05/27, Eric W. Biederman wrote:
> >>
> >> Looking forward I don't see not asking the worker threads to stop
> >> for the coredump right now causing any problems in the future.
> >> So I think we can use this to resolve the coredump issue I spotted.
> >
> > But we have almost the same problem with exec.
> >
> > Execing thread will wait for vhost_worker() while vhost_worker will wait for
> > .release -> vhost_task_stop().
>
> For this type of case, what is the goal or correct behavior in the end?
>
> When get_signal returns true we can code things like you mention below

and you have mentioned in the next email that you have already coded something
like this, so perhaps we can delay the further discussions until you send the
new code?

> and
> clean up the task_struct.

Hmm... If we you CLONE_THREAD the exiting vhost_worker() will auto-reap itself,

> However, we now have a non-functioning vhost device
> open and just sitting around taking up memory and it can't do any IO.

can't comment, see above.

> For this type of case, do we expect just not to crash/hang, or was this new
> exec'd thread suppose to be able to use the vhost device?

I just tried to point out that (unless I missed something) there are more corner
cases, not just coredump.

> > Or suppose that vhost_worker's sub-thread forks a child with CLONE_FILES...
>
> You mean the vhost_task's task/thread doing a function that does a copy_process
> right?

I meant that the vhost_task's sub-thread can do sys_clone(CLONE_FILES) from
userspace. Yes, this implies copy_process() but I still can't understand you.

> That type of thing is not needed.

Do you mean that userspace should never do this? But this doesn't matter, the
kernel should handle this case anyway.

Or what?

In short let me repeat that I don't understand you and - of course! - quite
possibly I missed something.

Oleg.