Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

From: Mike Christie
Date: Mon May 29 2023 - 15:35:56 EST

On 5/29/23 12:46 PM, Oleg Nesterov wrote:
> Mike, sorry, I don't understand your email.
> Just in case, let me remind I know nothing about drivers/vhost/

No problem. I get it. I don't know the signal/thread code so it's
one of those things where I'm also learning as I go.

> On 05/29, michael.christie@xxxxxxxxxx wrote:
>> On 5/29/23 6:19 AM, Oleg Nesterov wrote:
>>> On 05/27, Eric W. Biederman wrote:
>>>> Looking forward I don't see not asking the worker threads to stop
>>>> for the coredump right now causing any problems in the future.
>>>> So I think we can use this to resolve the coredump issue I spotted.
>>> But we have almost the same problem with exec.
>>> Execing thread will wait for vhost_worker() while vhost_worker will wait for
>>> .release -> vhost_task_stop().
>> For this type of case, what is the goal or correct behavior in the end?
>> When get_signal returns true we can code things like you mention below
> and you have mentioned in the next email that you have already coded something
> like this, so perhaps we can delay the further discussions until you send the
> new code?

Ok. Let me post that. You guys and the vhost devs can argue about if it's
too ugly to merge :)

>> and
>> clean up the task_struct.
> Hmm... If we you CLONE_THREAD the exiting vhost_worker() will auto-reap itself,

Oh wait, are you saying that when we get auto-reaped then we would do the last
fput and call the file_operations->release function right? We actually set
task_struct->files = NULL for the vhost_task task_struct, so I think we call
release a little sooner than you think.

vhost_task_create() sets kernel_clone_args->no_files, so the vhost_task task_struc
that gets created works like kthreads where it doesn't do a CLONE_FILES and it
doesn't do a dup_fd.

So when we do de_thread() -> zap_other_threads(), that will kill all the threads
in the group right? So when they exit, it will call our release function since
we don't have refcount on ourself.

>> However, we now have a non-functioning vhost device
>> open and just sitting around taking up memory and it can't do any IO.
> can't comment, see above.
>> For this type of case, do we expect just not to crash/hang, or was this new
>> exec'd thread suppose to be able to use the vhost device?
> I just tried to point out that (unless I missed something) there are more corner
> cases, not just coredump.

Ok. I see.

>>> Or suppose that vhost_worker's sub-thread forks a child with CLONE_FILES...
>> You mean the vhost_task's task/thread doing a function that does a copy_process
>> right?
> I meant that the vhost_task's sub-thread can do sys_clone(CLONE_FILES) from
> userspace.

I think we were talking about different things. I was saying that when the vhost
layer does vhost_task_create() -> copy_process(), the kernel_clone_args.fn is
vhost_task_fn() -> vhost_worker(). I thought you were concerned about vhost_worker()
or some function it calls, calling copy_process() with CLONE_FILES.