[PATCH v10 0/8] Use copy_process in vhost layer

From: Mike Christie
Date: Sun Jun 19 2022 - 21:15:01 EST


The following patches were made over Linus's tree.

Eric and Christian, the vhost maintainer, Michael Tsirkin has ACK'd the
patches. I haven't got any more comments from you guys for a couple
postings now (Jan 8 was the last reply). Are you guys ok to merge them?

For everyone else that hasn't see this before, the patches allow the
vhost layer to do a copy_process on the thread that does the
VHOST_SET_OWNER ioctl like how io_uring does a copy_process against its
userspace app. This allows the vhost layer's worker threads to inherit
cgroups, namespaces, address space, etc and this worker thread will also
be accounted for against that owner/parent process's RLIMIT_NPROC limit.

If you are not familiar with qemu and vhost here is more detailed
problem description:

Qemu will create vhost devices in the kernel which perform network, SCSI,
etc IO and management operations from worker threads created by the
kthread API. Because the kthread API does a copy_process on the kthreadd
thread, the vhost layer has to use kthread_use_mm to access the Qemu
thread's memory and cgroup_attach_task_all to add itself to the Qemu
thread's cgroups.

The problem with this approach is that we then have to add new functions/
args/functionality for every thing we want to inherit. I started doing
that here:

https://lkml.org/lkml/2021/6/23/1233

for the RLIMIT_NPROC check, but it seems it might be easier to just
inherit everything from the beginning, becuase I'd need to do something
like that patch several times.

V10:
- Eric's cleanup patches my vhost flush cleanup patches are merged
upstream, so rebase against Linus's tree which has everything.
V9:
- Rebase against Eric's kthread-cleanups-for-v5.19 branch. Drop patches
no longer needed due to kernel clone arg and pf io worker patches in that
branch.
V8:
- Fix kzalloc GFP use.
- Fix email subject version number.
V7:
- Drop generic user_worker_* helpers and replace with vhost_task specific
ones.
- Drop autoreap patch. Use kernel_wait4 instead.
- Fix issue where vhost.ko could be removed while the worker function is
still running.
V6:
- Rename kernel_worker to user_worker and fix prefixes.
- Add better patch descriptions.
V5:
- Handle kbuild errors by building patchset against current kernel that
has all deps merged. Also add patch to remove create_io_thread code as
it's not used anymore.
- Rebase patchset against current kernel and handle a new vm PF_IO_WORKER
case added in 5.16-rc1.
- Add PF_USER_WORKER flag so we can check it later after the initial
thread creation for the wake up, vm and singal cses.
- Added patch to auto reap the worker thread.
V4:
- Drop NO_SIG patch and replaced with Christian's SIG_IGN patch.
- Merged Christian's kernel_worker_flags_valid helpers into patch 5 that
added the new kernel worker functions.
- Fixed extra "i" issue.
- Added PF_USER_WORKER flag and added check that kernel_worker_start users
had that flag set. Also dropped patches that passed worker flags to
copy_thread and replaced with PF_USER_WORKER check.
V3:
- Add parentheses in p->flag and work_flags check in copy_thread.
- Fix check in arm/arm64 which was doing the reverse of other archs
where it did likely(!flags) instead of unlikely(flags).
V2:
- Rename kernel_copy_process to kernel_worker.
- Instead of exporting functions, make kernel_worker() a proper
function/API that does common work for the caller.
- Instead of adding new fields to kernel_clone_args for each option
make it flag based similar to CLONE_*.
- Drop unused completion struct in vhost.
- Fix compile warnings by merging vhost cgroup cleanup patch and
vhost conversion patch.