Re: Unexpected EINVAL when enabling cpuset in subtree_control when io_uring threads are running

From: Waiman Long
Date: Wed Mar 08 2023 - 09:21:06 EST


On 3/8/23 06:42, Daniel Dao wrote:
Hi all,

We get EINVAL when enabling cpuset in cgroup v2 while io_uring worker
threads are running. Here are the steps to reproduce the failure on
kernel 6.1.14:

1. Remove cpuset from subtree_control

> for d in $(find /sys/fs/cgroup/ -maxdepth 1 -type d); do echo '-cpuset' | sudo tee -a $d/cgroup.subtree_control; done
> cat /sys/fs/cgroup/cgroup.subtree_control
cpu io memory pids

2. Run any application that uses the io_uring worker thread pool. I used
https://github.com/cloudflare/cloudflare-blog/tree/master/2022-02-io_uring-worker-pool

> cargo run -- -a -w 2 -t 2

3. Enable cpuset; the write fails with EINVAL

> echo '+cpuset' | sudo tee -a /sys/fs/cgroup/cgroup.subtree_control
+cpuset
tee: /sys/fs/cgroup/cgroup.subtree_control: Invalid argument

We traced this down to task_can_attach(), which returns EINVAL when it
encounters kthreads with PF_NO_SETAFFINITY set, a flag that io_uring
worker threads carry.
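
To see which threads in a cgroup would trip this check, the task flags
can be read from /proc. This is a minimal sketch, not a definitive tool.
It assumes PF_NO_SETAFFINITY is 0x04000000 (the value in
include/linux/sched.h; verify it against your kernel) and that flags is
field 9 of /proc/<tid>/stat, as described in proc(5). The function names
are hypothetical.

```shell
# Assumed value of PF_NO_SETAFFINITY (0x04000000) from include/linux/sched.h.
PF_NO_SETAFFINITY=67108864

# Print the flags field (field 9) of one /proc/<tid>/stat line.
stat_flags() {
    # Drop everything through the last ") " so that a comm containing
    # spaces or parentheses cannot shift the field positions.
    rest="${1##*) }"
    # Remaining fields: state ppid pgrp session tty_nr tpgid flags ...
    set -- $rest
    printf '%s\n' "$7"
}

# Succeed if the stat line in $1 has PF_NO_SETAFFINITY set.
has_no_setaffinity() {
    flags=$(stat_flags "$1")
    case $flags in
        ''|*[!0-9]*) return 1 ;;   # not a plain number: treat as not set
    esac
    [ $(( $flags & $PF_NO_SETAFFINITY )) -ne 0 ]
}

# Print "TID<tab>comm" for every flagged thread in cgroup directory $1.
scan_cgroup() {
    while read -r tid; do
        # A thread may exit between the listing and the read; skip it.
        line=$(cat "/proc/$tid/stat" 2>/dev/null) || continue
        if has_no_setaffinity "$line"; then
            comm=${line#*\(}       # strip "tid ("
            comm=${comm%\)*}       # strip from the last ")" onward
            printf '%s\t%s\n' "$tid" "$comm"
        fi
    done < "$1/cgroup.threads"
}
```

Running scan_cgroup on each child directory under /sys/fs/cgroup lists
the threads that would need to exit (or be ignored by the kernel) before
the '+cpuset' write can succeed.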

This seems like an unexpected interaction when enabling cpuset for
subtrees that contain kthreads. We are currently considering a
workaround: enable cpuset in the root subtree_control before any
io_uring application can start, so that a failure to enable cpuset is
localized to the cgroups that contain io_uring kthreads.
But this is cumbersome.
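
The workaround above could be sketched as a small script run early at
boot, before any io_uring application starts. This is only an
illustration under stated assumptions: the function names are
hypothetical, the script must run as root, and how it gets hooked into
early boot is left open.

```shell
# Succeed if controller $2 is listed in the subtree_control file $1.
# Whole-word comparison, so "cpu" is never mistaken for "cpuset".
controller_enabled() {
    for c in $(cat "$1"); do
        [ "$c" = "$2" ] && return 0
    done
    return 1
}

# Enable cpuset for the children of cgroup directory $1 unless it is
# already enabled there. The write fails with EINVAL if a thread with
# PF_NO_SETAFFINITY is already running in the affected subtree.
enable_cpuset_early() {
    ctl="$1/cgroup.subtree_control"
    controller_enabled "$ctl" cpuset && return 0
    echo '+cpuset' > "$ctl"
}
```

With cpuset already enabled at the root, e.g. via
"enable_cpuset_early /sys/fs/cgroup", later failures should be limited
to the deeper cgroups that actually contain io_uring workers.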

Any suggestions would be very much appreciated.

Whenever you echo "+cpuset" to cgroup.subtree_control to enable cpuset,
the tasks in the child cgroups are implicitly moved from the parent
cpuset to the child cpusets. That move fails if any task has the
PF_NO_SETAFFINITY flag set, because task_can_attach() checks for it.
One possible solution is for cpuset to ignore tasks with
PF_NO_SETAFFINITY set during an implicit move. IOW, allow the implicit
move without touching such tasks, but still reject an explicit move
via cgroup.procs.

Cheers,
Longman