Re: [RFC PATCH 00/18] kthreads/signal: Safer kthread API and signal handling

From: Peter Zijlstra
Date: Fri Jun 05 2015 - 12:22:40 EST


On Fri, Jun 05, 2015 at 05:00:59PM +0200, Petr Mladek wrote:
> Workqueue
>
>
> Workqueues are quite popular and many kthreads have already been
> converted into them.
>
> Work queues allow splitting the function into even smaller pieces
> and reaching the common check point more often. This is especially
> useful when a kthread handles several tasks and is woken when some
> work is needed. Then we could queue the appropriate work item instead
> of waking the whole kthread and checking what exactly needs to be
> done.
>
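To make the "queue the appropriate work" idea concrete, here is a minimal userspace sketch. It is not kernel code. struct work_item, queue_item() and run_pending() are hypothetical stand-ins that only loosely mirror the shape of the kernel's struct work_struct and queue_work(). The producer enqueues exactly the job it needs. Each return from a callback gives the worker a common check point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a work item: just a callback. */
struct work_item {
	void (*func)(struct work_item *work);
};

#define QCAP 16

/* A fixed-size FIFO of pending work items, drained by one worker. */
struct work_queue {
	struct work_item *items[QCAP];
	size_t head, tail;	/* monotonically increasing indices */
};

/* Producer side: enqueue exactly the piece of work that is needed. */
static void queue_item(struct work_queue *q, struct work_item *w)
{
	assert(q->tail - q->head < QCAP);	/* queue must not overflow */
	q->items[q->tail++ % QCAP] = w;
}

/* Worker side: run items in FIFO order and return how many ran. */
static size_t run_pending(struct work_queue *q)
{
	size_t ran = 0;

	while (q->head != q->tail) {
		struct work_item *w = q->items[q->head++ % QCAP];

		w->func(w);
		ran++;
		/* Common check point: a real worker could handle freeze/stop here. */
	}
	return ran;
}

/* Two hypothetical jobs that one dedicated kthread would otherwise multiplex. */
static int flushes, reclaims;
static void do_flush(struct work_item *w)   { (void)w; flushes++; }
static void do_reclaim(struct work_item *w) { (void)w; reclaims++; }
```

For example, queueing one flush item and one reclaim item and then calling run_pending() runs exactly those two callbacks. No "what woke me up?" flag checking is involved.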
> But there are many kthreads that need to cycle many times
> until some work is finished, e.g. khugepaged, virtio_balloon,
> jffs2_garbage_collect_thread. They would need to queue the
> work item repeatedly from the same work item or bounce between
> several work items. That would be strange semantics.
>
> Work queues allow sharing the same kthread between several users.
> This helps reduce the number of running kthreads, which is especially
> useful if you would otherwise need a kthread for each CPU.
>
> But this might also be a disadvantage. Just look at the output
> of the "ps" command and see the many [kworker*] processes. One
> might see this as a black hole. If a kworker makes the system busy,
> it is less obvious what the problem is compared with the old
> "simple" dedicated kthreads.
>
> Yes, we could add some debugging tools for work queues, but
> they would be another non-standard thing that developers and
> system administrators would need to understand.
>
> Another thing is that work queues have their own scheduler. If we
> move even more tasks there, it might need even more love. Anyway,
> the extra scheduler adds another level of complexity when
> debugging problems.

There are a lot more problems with workqueues:

- they're not regular tasks, so none of the regular task controls work
on them. That means all things scheduler: CPU affinity, nice, and the
RT/deadline scheduling policies. Instead there is some half-baked
secondary interface for some of these.

But this also very much includes things like cgroups, which brings me
to the second point.

- it's oblivious to cgroups (as it is to RT priority, for example), and
both lead to priority inversion. Work enqueued from a deep/limited
cgroup does not inherit the enqueuing task's cgroup; instead it is run
from the root cgroup.

This breaks cgroup isolation, all the more so when a large part of
the actual work is done from workqueues (as it ends up being for some
workloads). Instead of being controllable, that work all ends up in
the root cgroup, outside of any control.
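The following is a toy model of the accounting problem. It assumes, as described above, that work is charged to the context that executes it and not to the one that queued it. Nothing here is kernel API. struct group, struct task, struct toy_work and run_toy_work() are made up for illustration.

```c
#include <assert.h>

/* Toy model, not kernel code: a group accumulates the cost charged to it. */
struct group { const char *name; long charged; };

/* A task belongs to exactly one group. */
struct task { const char *comm; struct group *grp; };

/* The work remembers who queued it, but this model never uses that
 * when charging, just like the behaviour described above. */
struct toy_work { struct task *submitter; long cost; };

/* Running the work charges the group of the task that executes it. */
static void run_toy_work(struct task *runner, struct toy_work *w)
{
	assert(runner && runner->grp && w);
	runner->grp->charged += w->cost;
}
```

Suppose a task in a limited group queues 100 units of work and a root-group kworker executes it. The limited group is charged nothing and root is charged all 100, which is the isolation leak described above.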

