Re: [RFC v2 0/5] cgroup-aware unbound workqueues

From: Tejun Heo
Date: Tue Jun 11 2019 - 16:00:26 EST


Hello, Daniel.

On Wed, Jun 05, 2019 at 11:32:29AM -0400, Daniel Jordan wrote:
> Sure, quoting from the last ktask post:
>
> A single CPU can spend an excessive amount of time in the kernel operating
> on large amounts of data. Often these situations arise during initialization-
> and destruction-related tasks, where the data involved scales with system size.
> These long-running jobs can slow startup and shutdown of applications and the
> system itself while extra CPUs sit idle.
>
> To ensure that applications and the kernel continue to perform well as core
> counts and memory sizes increase, harness these idle CPUs to complete such jobs
> more quickly.
>
> ktask is a generic framework for parallelizing CPU-intensive work in the
> kernel. The API is generic enough to add concurrency to many different kinds
> of tasks--for example, zeroing a range of pages or evicting a list of
> inodes--and aims to save its clients the trouble of splitting up the work,
> choosing the number of threads to use, maintaining an efficient concurrency
> level, starting these threads, and load balancing the work between them.

Yeah, that rings a bell.

> > For memory and io, we're generally going for remote charging, where a
> > kthread explicitly says who the specific io or allocation is for,
> > combined with selective back-charging, where the resource is charged
> > and consumed unconditionally even if that would put the usage above
> > the current limits temporarily. From what I've been seeing recently,
> > the combination of the two gives us really good control quality
> > without being too invasive across the stack.
>
> Yes, for memory I actually use remote charging. In patch 3 the worker's
> current->active_memcg field is changed to match that of the cgroup associated
> with the work.

I see.
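
To make the remote-charging pattern concrete, here's a rough userspace
model of it. This is not kernel code: struct task_model, struct
memcg_model and all of the helper names are made up for illustration,
and only the active_memcg field name comes from the description above.
The worker temporarily points its active memcg at the work's cgroup,
allocates, and restores the old value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup: just a byte counter. */
struct memcg_model {
	unsigned long charged;
};

/* Hypothetical stand-in for a task (e.g. a kworker). */
struct task_model {
	struct memcg_model *own_memcg;    /* cgroup the task lives in */
	struct memcg_model *active_memcg; /* remote-charge override, or NULL */
};

/* Set the override and return the previous one so callers can nest. */
static struct memcg_model *task_use_memcg(struct task_model *t,
					  struct memcg_model *memcg)
{
	struct memcg_model *old = t->active_memcg;

	t->active_memcg = memcg;
	return old;
}

/* Charge an allocation to the override if set, else to the task's own. */
static void task_alloc(struct task_model *t, unsigned long bytes)
{
	struct memcg_model *target;

	target = t->active_memcg ? t->active_memcg : t->own_memcg;
	assert(target != NULL);
	target->charged += bytes;
}

/* Run one work item on behalf of work_memcg, then restore the override. */
static void worker_run(struct task_model *worker,
		       struct memcg_model *work_memcg, unsigned long bytes)
{
	struct memcg_model *old = task_use_memcg(worker, work_memcg);

	task_alloc(worker, bytes);
	task_use_memcg(worker, old);
}
```

The point of the save/restore is that the worker itself never leaves
its own cgroup; only the charge target changes for the duration of the
work item.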

> > CPU doesn't have a backcharging mechanism yet and depending on the use
> > case, we *might* need to put kthreads in different cgroups. However,
> > such use cases might not be that abundant and there may be gotchas
> > which require them to be force-executed and back-charged (e.g. fs
> > compression from global reclaim).
>
> The CPU-intensiveness of these works is one of the reasons for actually putting
> the workers through the migration path. I don't know of a way to get the
> workers to respect the cpu controller (and even cpuset for that matter) without
> doing that.

So, I still think it'd likely be better to go the back-charging route
than actually putting kworkers in non-root cgroups. That's gonna be way
cheaper and simpler, and it makes avoiding inadvertent priority
inversions trivial.
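
As a rough illustration of what back-charging means here, below is a
toy userspace model, not an existing kernel interface. struct cg_model
and the cg_*() helpers are hypothetical. A normal charge is refused
once it would exceed the limit, while a back-charge always goes
through and leaves the cgroup over its limit until enough usage is
released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-cgroup accounting state. */
struct cg_model {
	uint64_t limit;
	uint64_t usage;
};

/* Normal charge: refused if it would push usage above the limit. */
static bool cg_try_charge(struct cg_model *cg, uint64_t amount)
{
	if (amount > cg->limit || cg->usage > cg->limit - amount)
		return false;
	cg->usage += amount;
	return true;
}

/*
 * Back-charge: the work has already run (or must run), so the cost is
 * recorded unconditionally even if that puts usage above the limit.
 */
static void cg_back_charge(struct cg_model *cg, uint64_t amount)
{
	cg->usage += amount;
}

/* How far above the limit the cgroup currently is, or 0 if within it. */
static uint64_t cg_overage(const struct cg_model *cg)
{
	return cg->usage > cg->limit ? cg->usage - cg->limit : 0;
}

/* Release previously charged usage. */
static void cg_uncharge(struct cg_model *cg, uint64_t amount)
{
	assert(amount <= cg->usage);
	cg->usage -= amount;
}
```

In this model the executing thread never blocks on the limit, which is
why a kworker in the root cgroup can't get stuck behind a throttled
cgroup; the cgroup pays for the overage afterwards by having its own
normal charges refused.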

Thanks.

--
tejun