Re: [PATCH 3/3] sched: Implement interface for cgroup unified hierarchy

From: Paul Turner
Date: Mon Aug 24 2015 - 19:16:36 EST


On Mon, Aug 24, 2015 at 3:49 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello,
>
> On Mon, Aug 24, 2015 at 03:03:05PM -0700, Paul Turner wrote:
>> > Hmm... I was hoping for actual configurations and usage scenarios.
>> > Preferably something people can set up and play with.
>>
>> This is much easier to set up and play with synthetically. Just
>> create the 10 threads and 100 threads above then experiment with
>> configurations designed to guarantee the set of 100 threads
>> relatively uniform throughput regardless of how many are active. I
>> don't think trying to run a VM stack adds anything except complexity
>> of reproduction here.
>
> Well, but that loses most of the details and why such use cases
> matter to begin with. We can imagine up stuff to induce an arbitrary
> set of requirements.

All that's being proved or disproved here is that it's difficult to
coordinate the consumption of asymmetric thread pools using nice. The
constraints here were drawn from a real-world example.
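
As a rough sketch of the arithmetic (not a measurement), the following
compares the two setups for the 10-thread and 100-thread pools quoted
above. It assumes an idealized proportional-share model on fully
contended CPUs, and it approximates the per-thread weight as 1024 at
nice 0, scaled by about 1.25x per nice step; the kernel uses a
precomputed table, so the numbers are only close. The function names
and the equal group shares are hypothetical, chosen for illustration.

```python
# Idealized model: every runnable thread (or group) receives CPU time
# in proportion to its weight. Assumes the CPUs are fully contended.

def nice_weight(nice):
    """Approximate per-thread weight: 1024 at nice 0, ~1.25x per step."""
    return 1024 / (1.25 ** nice)


def flat_pool_share(n_active_a, nice_a, n_active_b, nice_b):
    """Fraction of CPU that pool B gets when all threads compete flat."""
    weight_a = n_active_a * nice_weight(nice_a)
    weight_b = n_active_b * nice_weight(nice_b)
    total = weight_a + weight_b
    return weight_b / total if total else 0.0


def grouped_pool_share(shares_a, shares_b, n_active_a, n_active_b):
    """Fraction of CPU that pool B gets when each pool is its own group.

    A group only competes while it has at least one runnable thread;
    its weight does not depend on how many threads are active.
    """
    weight_a = shares_a if n_active_a else 0
    weight_b = shares_b if n_active_b else 0
    total = weight_a + weight_b
    return weight_b / total if total else 0.0


# Pool A: 10 always-active threads. Pool B: up to 100 threads.
for active_b in (1, 10, 100):
    flat = flat_pool_share(10, 0, active_b, 0)
    grouped = grouped_pool_share(1024, 1024, 10, active_b)
    print(f"active B threads={active_b:3d} "
          f"flat={flat:.3f} grouped={grouped:.3f}")
```

Under these assumptions the flat case swings pool B from roughly 9% to
roughly 91% of the CPU as its active thread count goes from 1 to 100,
while the grouped case stays at 50%. Compensating with nice would mean
retuning per-thread values every time the active count changes.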

>
>> > I take that the
>> > CPU intensive helper threads are usually IO workers? Is the scenario
>> > where the VM is set up with a lot of IO devices and different ones may
>> > consume a large amount of CPU cycles at any given point?
>>
>> Yes, generally speaking there are a few major classes of IO (flash,
>> disk, network) that a guest may invoke. Each of these backends is
>> separate and chooses its own threading.
>
> Hmmm... if that's the case, would limiting iops on those IO devices
> (or classes of them) work? qemu already implements an IO limit
> mechanism after all.

No.

1) They should proceed at the maximum rate they can sustain while
staying within their provisioning budget.
2) The cost per IO is both inconsistent and changes over time.
Attempting to micro-optimize every backend for this is infeasible;
this is exactly the type of problem that the scheduler can usefully
help arbitrate.
3) Even pretending (2) is fixable, dynamically dividing these
right-to-work tokens between different I/O device backends is
extremely complex.

>
> Anyways, a point here is that threads of the same process competing
> isn't a new problem. There are many ways to make those threads play
> nice as the application itself often has to be involved anyway,
> especially for something like qemu which is heavily involved in
> provisioning resources.

It's certainly not a new problem, but it's a real one, and it's
_hard_. You're proposing removing the best known solution.

>
> cgroups can be a nice brute-force add-on which lets sysadmins do wild
> things but it's inherently hacky and incomplete for coordinating
> threads. For example, what is it gonna do if qemu cloned vcpus and IO
> helpers dynamically off of the same parent thread?

We're talking about sub-process usage here. This is the application
coordinating itself, NOT the sysadmin. Processes are becoming larger
and larger; we need many of the same controls within them that we have
between them.

> It requires the
> application's cooperation anyway but at the same time is painful to
> actually interact with from those applications.

As discussed elsewhere in this thread, this is really not a problem if
you define consistent rules about which parts are managed by whom. The
potential for interference is no different from someone messing with
an application's on-disk configuration behind its back. Alternative
strawmen that greatly improve on where we are today have also been
proposed.

>
> Thanks.
>
> --
> tejun