Re: [Xen-devel] [PATCH RFC] xen: if on Xen, "flatten" the scheduling domain hierarchy

From: George Dunlap
Date: Wed Sep 23 2015 - 06:24:10 EST


On 09/23/2015 05:36 AM, Juergen Gross wrote:
> On 09/22/2015 06:22 PM, George Dunlap wrote:
>> On 09/22/2015 05:42 AM, Juergen Gross wrote:
>>> One other thing I just discovered: there are other consumers of the
>>> topology sibling masks (e.g. topology_sibling_cpumask()) as well.
>>>
>>> I think we would want to avoid any optimizations based on those in
>>> drivers as well, not only in the scheduler.
>>
>> I'm beginning to lose the thread of the discussion here a bit.
>>
>> Juergen / Dario, could one of you summarize your two approaches, and the
>> (alleged) advantages and disadvantages of each one?
>
> Okay, I'll give it a try:
>
> The problem we want to solve:
> -----------------------------
>
> The Linux kernel gathers cpu topology data during boot via the CPUID
> instruction on each processor coming online. This data is primarily
> used in the scheduler to decide which cpu a thread should be migrated
> to when migration is necessary. There are other users of the topology
> information in the kernel (e.g. some drivers try to do optimizations
> like core-specific queues/lists).
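>
> To illustrate the driver case, a minimal sketch of such an
> optimization (hypothetical driver code; topology_sibling_cpumask()
> and the cpumask helpers are the real kernel interfaces, the function
> itself is made up):
>
>   #include <linux/cpumask.h>
>   #include <linux/gfp.h>
>   #include <linux/topology.h>
>
>   /* Hypothetical: set up one queue per physical core by skipping
>    * cpus that are SMT siblings of a cpu already handled. */
>   static int my_driver_count_queues(void)
>   {
>       cpumask_var_t covered;
>       int cpu, nr_queues = 0;
>
>       if (!zalloc_cpumask_var(&covered, GFP_KERNEL))
>           return -ENOMEM;
>
>       for_each_online_cpu(cpu) {
>           if (cpumask_test_cpu(cpu, covered))
>               continue;    /* sibling of an earlier cpu */
>           cpumask_or(covered, covered,
>                      topology_sibling_cpumask(cpu));
>           nr_queues++;
>       }
>
>       free_cpumask_var(covered);
>       return nr_queues;
>   }
>
> With stale sibling masks such a driver sizes its queues for a
> topology that no longer exists.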
>
> When started in a virtualized environment, the obtained data is next
> to useless or even wrong, as it reflects only the state at the time
> the system was booted. The hypervisor's scheduling of the (v)cpus
> changes the topology beneath the feet of the Linux kernel without
> this being reflected in the gathered topology information, so any
> decision based on that data is taken blindly and may simply be wrong.
>
> The minimal solution is to change the topology data in the kernel so
> that all cpus are regarded as equal in their relation to each other
> (e.g. when migrating a thread to another cpu, no cpu is preferred as
> a target).
>
> The topology information of the CPUID instruction is, however,
> accessible even from user mode and might be used by user programs for
> licensing purposes (e.g. to limit the software to run on a specific
> number of cores or sockets). So just mangling the data returned by
> CPUID in the hypervisor is not a general solution, although we might
> want to offer that at least optionally in the future.
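>
> For reference, reading that information from user mode is trivial (a
> minimal user space sketch using the compiler-provided cpuid.h; leaf
> 0xb is the x2APIC topology leaf):
>
>   #include <stdio.h>
>   #include <cpuid.h>
>
>   int main(void)
>   {
>       unsigned int eax, ebx, ecx, edx;
>
>       /* Subleaf 0 describes the SMT level: EBX[15:0] holds the
>        * number of logical processors per core. */
>       __cpuid_count(0xb, 0, eax, ebx, ecx, edx);
>       printf("threads per core: %u\n", ebx & 0xffff);
>       return 0;
>   }
>
> Any licensing check built on such data is exactly what unconditional
> CPUID mangling in the hypervisor would break.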
>
> In the future we might want to support dynamic topology updates, or
> to be able to tell the kernel to use some of the topology data, e.g.
> when vcpus are pinned.
>
>
> Solution 1 (Dario):
> -------------------
>
> Don't use the CPUID-derived topology information in the Linux
> scheduler, but let it use a simple "flat" topology by setting its own
> scheduler domain data under Xen.
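>
> Roughly like this (set_sched_topology(), SD_INIT_NAME() and
> cpu_cpu_mask() are the existing kernel interfaces; the table name and
> the init function are made up for illustration):
>
>   #include <linux/init.h>
>   #include <linux/sched.h>
>   #include <linux/topology.h>
>
>   static struct sched_domain_topology_level xen_topology[] = {
>       /* A single level: all cpus relate to each other in the same
>        * way, i.e. the scheduler sees a "flat" hierarchy. */
>       { cpu_cpu_mask, SD_INIT_NAME(VCPU) },
>       { NULL, },
>   };
>
>   static void __init xen_setup_flat_topology(void)
>   {
>       /* Replace the default SMT/MC/DIE levels. */
>       set_sched_topology(xen_topology);
>   }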
>
> Advantages:
> + very clean solution regarding the scheduler interface
> + scheduler decisions are based on a minimal data set
> + small patch
>
> Disadvantages:
> - covers the scheduler only, drivers still use the "wrong" data
> - a little bit hacky for some NUMA architectures (needs either a
> hook in the code dealing with that architecture or multiple scheduler
> domain data overwrites)
> - future enhancements will make the solution less clean (they will
> either need duplicated scheduler domain data or new hooks in the
> scheduler domain interface)
>
>
> Solution 2 (Juergen):
> ---------------------
>
> When booted as a Xen guest, modify the topology data built during
> boot, resulting in the same simple "flat" topology as in Dario's
> solution.
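>
> Roughly like this (hypothetical helper; the topology accessors and
> cpumask helpers are the real kernel interfaces):
>
>   #include <linux/cpumask.h>
>   #include <linux/topology.h>
>
>   /* Hypothetical: called while bringing up each vcpu, instead of
>    * filling the masks from CPUID-derived data. */
>   static void xen_flatten_topology(unsigned int cpu)
>   {
>       /* Each vcpu is its own thread and its own core: */
>       cpumask_clear(topology_sibling_cpumask(cpu));
>       cpumask_set_cpu(cpu, topology_sibling_cpumask(cpu));
>       cpumask_clear(topology_core_cpumask(cpu));
>       cpumask_set_cpu(cpu, topology_core_cpumask(cpu));
>   }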
>
> Advantages:
> + the simple topology is seen by all consumers of topology data, as
> the data itself is modified accordingly
> + small patch
> + future enhancements are rather easy: just select which data to
> modify
>
> Disadvantages:
> - interface to scheduler not as clean as in Dario's approach
> - scheduler decisions are based on multiple layers of topology data
> where one layer would be enough to describe the topology
>
>
> Dario, are you okay with this summary?

Thanks -- that's very helpful.

-George