Re: [PATCH v3] x86,sched: allow topologies where NUMA nodes share an LLC
From: Dave Hansen
Date: Thu Mar 29 2018 - 10:35:08 EST
On 03/29/2018 06:47 AM, Peter Zijlstra wrote:
> The issue is that HPC workloads care about cache-size-per-cpu measure,
> and the way they go about obtaining that is reading the cache-size and
> dividing it by the h-weight of the cache-mask.
That works, but only if the memory being accessed is slice/node-local.
If it's spread across the package, it'll be wrong.
But, the HPC folks are the ones that are the most likely to have good
NUMA affinity, so that would seem to point us in the direction of both
halving the size and the mask so that the LLC _looks_ split to userspace.
> Now the patch does in fact change the cache-mask as exposed to
> userspace, it however does _NOT_ change the cache-size. This means that
> anybody using the values from sysfs to compute size/weight, now gets
> double the value they ought to get.
>
> So either it must not change the llc-mask, or also change the llc-size.
IOW, don't make it look like we've either doubled or halved the exposed
size of the llc.
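
To make the size/weight arithmetic concrete, here is a rough sketch of what
such a userspace tool might do with the strings it reads from
/sys/devices/system/cpu/cpuN/cache/indexM/size and .../shared_cpu_map. The
function names are made up for illustration, and the values in the usage note
are invented, not taken from real hardware.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Parse a sysfs cache "size" string such as "32768K" into bytes.
 * Returns 0 if no number could be parsed.
 */
unsigned long long parse_cache_size(const char *s)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 10);

	if (end == s)
		return 0;
	if (*end == 'K')
		v <<= 10;
	else if (*end == 'M')
		v <<= 20;
	return v;
}

/*
 * Hamming weight of a sysfs cpumask string such as "00ff,ffffffff".
 * Commas and any trailing newline are skipped.
 */
unsigned int mask_weight(const char *mask)
{
	unsigned int w = 0;

	for (; *mask; mask++) {
		unsigned char ch = (unsigned char)*mask;
		int d;

		if (!isxdigit(ch))
			continue;
		d = isdigit(ch) ? ch - '0' : tolower(ch) - 'a' + 10;
		for (; d; d &= d - 1)	/* clear lowest set bit */
			w++;
	}
	return w;
}

/* Hypothetical helper: cache bytes per CPU sharing the cache. */
unsigned long long cache_per_cpu(const char *size, const char *mask)
{
	unsigned int w = mask_weight(mask);

	return w ? parse_cache_size(size) / w : 0;
}
```

With a size of "32768K" and a mask of "0000ffff" (16 CPUs) this gives 2048K
per CPU. Shrink the mask to "000000ff" while leaving the size alone and the
same tool reports 4096K per CPU, which is the doubling Peter describes.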
> Which then leads to the conclusion that the current:
>
>> + /* Do not use LLC for scheduler decisions: */
>> + return false;
>
> is wrong. Also, that comment is *completely* wrong, since the return
> value has *nothing* to do with scheduler decisions
OK, got it. That comment betrayed my ignorance. I'm glad we put it there.
What should we say, though?
	/*
	 * false means 'c' does not share the LLC of 'o'.
	 * Note: this decision gets reflected all the way
	 * out to userspace.
	 */
	return false;