Re: [PATCH 1/2] mm: NUMA stats code cleanup and enhancement

From: Vlastimil Babka
Date: Tue Nov 28 2017 - 17:54:29 EST


On 11/28/2017 07:40 PM, Andi Kleen wrote:
> Vlastimil Babka <vbabka@xxxxxxx> writes:
>>
>> I'm worried about the "for_each_possible..." approach here and elsewhere
>> in the patch as it can be rather excessive compared to the online number
>> of cpus (we've seen BIOSes report large numbers of possible CPUs). IIRC
>
> Even if they report a few hundred extra, reading some more shared cache lines
> is very cheap. The prefetcher usually quickly figures out such a pattern
> and reads it all in parallel.

Hmm, AFAIK the prefetcher works within a page boundary, and here IIUC we are
iterating between pcpu areas in the inner loop, which are further apart
than that? Their number may also exhaust the simultaneous prefetch
streams. And the outer loop repeats that for each counter. We might be
either evicting quite a bit of cache, or the distance between pcpu
areas may be such that it causes conflict misses, so we'll always be
cache cold and won't even benefit from multiple counters fitting into a
single cache line.
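To make the access pattern concrete, here is a userspace sketch. It is not the patch's code: the sizes, the 64 KiB distance between per-CPU areas, the struct layout and the helper names (setup_areas(), sum_all()) are all made up for illustration. It sums counters counter-major (outer loop over counters, inner loop over CPUs) and counts how many far-apart loads that takes. With 256 possible CPUs, 8 online and 4 counters, walking possible CPUs touches 1024 areas versus 32 when walking only online ones.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical sizes, chosen only for illustration. */
#define NR_POSSIBLE_CPUS 256
#define NR_COUNTERS 4
/* Assume each CPU's area sits this far from the next: well past 4 KiB. */
#define PCPU_AREA_STRIDE (64 * 1024)

/* Hypothetical per-cpu area: counters first, padded out to the stride. */
struct pcpu_area {
	long counters[NR_COUNTERS];
	char pad[PCPU_AREA_STRIDE - NR_COUNTERS * sizeof(long)];
};

static struct pcpu_area *areas;            /* one area per possible CPU */
static bool cpu_online[NR_POSSIBLE_CPUS];

/* Allocate zeroed areas and mark the first nr_online CPUs as online. */
static int setup_areas(int nr_online)
{
	areas = calloc(NR_POSSIBLE_CPUS, sizeof(*areas));
	if (!areas)
		return -1;
	for (int cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++)
		cpu_online[cpu] = cpu < nr_online;
	return 0;
}

/*
 * Counter-major summation: for every counter, the inner loop jumps
 * PCPU_AREA_STRIDE bytes per CPU. *areas_touched counts those strided loads.
 */
static void sum_all(long out[NR_COUNTERS], bool online_only, long *areas_touched)
{
	*areas_touched = 0;
	for (int item = 0; item < NR_COUNTERS; item++) {
		out[item] = 0;
		for (int cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++) {
			if (online_only && !cpu_online[cpu])
				continue;
			out[item] += areas[cpu].counters[item];
			(*areas_touched)++;
		}
	}
}
```

In this toy setup offline CPUs hold zeros, so both walks give the same sums; the difference is only in how many distant cache lines get pulled in.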

> I doubt it will be noticeable, especially not in a slow path
> like reading something from proc/sys.
>
>> the general approach with vmstat is to query just online CPUs / nodes,
>> and if they go offline, transfer their accumulated stats to some other
>> "victim"?
>
> That's very complicated, and unlikely to be worth it.

vm_events_fold_cpu() doesn't look that complicated, though.
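As I read mm/vmstat.c, vm_events_fold_cpu() just adds each event count of the dead CPU to the current CPU and zeroes the dead CPU's copy. A userspace sketch of that idea follows; the names (fold_cpu(), cpu_go_offline(), sum_online()) are hypothetical, and it uses a fixed victim CPU with no locking or hotplug callbacks. The point is that after the fold, summing only online CPUs still gives the full total.

```c
#include <stdbool.h>

/* Hypothetical sizes for illustration only. */
#define NR_CPUS_SIM 8
#define NR_ITEMS 3

static long events[NR_CPUS_SIM][NR_ITEMS];  /* per-CPU event counters */
static bool online[NR_CPUS_SIM];

/* Move the dead CPU's counts into the victim CPU and clear them. */
static void fold_cpu(int dead_cpu, int victim_cpu)
{
	for (int i = 0; i < NR_ITEMS; i++) {
		events[victim_cpu][i] += events[dead_cpu][i];
		events[dead_cpu][i] = 0;
	}
}

/* Hypothetical offline path: mark the CPU offline, then fold its stats. */
static void cpu_go_offline(int cpu, int victim_cpu)
{
	online[cpu] = false;
	fold_cpu(cpu, victim_cpu);
}

/* Readers only walk online CPUs; folded counts live on the victim. */
static long sum_online(int item)
{
	long sum = 0;
	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++)
		if (online[cpu])
			sum += events[cpu][item];
	return sum;
}
```

For example, with CPU n holding n + 1 events of item 0, sum_online(0) is 36 both before and after cpu_go_offline(5, 0), and CPU 0 then holds 7.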

>
> -Andi
>