Re: [PATCH] cache align vm_stat
From: Dimitri Sivanich
Date: Thu Oct 27 2011 - 08:51:19 EST
On Thu, Oct 27, 2011 at 10:50:25AM +0200, Mel Gorman wrote:
> On Mon, Oct 24, 2011 at 11:10:35AM -0500, Dimitri Sivanich wrote:
> > Avoid false sharing of the vm_stat array.
> >
> > This was found to adversely affect tmpfs I/O performance.
> >
>
> I think this fix is overly simplistic. It is moving each counter into
> its own cache line. While I accept that this will help the preformance
> of the tmpfs-based workload, it will adversely affect workloads that
> touch a lot of counters because of the increased cache footprint.
To reiterate, this change is as follows:
-atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
+atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS] __cacheline_aligned_in_smp;
I think you've misunderstood the effect of the fix. It does not move
each counter into it's own cache line. It forces the overall array to
start on a cache line boundary. From 'nm -n vmlinux' output:
ffffffff81e04200 d hash_lock
ffffffff81e04240 D vm_stat
ffffffff81e04340 d nr_files
Unless I'm missing something, vm_stat is using 4 cachelines, not 24.
And yes, the addresses of the first 3 counters are, in fact,
ffffffff81e04240, ffffffff81e04248, and ffffffff81e04250.
Also, while this patch does make some difference in performance, it is
only an incremental step in improving vm_stat contention. More will need
to be done.
Also, I've found that separating counters into their own cacheline has less
of an effect on performance than one might think. It really has more to do
with the number of updates that each of these counters (especially
NR_FREE_PAGES) is receiving.
>
> 1. Is it possible to rearrange the vmstat array such that two hot
> counters do not share a cache line?
See my comment above.
> 2. Has Andrew's suggestion to alter the per-cpu threshold based on the
> value of the global counter to reduce conflicts been tried?
Altering the per-cpu threshold (that returned from calculate*threshold in
mm/vmstat.c) works well, but see my previous posts about the size of
threshold required to achieve reasonable scaling performance. Values are
much higher than the currently allowed 125 (more on the order of 1000-2000).
>
> (I'm at Linux Con at the moment so will be even slower to respond than
> usual)
>
> --
> Mel Gorman
> SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/