Re: [v3 0/9] parallelized "struct page" zeroing
From: Pasha Tatashin
Date: Thu May 11 2017 - 16:47:40 EST
> Have you measured that? I do not think it would be super hard to
> measure. I would be quite surprised if this added much if anything at
> all as the whole struct page should be in the cache line already. We do
> set reference count and other struct members. Almost nobody should be
> looking at our page at this time and stealing the cache line. On the
> other hand a large memcpy will basically wipe everything away from the
> cpu cache. Or am I missing something?

Here is the data for a single thread (deferred struct page init is disabled):

Intel CPU E7-8895 v3 @ 2.60GHz 1T memory
-----------------------------------------
time to memset "struct page"s in memblock: 11.28s
time to init "struct page"s: 4.90s
After moving memset() into __init_single_page():
time to init and memset "struct page"s: 8.39s

SPARC M6 @ 3600 MHz 1T memory
-----------------------------------------
time to memset "struct page"s in memblock: 1.60s
time to init "struct page"s: 3.37s
After moving memset() into __init_single_page():
time to init and memset "struct page"s: 12.99s

So, moving memset() into __init_single_page() benefits Intel: 8.39s
versus 11.28s + 4.90s = 16.18s for the two separate steps. I am
actually surprised that memset() is so slow on Intel when it is called
from memblock. But it hurts SPARC: 12.99s versus 1.60s + 3.37s = 4.97s.
I guess the membars at the end of memset() kill the performance.

Also, when looking at these values, remember that Intel has twice as many
"struct page"s for the same amount of memory (4K base pages on x86 vs. 8K
on SPARC).
Pasha