On Tue, Oct 16, 2018 at 03:01:11PM -0400, Pavel Tatashin wrote:
I gave it a run on an OpenPower (S812LC 8348-21C) with Power8 processor and
On 10/15/18 4:26 PM, Alexander Duyck wrote:
This change makes it so that we use the same approach that was already in
use on Sparc on all the architectures that support a 64b long.
This is mostly motivated by the fact that 8 to 10 store/move instructions
are likely always going to be faster than having to call into a function
that is not specialized for handling page init.
An added advantage to doing it this way is that the compiler can get away
with combining writes in the __init_single_page call. As a result the
memset call will be reduced to only about 4 write operations, or at least
that is what I am seeing with GCC 6.2, as the flags, LRU pointers, and
count/mapcount seem to be cancelling out at least 4 of the 8 assignments on
my system.
One change I had to make to the function was to reduce the minimum struct
page size to 56 bytes to support some powerpc64 configurations.
Signed-off-by: Alexander Duyck <alexander.h.duyck@xxxxxxxxxxxxxxx>
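For anyone following along without the patch in front of them, the idea
described above can be roughly sketched in user space as below. This is
only an illustration under stated assumptions, not the code from the
patch: union fake_page, its field names, fake_zero_page() and
fake_init_page() are all hypothetical, and the descriptor is fixed at
64 bytes to match the struct page size used in the measurements in this
thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 64-byte stand-in for struct page. The field names and
 * layout are illustrative only, not the kernel's definition.
 */
union fake_page {
	struct {
		uint64_t flags;
		uint64_t lru_next;
		uint64_t lru_prev;
		uint64_t mapping;
		uint64_t index;
		uint64_t private_data;
		int32_t  mapcount;
		int32_t  refcount;
		uint64_t mem_cgroup;
	} f;
	uint64_t words[8];	/* same storage viewed as 64-bit words */
};

static_assert(sizeof(union fake_page) == 64,
	      "sketch assumes a 64-byte descriptor");

/* Zero the descriptor with eight open-coded 64-bit stores, no memset() call. */
static inline void fake_zero_page(union fake_page *p)
{
	p->words[7] = 0;
	p->words[6] = 0;
	p->words[5] = 0;
	p->words[4] = 0;
	p->words[3] = 0;
	p->words[2] = 0;
	p->words[1] = 0;
	p->words[0] = 0;
}

/*
 * Zero the descriptor, then fill in the fields that get non-zero values.
 * Since the zeroing stores are visible to the compiler here, it is free
 * to drop the ones that are overwritten immediately afterwards.
 */
static inline void fake_init_page(union fake_page *p, uint64_t flags)
{
	fake_zero_page(p);
	p->f.flags = flags;
	/* An empty list head points at itself. */
	p->f.lru_next = (uint64_t)(uintptr_t)&p->f.lru_next;
	p->f.lru_prev = (uint64_t)(uintptr_t)&p->f.lru_next;
	p->f.mapcount = -1;
	p->f.refcount = 1;
}
```

With an out-of-line memset() the compiler generally cannot see which
zeroing stores are dead, so the later field assignments come on top of the
full clear. Whether the open-coded stores actually get merged, and how
many survive, depends on the compiler version and flags.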
I have tested on Broadcom's Stingray cpu with 48G RAM:
__init_single_page() takes 19.30ns / 64-byte struct page
With the change it takes 17.33ns / 64-byte struct page
with 128G of RAM. My results for 64-byte struct page were:
before: 4.6788ns
after: 4.5882ns
My two cents :)