Re: [v3 0/9] parallelized "struct page" zeroing

From: Pasha Tatashin
Date: Fri May 12 2017 - 13:28:00 EST




On 05/12/2017 12:57 PM, David Miller wrote:
> From: Pasha Tatashin <pasha.tatashin@xxxxxxxxxx>
> Date: Thu, 11 May 2017 16:59:33 -0400
>
>> We should either keep memset() only for deferred struct pages as what
>> I have in my patches.
>>
>> Another option is to add a new function struct_page_clear() which
>> would default to memset() and to something else on platforms that
>> decide to optimize it.
>>
>> On SPARC it would call STBIs, and we would do one membar call after
>> all "struct pages" are initialized.
>
> No membars will be performed for single individual page struct clear,
> the cutoff to use the STBI is larger than that.


Right now it is larger, but what I suggested is to add a new optimized routine just for this case, which would do STBI for 64 bytes but without a membar (the membar would be done once, at the end of memmap_init_zone() and deferred_init_memmap()):

/* Clear a 64-byte struct page with two STBI stores; no membar here. */
#define struct_page_clear(page) \
__asm__ __volatile__( \
"stxa %%g0, [%0]%2\n" \
"stxa %%g0, [%0 + %1]%2\n" \
: /* No output */ \
: "r" (page), "r" (0x20), "i"(ASI_BLK_INIT_QUAD_LDD_P))

And insert it into __init_single_page() in place of the memset().

The final result is 4.01s/T, which is even faster than the current 4.97s/T.
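
For illustration only, here is a rough portable sketch of the overall shape: a struct_page_clear() that defaults to memset(), which an architecture could override, plus a single barrier after the whole range is initialized. This is not the actual patch. The names page_stub, init_single_page_stub(), memmap_init_stub() and struct_page_clear_barrier() are made up for this example, and the real struct page layout and field setup are not modeled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 64-byte stand-in for the kernel's struct page. */
struct page_stub {
	unsigned long words[64 / sizeof(unsigned long)];
};

/*
 * Generic fallback: plain memset(). An architecture could define
 * struct_page_clear() beforehand (e.g. SPARC using STBI stores)
 * to override it.
 */
#ifndef struct_page_clear
#define struct_page_clear(page) memset((page), 0, sizeof(*(page)))
#endif

/* Hypothetical hook: one barrier for the whole range; no-op by default. */
#ifndef struct_page_clear_barrier
#define struct_page_clear_barrier() do { } while (0)
#endif

/* Stand-in for __init_single_page(): clear first, then set fields. */
static void init_single_page_stub(struct page_stub *page, unsigned long pfn)
{
	struct_page_clear(page);
	page->words[0] = pfn;	/* placeholder for the real field setup */
}

/* Stand-in for memmap_init_zone(): init every page, then one barrier. */
static void memmap_init_stub(struct page_stub *map, size_t nr_pages,
			     unsigned long start_pfn)
{
	size_t i;

	for (i = 0; i < nr_pages; i++)
		init_single_page_stub(&map[i], start_pfn + i);
	struct_page_clear_barrier();
}
```

The point of the split is that the per-page path stays barrier-free, and the cost of ordering the stores is paid once per range rather than once per page.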



Pasha