Re: [PATCH v10 6/8] x86/clear_page: Introduce clear_pages()
From: David Hildenbrand (Red Hat)
Date: Thu Dec 18 2025 - 02:25:02 EST
On 12/15/25 21:49, Ankur Arora wrote:
Performance when clearing with string instructions (x86-64-stosq and
similar) can vary significantly based on the chunk-size used.
$ perf bench mem memset -k 4KB -s 4GB -f x86-64-stosq
# Running 'mem/memset' benchmark:
# function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S)
# Copying 4GB bytes ...
13.748208 GB/sec
$ perf bench mem memset -k 2MB -s 4GB -f x86-64-stosq
# Running 'mem/memset' benchmark:
# function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S)
# Copying 4GB bytes ...
15.067900 GB/sec
$ perf bench mem memset -k 1GB -s 4GB -f x86-64-stosq
# Running 'mem/memset' benchmark:
# function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S)
# Copying 4GB bytes ...
38.104311 GB/sec
(All three on AMD Milan.)
With a change in chunk-size from 4KB to 1GB, we see the performance go
from 13.7 GB/sec to 38.1 GB/sec. For a chunk-size of 2MB the gain is
smaller (15.1 GB/sec), but it is still worth adding a clear_page()
variant that can handle contiguous page-extents.
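To illustrate the difference between per-page and extent-based clearing,
here is a minimal userspace sketch. It is not the code from this patch:
the sketch_* names and SKETCH_PAGE_SIZE are hypothetical, and memset()
stands in for the string-instruction based kernel primitive.

```c
#include <stddef.h>
#include <string.h>

/* Assumed page size for this sketch only. */
#define SKETCH_PAGE_SIZE 4096UL

/* Clear a single page: the 4KB chunk-size case. */
static void sketch_clear_page(void *addr)
{
	memset(addr, 0, SKETCH_PAGE_SIZE);
}

/* Clear an extent one page at a time: npages separate clearing calls. */
static void sketch_clear_extent_by_page(void *addr, size_t npages)
{
	unsigned char *p = addr;

	for (size_t i = 0; i < npages; i++)
		sketch_clear_page(p + i * SKETCH_PAGE_SIZE);
}

/* Clear npages contiguous pages with one call covering the whole extent. */
static void sketch_clear_pages(void *addr, size_t npages)
{
	memset(addr, 0, npages * SKETCH_PAGE_SIZE);
}

/* Helper for checking the result: returns 1 if every byte is zero. */
static int sketch_is_zeroed(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```

Both variants leave the buffer in the same state; the only difference is
how large a chunk each clearing call is handed, which is the variable the
numbers above are measuring.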
Signed-off-by: Ankur Arora <ankur.a.arora@xxxxxxxxxx>
Tested-by: Raghavendra K T <raghavendra.kt@xxxxxxx>
Nothing jumped out at me.
Reviewed-by: David Hildenbrand (Red Hat) <david@xxxxxxxxxx>
--
Cheers
David