Re: [PATCH RFC 0/3] optimize kswapd when it does reclaim for hugepage

From: hejianet
Date: Tue Jan 24 2017 - 21:14:31 EST


Hi Michal
Thanks for the comments. I will resend the patch as per your
comments after my two-week vacation.

B.R.
Jia

On 25/01/2017 12:46 AM, Michal Hocko wrote:
> On Tue 24-01-17 15:49:01, Jia He wrote:
> > If there is a server with an uneven NUMA memory layout:
> > available: 7 nodes (0-6)
> > node 0 cpus: 0 1 2 3 4 5 6 7
> > node 0 size: 6603 MB
> > node 0 free: 91 MB
> > node 1 cpus:
> > node 1 size: 12527 MB
> > node 1 free: 157 MB
> > node 2 cpus:
> > node 2 size: 15087 MB
> > node 2 free: 189 MB
> > node 3 cpus:
> > node 3 size: 16111 MB
> > node 3 free: 205 MB
> > node 4 cpus: 8 9 10 11 12 13 14 15
> > node 4 size: 24815 MB
> > node 4 free: 310 MB
> > node 5 cpus:
> > node 5 size: 4095 MB
> > node 5 free: 61 MB
> > node 6 cpus:
> > node 6 size: 22750 MB
> > node 6 free: 283 MB
> > node distances:
> > node   0   1   2   3   4   5   6
> >   0:  10  20  40  40  40  40  40
> >   1:  20  10  40  40  40  40  40
> >   2:  40  40  10  20  40  40  40
> >   3:  40  40  20  10  40  40  40
> >   4:  40  40  40  40  10  20  40
> >   5:  40  40  40  40  20  10  40
> >   6:  40  40  40  40  40  40  10

> > In this case node 5 has less memory, and hugepages will be allocated
> > from these nodes one by one after we trigger
> > echo 4000 > /proc/sys/vm/nr_hugepages
> >
> > Then kswapd5 will take 100% CPU for a long time. This is a livelock
> > issue in kswapd, and this patch set fixes it.
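
For context, the allocation path under discussion looks roughly like
this; a simplified paraphrase of the mm/hugetlb.c code of that era, not
the exact source. Each write to nr_hugepages walks the allowed nodes
round-robin, and each per-node attempt uses __GFP_THISNODE together
with __GFP_REPEAT, so a nearly-full node such as node 5 keeps its
kswapd grinding on huge-page-order requests that cannot succeed:

static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
{
        struct page *page;

        /* Pin the attempt to one node and ask the allocator to retry hard. */
        page = __alloc_pages_node(nid,
                        htlb_alloc_mask(h) | __GFP_COMP | __GFP_THISNODE |
                        __GFP_REPEAT | __GFP_NOWARN,
                        huge_page_order(h));
        if (page)
                prep_new_huge_page(h, page, nid);
        return page;
}

static int alloc_fresh_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
{
        int nr_nodes, node;

        /* Round-robin over the allowed nodes until one attempt succeeds. */
        for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed)
                if (alloc_fresh_huge_page_node(h, node))
                        return 1;
        return 0;
}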

> It would be really helpful to describe what the issue is and whether
> it is specific to the configuration above. A high-level overview of
> the fix and why it is the right approach would also be appreciated.

> > The 3rd patch significantly improves kswapd's performance.

> Numbers?

> > Jia He (3):
> >   mm/hugetlb: split alloc_fresh_huge_page_node into fast and slow path
> >   mm, vmscan: limit kswapd loop if no progress is made
> >   mm, vmscan: correct prepare_kswapd_sleep return value
> >
> >  mm/hugetlb.c |  9 +++++++++
> >  mm/vmscan.c  | 28 ++++++++++++++++++++++++----
> >  2 files changed, 33 insertions(+), 4 deletions(-)
> >
> > --
> > 2.5.5
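
For readers following the thread without the patches in hand, the idea
behind the second patch ("limit kswapd loop if no progress is made")
can be sketched as follows. This is an illustrative sketch, not the
submitted code: MAX_KSWAPD_NO_PROGRESS and the reclaim_pass() helper
are stand-ins for the real balance_pgdat() internals.

#define MAX_KSWAPD_NO_PROGRESS  16      /* illustrative cap, not from the patch */

static void balance_pgdat_sketch(pg_data_t *pgdat, int order, int classzone_idx)
{
        int no_progress = 0;

        for (;;) {
                /* reclaim_pass() stands in for one shrink pass over the node. */
                unsigned long nr_reclaimed =
                        reclaim_pass(pgdat, order, classzone_idx);

                if (pgdat_balanced(pgdat, order, classzone_idx))
                        break;

                /*
                 * Without a bound, a node that can never satisfy the
                 * request (node 5 above) keeps kswapd looping at 100%
                 * CPU. Give up once several consecutive passes reclaim
                 * nothing.
                 */
                if (nr_reclaimed == 0) {
                        if (++no_progress >= MAX_KSWAPD_NO_PROGRESS)
                                break;
                } else {
                        no_progress = 0;
                }
        }
}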