Re: RFC: using worker threadpool to speed up clear_huge_page() by up to 5x

From: David Miller
Date: Thu Jul 28 2016 - 01:13:46 EST


From: kpusukur <kishore.kumar.pusukuri@xxxxxxxxxx>
Date: Sun, 17 Jul 2016 12:35:20 -0700

> We would welcome feedback and discussion of potential problems.
>
> We would also like to hear ideas for other areas in the kernel where a
> similar technique could be employed. For example, we've also applied
> this idea to copy-on-write operations for huge pages and it achieves
> around 20x speedup.

I don't know about this.

You can only profitably do this when you have enough physical CPU
resources available to schedule, and they are on the same NUMA node.

By the time you compute the complete answer to that entire condition,
you could have completed the hugepage clear.

Also, you should experiment with simply using a dedicated hugepage
clear assembler loop for these chips. It's really stupid to pay the
transaction cost of going in and out of the clear_user_highpage()
function N times per huge page.
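
To make the last point concrete, here is a rough userspace sketch of the
two shapes being compared. It is not kernel code: the 4 KiB base page and
2 MiB huge page sizes are assumptions, memset() stands in for the real
clear primitive, and clear_one_page(), clear_huge_per_page() and
clear_huge_dedicated() are hypothetical names made up for illustration.
It only shows the call structure, not the actual cost on any given chip.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed sizes for illustration only: 4 KiB base page, 2 MiB huge page. */
#define BASE_PAGE_SIZE 4096u
#define HUGE_PAGE_SIZE (2u * 1024u * 1024u)

/* Hypothetical stand-in for a per-base-page clear routine. */
static void clear_one_page(unsigned char *page)
{
    memset(page, 0, BASE_PAGE_SIZE);
}

/*
 * Per-page shape: enter and leave the base-page clear once for every
 * base page in the huge page. Returns the number of calls made.
 */
static size_t clear_huge_per_page(unsigned char *huge)
{
    size_t calls = 0;

    assert(huge != NULL);
    for (size_t off = 0; off < HUGE_PAGE_SIZE; off += BASE_PAGE_SIZE) {
        clear_one_page(huge + off);
        calls++;
    }
    return calls;
}

/*
 * Dedicated shape: one routine walks the whole huge page in a single
 * pass. A real version would be a tuned assembler loop; memset() is
 * only a placeholder here. Returns the number of calls made (one).
 */
static size_t clear_huge_dedicated(unsigned char *huge)
{
    assert(huge != NULL);
    memset(huge, 0, HUGE_PAGE_SIZE);
    return 1;
}

/* Helper for checking the result: 1 if every byte is zero, else 0. */
static int is_all_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}
```

With these assumed sizes the per-page version makes 512 calls per huge
page where the dedicated version makes one. Both leave the buffer fully
zeroed, so the only difference is the per-call overhead paid along the
way.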