Re: [PATCH v2 1/2] mm:vmscan: the dirty folio in folio_list skip unmap

From: zhiguojiang
Date: Mon Oct 23 2023 - 22:04:41 EST

Next message: Li zeming: "[PATCH] power: snapshot: Optimize the error variable in the snapshot_write_next()"
Previous message: Chengming Zhou: "Re: [RFC PATCH v2 0/6] slub: Delay freezing of CPU partial slabs"
In reply to: Matthew Wilcox: "Re: [PATCH v2 1/2] mm:vmscan: the dirty folio in folio_list skip unmap"
Next in thread: zhiguojiang: "Re: [PATCH v2 1/2] mm:vmscan: the dirty folio in folio_list skip unmap"

On 2023/10/23 21:01, Matthew Wilcox wrote:

On Mon, Oct 23, 2023 at 08:44:55PM +0800, zhiguojiang wrote:

On 2023/10/23 20:21, Matthew Wilcox wrote:

On Mon, Oct 23, 2023 at 04:07:28PM +0800, zhiguojiang wrote:

Are you seeing measurable changes for any workloads? It certainly seems
like you should, but it would help if you chose a test from mmtests and
showed how performance changed on your system.

In one mmtests run, the maximum times spent on a wasted reclaim attempt
for a dirty folio in folio_list that does not support pageout and ends up
activated in shrink_folio_list() were: cost=51us, exe=2365us.

Calculated with the formula dirty_cost / total_cost * 100%, the reclaim
efficiency for dirty folios can be improved by 53.13% and 82.95% in the
two samples below.

So this patch can optimize shrink efficiency and reduce the workload of
kswapd to a certain extent.

kswapd0-96 ( 96) [005] ..... 387.218548:
mm_vmscan_lru_shrink_inactive: [Justin] nid 0 nr_scanned 32 nr_taken 32
nr_reclaimed 31 nr_dirty 1 nr_unqueued_dirty 1 nr_writeback 0
nr_activate[1] 1 nr_ref_keep 0 f RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
total_cost 96 total_exe 2365 dirty_cost 51 total_exe 2365

kswapd0-96 ( 96) [006] ..... 412.822532:
mm_vmscan_lru_shrink_inactive: [Justin] nid 0 nr_scanned 32 nr_taken 32
nr_reclaimed 0 nr_dirty 32 nr_unqueued_dirty 32 nr_writeback 0
nr_activate[1] 19 nr_ref_keep 13 f RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
total_cost 88 total_exe 605 dirty_cost 73 total_exe 605
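
As a check on the percentages quoted above, the formula can be applied to
the total_cost and dirty_cost fields of the two trace lines. This is only a
minimal sketch: the function name dirty_share is hypothetical and is not
part of the patch or of the tracepoint.

```python
# Values copied from the total_cost / dirty_cost fields of the two
# mm_vmscan_lru_shrink_inactive trace lines in this message.
samples = [
    {"total_cost": 96, "dirty_cost": 51},
    {"total_cost": 88, "dirty_cost": 73},
]


def dirty_share(dirty_cost, total_cost):
    """Return dirty_cost as a percentage of total_cost (hypothetical helper)."""
    return dirty_cost / total_cost * 100


for s in samples:
    share = dirty_share(s["dirty_cost"], s["total_cost"])
    print(f"dirty_cost {s['dirty_cost']} / total_cost {s['total_cost']} = {share:.3f}%")
```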

I appreciate that you can put probes in and determine the cost, but do
you see improvements for a real workload? Like doing a kernel compile
-- does it speed up at all?

Could you share a method for testing the workload of a kernel thread such as kswapd?

Something dirt simple like 'time make -j8'.

I ran two compilations for each configuration. Compared with the unmodified kernel,
the compilation time with the patches applied showed a slight reduction, as follows:

Compilation command:
make distclean -j8
make ARCH=x86_64 x86_64_defconfig
time make -j8

1. Unmodified compilation time:
real    2m40.276s
user    16m2.956s
sys     2m14.738s

real    2m40.136s
user    16m2.617s
sys     2m14.722s

2. [Patch v2 1/2] modified compilation time:
real    2m40.067s
user    16m3.164s
sys     2m14.211s

real    2m40.123s
user    16m2.439s
sys     2m14.508s

3. [Patch v2 1/2] + [Patch v2 2/2] modified compilation time:
real    2m40.367s
user    16m3.738s
sys     2m13.662s

real    2m40.014s
user    16m3.108s
sys     2m14.096s
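
To compare the three configurations more directly, the real and sys times
listed above can be converted to seconds and averaged per configuration.
This is a rough sketch, assuming the `time` output format shown here; the
helper names to_seconds and averages are hypothetical. With only two runs
per configuration it gives a coarse comparison, not a statistical result.

```python
import re
from statistics import mean


def to_seconds(t):
    """Convert a `time` value such as '2m14.738s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    return int(m.group(1)) * 60 + float(m.group(2))


# (real, sys) pairs copied from the three result groups in this message.
runs = {
    "unmodified": [("2m40.276s", "2m14.738s"), ("2m40.136s", "2m14.722s")],
    "patch 1/2": [("2m40.067s", "2m14.211s"), ("2m40.123s", "2m14.508s")],
    "patch 1/2 + 2/2": [("2m40.367s", "2m13.662s"), ("2m40.014s", "2m14.096s")],
}


def averages(pairs):
    """Return (mean real, mean sys) in seconds for a list of (real, sys) pairs."""
    real = mean(to_seconds(r) for r, _ in pairs)
    sys_time = mean(to_seconds(s) for _, s in pairs)
    return real, sys_time


for name, pairs in runs.items():
    real, sys_time = averages(pairs)
    print(f"{name}: real {real:.3f}s, sys {sys_time:.3f}s")
```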
