Re: [PATCH] mm: make fault_around_bytes configurable

From: Andrew Morton
Date: Thu Apr 21 2016 - 20:01:58 EST


On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@xxxxxxxxxxxxxx> wrote:

> Mapping pages around fault is found to cause performance degradation
> in certain use cases. The test performed here is launch of 10 apps
> one by one, doing something with the app each time, and then repeating
> the same sequence once more, on an ARM 64-bit Android device with 2GB
> of RAM. The time taken to launch the apps is found to be better when
> fault around feature is disabled by setting fault_around_bytes to page
> size (4096 in this case).

Well, that's one workload, and a somewhat strange one. What is the
effect on other workloads (of which there are a lot!)?

> The tests were done on 3.18 kernel. 4 extra vmstat counters were added
> for debugging. pgpgoutclean accounts the clean pages reclaimed via
> __delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
> and pageref_keep accounts the mapped file pages activated and retained
> by page_check_references.
>
> === Without swap ===
>                           3.18      3.18-fault_around_bytes=4096
> -----------------------------------------------------------------------
> workingset_refault        691100    664339
> workingset_activate       210379    179139
> pgpgin                    4676096   4492780
> pgpgout                   163967    96711
> pgpgoutclean              1090664   990659
> pgalloc_dma               3463111   3328299
> pgfree                    3502365   3363866
> pgactivate                568134    238570
> pgdeactivate              752260    392138
> pageref_activate          315078    121705
> pageref_activate_vm_exec  162940    55815
> pageref_keep              141354    51011
> pgmajfault                24863     23633
> pgrefill_dma              1116370   544042
> pgscan_kswapd_dma         1735186   1234622
> pgsteal_kswapd_dma        1121769   1005725
> pgscan_direct_dma         12966     1090
> pgsteal_direct_dma        6209      967
> slabs_scanned             1539849   977351
> pageoutrun                1260      1333
> allocstall                47        7
>
> === With swap ===
>                           3.18      3.18-fault_around_bytes=4096
> -----------------------------------------------------------------------
> workingset_refault        597687    878109
> workingset_activate       167169    254037
> pgpgin                    4035424   5157348
> pgpgout                   162151    85231
> pgpgoutclean              928587    1225029
> pswpin                    46033     17100
> pswpout                   237952    127686
> pgalloc_dma               3305034   3542614
> pgfree                    3354989   3592132
> pgactivate                626468    355275
> pgdeactivate              990205    771902
> pageref_activate          294780    157106
> pageref_activate_vm_exec  141722    63469
> pageref_keep              121931    63028
> pgmajfault                67818     45643
> pgrefill_dma              1324023   977192
> pgscan_kswapd_dma         1825267   1720322
> pgsteal_kswapd_dma        1181882   1365500
> pgscan_direct_dma         41957     9622
> pgsteal_direct_dma        25136     6759
> slabs_scanned             689575    542705
> pageoutrun                1234      1538
> allocstall                110       26
>
> Looks like with fault_around, there is more pressure on reclaim because
> of the presence of more mapped pages, resulting in more IO activity,
> more faults, more swapping, and allocstalls.

A few of those things did get a bit worse?
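
To see which counters moved the wrong way, here is a minimal Python sketch
(not part of the patch) that computes the relative change for a handful of
the with-swap counters quoted above. The names WITH_SWAP, pct_change and
increased are made up for illustration, and treating "counter went up with
fault-around disabled" as "got worse" is an assumption that only holds for
some of these counters.

```python
# A few of the "With swap" counters from the quoted table:
# (stock 3.18, 3.18 with fault_around_bytes=4096)
WITH_SWAP = {
    "workingset_refault": (597687, 878109),
    "pgpgin": (4035424, 5157348),
    "pgpgoutclean": (928587, 1225029),
    "pswpout": (237952, 127686),
    "pgmajfault": (67818, 45643),
    "pgsteal_kswapd_dma": (1181882, 1365500),
    "allocstall": (110, 26),
}


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


def increased(table):
    """Names of counters that went up when fault-around was disabled."""
    return [name for name, (before, after) in table.items() if after > before]


for name, (before, after) in WITH_SWAP.items():
    print(f"{name:<24} {pct_change(before, after):+7.1f}%")
```

On these numbers allocstall drops by about 76% while workingset_refault
rises by about 47%, so the with-swap picture is mixed rather than uniformly
better.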

Do you have any data on actual wall-time changes? How much faster do
things become with the patch? If it is "0.1%" then I'd say "umm, no".

> Make fault_around_bytes configurable so that it can be tuned to avoid
> performance degradation.

It sounds like we need to be smarter about auto-tuning this thing.
Maybe the refault code could be taught to provide the feedback path but
that sounds hard.

Still. I do think it would be better to make this configurable at
runtime. Move the existing debugfs tunable into /proc/sys/vm (and
document it!). I do dislike adding even more tunables but this one
does make sense. People will want to run their workloads with various
values until they find the peak throughput, and requiring a kernel
rebuild for that is a huge pain.
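
A rough sketch of what such a sweep could look like with a runtime knob,
written in Python against the existing debugfs file. The path
/sys/kernel/debug/fault_around_bytes assumes debugfs is mounted in the usual
place and that the script runs as root. The helper names and the workload
callable are hypothetical, and wall time is used as the metric only because
that is the number asked for above.

```python
import time

# Assumed location of the existing debugfs tunable (root + mounted debugfs).
DEBUGFS_KNOB = "/sys/kernel/debug/fault_around_bytes"


def set_fault_around_bytes(value, path=DEBUGFS_KNOB):
    """Write a new fault-around window size, in bytes, to the knob."""
    with open(path, "w") as knob:
        knob.write(f"{value}\n")


def sweep(values, workload, path=DEBUGFS_KNOB):
    """Run `workload()` once per value and return {value: wall seconds}."""
    results = {}
    for value in values:
        set_fault_around_bytes(value, path)
        start = time.monotonic()
        workload()
        results[value] = time.monotonic() - start
    return results


def best(results):
    """The value with the lowest wall time."""
    return min(results, key=results.get)
```

Something like best(sweep([4096, 16384, 65536], run_app_launch_test)) would
then pick the fastest setting, where run_app_launch_test is a placeholder for
whatever drives the workload. A single run per value is noisy, so a real
comparison would want several runs each.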