Re: [linus:master] [mm/ksm] 5e924ff54d: ltp.ksm01.fail

From: Stefan Roesch
Date: Fri Nov 17 2023 - 11:52:51 EST



David Hildenbrand <david@xxxxxxxxxx> writes:

> On 16.11.23 05:39, kernel test robot wrote:
>> hi, Stefan Roesch,
>> we reported
>> "[linux-next:master] [mm/ksm] 5e924ff54d: ltp.ksm01_1.fail"
>> in
>> https://lore.kernel.org/all/202311031548.66780ff5-oliver.sang@xxxxxxxxx/
>> when this commit is in linux-next/master.
>> now we noticed this commit is merged in mainline, and we still observed
>> same issue. just FYI.
>> Hello,
>> kernel test robot noticed "ltp.ksm01.fail" on:
>> commit: 5e924ff54d088828794d9f1a4d5bf17808f7270e ("mm/ksm: add "smart" page
>> scanning mode")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>> [test failed on linus/master 3ca112b71f35dd5d99fc4571a56b5fc6f0c15814]
>> [test failed on linux-next/master 8728c14129df7a6e29188a2e737b4774fb200953]
>> in testcase: ltp
>> version: ltp-x86_64-14c1f76-1_20230715
>> with following parameters:
>> disk: 1HDD
>> test: mm-00/ksm01
>> compiler: gcc-12
>> test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (Kaby Lake) with 32G memory
>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>> If you fix the issue in a separate patch/commit (i.e. not just a new version
>> of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
>> | Closes: https://lore.kernel.org/oe-lkp/202311161132.13d8ce5a-oliver.sang@xxxxxxxxx
>> Running tests.......
>> <<<test_start>>>
>> tag=ksm01 stime=1699563923
>> cmdline="ksm01"
>> contacts=""
>> analysis=exit
>> <<<test_output>>>
>> tst_kconfig.c:87: TINFO: Parsing kernel config '/proc/config.gz'
>> tst_test.c:1558: TINFO: Timeout per run is 0h 00m 30s
>> mem.c:422: TINFO: wait for all children to stop.
>> mem.c:388: TINFO: child 0 stops.
>> mem.c:388: TINFO: child 1 stops.
>> mem.c:388: TINFO: child 2 stops.
>> mem.c:495: TINFO: KSM merging...
>> mem.c:434: TINFO: resume all children.
>> mem.c:422: TINFO: wait for all children to stop.
>> mem.c:344: TINFO: child 1 continues...
>> mem.c:347: TINFO: child 1 allocates 128 MB filled with 'a'
>> mem.c:344: TINFO: child 2 continues...
>> mem.c:347: TINFO: child 2 allocates 128 MB filled with 'a'
>> mem.c:344: TINFO: child 0 continues...
>> mem.c:347: TINFO: child 0 allocates 128 MB filled with 'c'
>> mem.c:400: TINFO: child 1 stops.
>> mem.c:400: TINFO: child 0 stops.
>> mem.c:400: TINFO: child 2 stops.
>> ksm_helper.c:36: TINFO: ksm daemon takes 2s to run two full scans
>> mem.c:264: TINFO: check!
>> mem.c:255: TPASS: run is 1.
>> mem.c:255: TPASS: pages_shared is 2.
>> ....
>> mem.c:255: TPASS: pages_shared is 1.
>> mem.c:255: TPASS: pages_sharing is 98302.
>> mem.c:252: TFAIL: pages_volatile is not 0 but 1. <-----
>> mem.c:252: TFAIL: pages_unshared is not 1 but 0. <-----
>
> @Stefan, is this simply related to the new scanning optimization (skip and
> eventually not merge a pages within the "2 scans" windows, whereby previously,
> they would have gotten merged)?
>
> If so, we might just want to disable that optimization for that test case?
>
> Alternatively, maybe we have to wait for "more" scan cycles instead of only 2?

I'd expect this is caused by "smart scan", where we can skip pages.
The best is probably to disable the smart scan feature for this test.
The smart scan feature can be disabled by:

echo 0 > /sys/kernel/mm/ksm/smart_scan

I'll have a look at it today.