Re: Recurring warning in page_copy_sane (inside copy_page_to_iter) when running stress tests involving drop_caches

From: Lech Perczak
Date: Wed May 15 2019 - 09:20:52 EST


Hi,

W dniu 26.04.2019 oÂ10:42, Lech Perczak pisze:
> Cc linux-mm@xxxxxxxxx
>
> W dniu 25.04.2019 oÂ11:25, Lech Perczak pisze:
>> Hello,
>>
>> Some time ago, after upgrading the Kernel on our i.MX6Q-based boards to mainline 4.18, and now to LTS 4.19 line, during stress tests we started noticing strange warnings coming from 'read' syscall, when page_copy_sane() check failed. Typical reproducibility is up to ~4 events per 24h. Warnings origin from different processes, mostly involved with the stress tests, but not necessarily with block devices we're stressing. If the warning appeared in process relating to block device stress test, it would be accompanied by corrupted data, as the read operation gets aborted.
>> Our reason for dropping caches is to ensure that the actual block device gets accessed during the test without resorting to O_DIRECT.
>> When dropping caches was disabled in the tests, the issue would also disappear, at least in a single-week run.
>>
>> Example processes caught in the last run (grepped from stacktraces):
>> [38128.430694] CPU: 2 PID: 32735 Comm: glmark2 Not tainted 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [44168.081821] CPU: 2 PID: 21555 Comm: cat Tainted: GÂÂÂÂÂÂÂ WÂÂÂÂÂÂÂÂ 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [75879.424076] CPU: 3 PID: 1478 Comm: flashcp Tainted: GÂÂÂÂÂÂÂ WÂÂÂÂÂÂÂÂ 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [117060.951333] CPU: 1 PID: 14150 Comm: cat Tainted: GÂÂÂÂÂÂÂ WÂÂÂÂÂÂÂÂ 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [201668.438219] CPU: 1 PID: 14370 Comm: glmark2 Tainted: GÂÂÂÂÂÂÂ WÂÂÂÂÂÂÂÂ 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [214332.212960] CPU: 0 PID: 25633 Comm: cp Tainted: GÂÂÂÂÂÂÂ WÂÂÂÂÂÂÂÂ 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [283484.619058] CPU: 0 PID: 5091 Comm: glmark2 Tainted: GÂÂÂÂÂÂÂ WÂÂÂÂÂÂÂÂ 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [330256.709124] CPU: 2 PID: 4565 Comm: sensors Tainted: GÂÂÂÂÂÂÂ WÂÂÂÂÂÂÂÂ 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [337685.549556] CPU: 2 PID: 30132 Comm: sensors Tainted: GÂÂÂÂÂÂÂ WÂÂÂÂÂÂÂÂ 4.19.32-devboardimx6q+g0a64e3717985 #1
>>
>> When I started debugging the issue, I noticed that in all cases we're dealing with highmem zero-order pages. In this case, page_head(page) == page, so page_address(page) should be equal to page_address(head).
>> However, it isn't the case, as page_address(head) in each case returns zero, causing the value of "v" to explode, and the check to fail.
>>
>> At first I thought, that it could be caused by modification of struct page itself, but it turned out to not be the case - sometimes struct page would be unmodified during duration of the check. With lowmem pages, this wouldn't also happen, because page_address is derived directly from struct page pointer itself.
>>
>> Finally, after gathering a few traces with added debug logs I decided to trigger a panic on the warning and capture a vmcore.
>> When analyzing the vmcore, I confirmed that the struct page_address_slot corresponding to the struct page had an empty list of mappings, explaining NULL returned from page_address(head) in page_copy_sane().
>> Yeah, I had to manually calculate hash of struct page pointer ;)
>>
>> In the meantime I noticed a patch by Eric Dumazet ("iov_iter: optimize page_copy_sane()") [1]. Applying this patch silenced the warnings, at least for test run going for over a week. This is quite expected, as with this change 'page' isn't dereferenced at all, nor is page_address() called. Here I suspect that applying this patch may only hide a real issue I'm facing, and might be missing the case of highmem pages as well.
>>
>> An example stacktrace I captured, with added debug logs and copy of struct page before and after the check in page_copy_sane:
>>
>> This last one shows no changes to struct page over the call to page_copy_sane(), where first call to page_address(page) returns a valid page address (page_address=2146996224) and second call returns 0 (head_address=0). Subsequent calls to page_address inside WARN() argument list also return 0. The same was true for the occurence I captured vmcore for.
>>
>> (For your reference, a patch containing my debug prints is at the end of the message)
>>
>> [337685.344204] ------------[ cut here ]------------
>> [337685.356870] WARNING: CPU: 0 PID: 30132 at /mnt/work/git/platforms-manifest/build_devboardimx6q/tmp/work-shared/devboardimx6q/kernel-source/lib/iov_iter.c:834 page_copy_sane+0x13c/0x178
>> [337685.420706] page_copy_sane: page=7721123e, offset=0, n=4096, v=2147000320, head=7721123e, page_address=2146996224, head_address=0, compound_order(head)=0, page_address(page)=0, page_address(head)=0 page_address(page_copy)=0, flags(page)=referenced|uptodate|lru|active|arch_1, flags(page_copy)=referenced|uptodate|lru|active|arch_1
>> [337685.494736] Modules linked in: xt_nat iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables usb_f_mass_storage usb_f_rndis u_ether ci_hdrc_imx ci_hdrc usbmisc_imx ulpi libcomposite configfs udc_core
>> [337685.549556] CPU: 2 PID: 30132 Comm: sensors Tainted: GÂÂÂÂÂÂÂ WÂÂÂÂÂÂÂÂ 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [337685.558314] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
>> [337685.563600] [<8010fe44>] (unwind_backtrace) from [<8010b840>] (show_stack+0x10/0x14)
>> [337685.570081] [<8010b840>] (show_stack) from [<807a5958>] (dump_stack+0x88/0x9c)
>> [337685.576033] [<807a5958>] (dump_stack) from [<8011f500>] (__warn+0xfc/0x114)
>> [337685.581721] [<8011f500>] (__warn) from [<8011f560>] (warn_slowpath_fmt+0x48/0x6c)
>> [337685.587936] [<8011f560>] (warn_slowpath_fmt) from [<803a0664>] (page_copy_sane+0x13c/0x178)
>> [337685.595004] [<803a0664>] (page_copy_sane) from [<803a2fe4>] (copy_page_to_iter+0x18/0x474)
>> [337685.602020] [<803a2fe4>] (copy_page_to_iter) from [<801bd9d4>] (generic_file_read_iter+0x2cc/0x980)
>> [337685.609794] [<801bd9d4>] (generic_file_read_iter) from [<8020949c>] (__vfs_read+0xf8/0x158)
>> [337685.616866] [<8020949c>] (__vfs_read) from [<80209588>] (vfs_read+0x8c/0x118)
>> [337685.622717] [<80209588>] (vfs_read) from [<80209aac>] (ksys_read+0x4c/0xac)
>> [337685.628396] [<80209aac>] (ksys_read) from [<80101000>] (ret_fast_syscall+0x0/0x54)
>> [337685.634668] Exception stack(0xa4e79fa8 to 0xa4e79ff0)
>> [337685.638427] 9fa0:ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 00000008 00000000 00000003 7ed03208 00000200 00000000
>> [337685.645318] 9fc0: 00000008 00000000 76fe8958 00000003 00000000 7ed03208 76fe5640 7ed031b4
>> [337685.652205] 9fe0: 76fe7cf0 7ed0316c 76fbeee0 76fd1b1c
>> [337685.717080] ---[ end trace 660d072e57b3d168 ]---
>> [337685.720698] page_copy 00000000: 6c 04 00 40 84 1f f4 ef dc aa c4 80 5c e6 9b ed l..@........\...
>> [337685.720713] page_copy 00000010: 00 00 00 00 00 00 00 00 ff ff ff ff 02 00 00 00Â ................
>> [337685.720721] *page 00000000: 6c 04 00 40 84 1f f4 ef dc aa c4 80 5c e6 9b ed l..@........\...
>> [337685.720728] *pageÂÂÂÂ 00000010: 00 00 00 00 00 00 00 00 ff ff ff ff 02 00 00 00Â ................
>>
>> The scenario with dropping caches via /proc/sys/vm/drop_caches seems to be rarely used (and tested), however I suspect that during dropping caches, some pages might be unmapped too early causing other hard-to notice problems.
>>
>> As my findings are quite worrisome, I'd be glad to hear if my analysis is valid, and if optimization made by Eric can be considered 'the fix', or maybe it is too optimistic.
>> Also as I'd like to investigate the issue further, I'd be glad for any hints on where to continue.
>>
>> [1] https://lkml.org/lkml/2019/2/26/686, merged as 6daef95b8c91 in master
>>
>> Patch containing debug logs:
>>
>> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
>> index 8be175df3075..ded7e073434c 100644
>> --- a/lib/iov_iter.c
>> +++ b/lib/iov_iter.c
>> @@ -815,14 +815,25 @@ bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
>> Â}
>> ÂEXPORT_SYMBOL(_copy_from_iter_full_nocache);
>> Â
>> -static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
>> +static noinline bool page_copy_sane(struct page *page, size_t offset, size_t n)
>> Â{
>> +ÂÂÂÂÂÂ struct page page_copy = *page;
>> ÂÂÂÂÂÂÂ struct page *head = compound_head(page);
>> -ÂÂÂÂÂÂ size_t v = n + offset + page_address(page) - page_address(head);
>> +ÂÂÂÂÂÂ size_t page_addr = (size_t) page_address(page);
>> +ÂÂÂÂÂÂ size_t head_addr = (size_t) page_address(head);
>> +ÂÂÂÂÂÂ size_t v = n + offset + page_addr - head_addr;
>> +ÂÂÂÂÂÂ unsigned int order = compound_order(head);
>> Â
>> -ÂÂÂÂÂÂ if (likely(n <= v && v <= (PAGE_SIZE << compound_order(head))))
>> +ÂÂÂÂÂÂ if (likely(n <= v && v <= (PAGE_SIZE << order)))
>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ return true;
>> -ÂÂÂÂÂÂ WARN_ON(1);
>> +ÂÂÂÂÂÂ WARN(1, "%s: page=%p, offset=%zu, n=%zu, v=%zu, head=%p, page_address=%zu,"
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ " head_address=%zu, compound_order(head)=%u, page_address(page)=%zu,"
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ " page_address(head)=%zu page_address(page_copy)=%zu, flags(page)=%pGp, flags(page_copy)=%pGp\n",
>> +ÂÂÂÂÂÂÂÂÂÂÂ __func__, page, offset, n, v, head, page_addr, head_addr, order,
>> +ÂÂÂÂÂÂÂÂÂÂ (size_t) page_address(page), (size_t) page_address(head),
>> +ÂÂÂÂÂÂÂÂÂÂ (size_t) page_address(&page_copy), &page->flags, &page_copy.flags);
>> +ÂÂÂÂÂÂ print_hex_dump_bytes("page_copy ", DUMP_PREFIX_OFFSET, &page_copy, sizeof(page_copy));
>> +ÂÂÂÂÂÂ print_hex_dump_bytes("*pageÂÂÂÂ ", DUMP_PREFIX_OFFSET, page, sizeof(*page));
>> ÂÂÂÂÂÂÂ return false;
>> Â}
>>
>> -- Pozdrawiam/With kind regards, Lech Perczak Sr. Software Engineer Camlin Technologies Poland Limited Sp. z o.o.
>>
As my question went in during Linux MM & FS summit, it might have gone unnoticed. So just a kind reminder :)
Maybe someone got a chance to look at it?

--
Pozdrawiam/With kind regards,
Lech Perczak

Sr. Software Engineer
Camlin Technologies Poland Limited Sp. z o.o.