Re: Recurring warning in page_copy_sane (inside copy_page_to_iter) when running stress tests involving drop_caches

From: Matthew Wilcox
Date: Wed May 15 2019 - 10:45:48 EST


> > On 25.04.2019 at 11:25, Lech Perczak wrote:
> >> Some time ago, after upgrading the kernel on our i.MX6Q-based boards to mainline 4.18, and now to the LTS 4.19 line, we started noticing strange warnings during stress tests, coming from the 'read' syscall when the page_copy_sane() check failed. Typical reproducibility is up to ~4 events per 24h. The warnings originate from different processes, mostly ones involved in the stress tests, but not necessarily touching the block devices we're stressing. If the warning appeared in a process related to the block device stress test, it would be accompanied by corrupted data, as the read operation gets aborted.
> >>
> >> When I started debugging the issue, I noticed that in all cases we're dealing with highmem zero-order pages. In this case, page_head(page) == page, so page_address(page) should be equal to page_address(head).
> >> However, it isn't the case, as page_address(head) in each case returns zero, causing the value of "v" to explode, and the check to fail.

You're seeing a race between the two calls to page_address() in
page_copy_sane(). Between those calls, something has removed the page
from the page_address_map list, so one of the lookups returns NULL.
Eric's patch avoids calling page_address(), so apply it and be happy.
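
To make the arithmetic concrete, here is a small userspace model of the
check, not the kernel code itself. The names old_check(), new_check(),
demo() and MODEL_PAGE_SIZE are made up for this sketch, the two
page_address() results are passed in as plain integers, and the shape of
the fixed order-0 path (bounding n + offset by the page size with no
address lookup) is an assumption based on the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PAGE_SIZE (4 KiB assumed). */
#define MODEL_PAGE_SIZE ((size_t)4096)

/*
 * Model of the racy check for an order-0 page: v is built from two
 * separate address lookups. If both return the same value they cancel
 * out; if one of them returns 0 (NULL), v is off by a whole virtual
 * address and the range test fails.
 */
static bool old_check(size_t offset, size_t n,
                      uintptr_t addr_page, uintptr_t addr_head)
{
    size_t v = n + offset + (size_t)addr_page - (size_t)addr_head;

    return n <= v && v <= MODEL_PAGE_SIZE;
}

/*
 * Model of the order-0 path without any address lookup (assumed shape
 * of the fix): only offset and length matter, so nothing can race.
 */
static bool new_check(size_t offset, size_t n)
{
    size_t v = n + offset;

    return n <= v && v <= MODEL_PAGE_SIZE;
}

/* Usage example; the address value is an arbitrary illustration. */
void demo(void)
{
    uintptr_t mapped = 0xbfe00000u;

    /* Both lookups agree: the copy is accepted. */
    assert(old_check(0, 4096, mapped, mapped));
    /* Second lookup raced and returned 0: same copy is rejected. */
    assert(!old_check(0, 4096, mapped, 0));
    /* Lookup-free check accepts the same copy regardless. */
    assert(new_check(0, 4096));
}
```

The point of the model is only that the failing case differs from the
passing one in nothing but the value returned by the second lookup.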

Greg, can you consider 6daef95b8c914866a46247232a048447fff97279 for
backporting to stable? Nobody realised it was a bugfix at the time it
went in. I suspect there aren't too many of us running HIGHMEM kernels
any more.