Re: Recurring warning in page_copy_sane (inside copy_page_to_iter) when running stress tests involving drop_caches
From: Eric Dumazet
Date: Wed May 15 2019 - 11:04:24 EST
On Wed, May 15, 2019 at 7:43 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> > > On 25.04.2019 at 11:25, Lech Perczak wrote:
> > >> Some time ago, after upgrading the kernel on our i.MX6Q-based boards to mainline 4.18, and now to the LTS 4.19 line, we started noticing strange warnings during stress tests, coming from the 'read' syscall when the page_copy_sane() check failed. Typical reproducibility is up to ~4 events per 24h. The warnings originate from different processes, mostly ones involved in the stress tests, but not necessarily ones touching the block devices we're stressing. If the warning appeared in a process related to the block device stress test, it would be accompanied by corrupted data, as the read operation gets aborted.
> > >>
> > >> When I started debugging the issue, I noticed that in all cases we're dealing with highmem zero-order pages. In this case, page_head(page) == page, so page_address(page) should be equal to page_address(head).
> > >> However, it isn't the case, as page_address(head) in each case returns zero, causing the value of "v" to explode, and the check to fail.
>
> You're seeing a race between page_address(page) being called twice.
> Between those two calls, something has caused the page to be removed from
> the page_address_map() list. Eric's patch avoids calling page_address(),
> so apply it and be happy.
Hmm... won't the kmap_atomic() done later, after page_copy_sane(),
suffer from the same race?
It seems there is a real bug somewhere to fix.
>
> Greg, can you consider 6daef95b8c914866a46247232a048447fff97279 for
> backporting to stable? Nobody realised it was a bugfix at the time it
> went in. I suspect there aren't too many of us running HIGHMEM kernels
> any more.
>
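
For readers following the arithmetic in the report, here is a minimal userspace sketch of why a zero result from the second page_address() call makes "v" explode. This is a model under stated assumptions, not the kernel's code: the names check_by_address(), check_by_index(), MODEL_PAGE_SHIFT and MODEL_PAGE_SIZE are hypothetical, and the index-based variant only illustrates one way a check could avoid page_address() altogether. The thread itself says only that the patch avoids the call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for the model: 4 KiB pages. */
#define MODEL_PAGE_SHIFT 12u
#define MODEL_PAGE_SIZE  ((size_t)1 << MODEL_PAGE_SHIFT)

/*
 * Address-based check. page_addr and head_addr stand in for the results
 * of two separate address lookups for the page and its head page. For an
 * order-0 page they should be equal, so their difference should be 0.
 */
static bool check_by_address(uintptr_t page_addr, uintptr_t head_addr,
                             unsigned int order, size_t offset, size_t n)
{
    size_t v = n + offset + (size_t)(page_addr - head_addr);

    /* Sane only if the copy stays inside the (possibly compound) page. */
    return n <= v && v <= (MODEL_PAGE_SIZE << order);
}

/*
 * Index-based check (assumption: one possible way to avoid the address
 * lookup). It uses the distance between page and head in units of pages,
 * which does not depend on whether the page is currently mapped.
 */
static bool check_by_index(size_t page_idx, size_t head_idx,
                           unsigned int order, size_t offset, size_t n)
{
    size_t v = n + offset + ((page_idx - head_idx) << MODEL_PAGE_SHIFT);

    return n <= v && v <= (MODEL_PAGE_SIZE << order);
}
```

With both lookups returning the same address, a full-page copy on an order-0 page passes. If the second lookup returns zero, the whole page address is added to "v" and the check fails, which matches the behaviour described in the report. The index-based variant is unaffected because it never consults the mapping.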