Re: [PATCH] Revert "MIPS: Remove race window in page fault handling"

From: Leonid Yegoshin
Date: Fri Dec 05 2014 - 16:41:36 EST


On 12/05/2014 01:32 AM, Lars Persson wrote:

Our setup includes both a non-DMA block device and a compressing
file-system (UBIFS). A flush_dcache_page() is issued by UBIFS so your
patch fixes another problem that we do not hit.

The stack trace is not available now. Do we need it for any further
analysis ? I think the mechanism of the race window is understood and it
depends on the __flush_dcache_page() deciding that the flush should be

Unfortunately, the original case still needs to be investigated.
I looked into all call sites of update_mmu_cache(), apart from HUGE page support and NUMA, and I see:

1. insert_pfn()
It is used to put a special page (read: VDSO) into the memory map. No cache flush is needed here because the page is prepared and flushed during system boot.

2. do_wp_page(), first occurrence
It calls flush_cache_page() before it sets the PTE in ptep_set_access_flags(). This flush is unconditional and affects all caches.

3. do_wp_page(), second case
This happens after preparing a cleared new page or after COW. COW has an appropriate cache flush of the destination in copy_user_highpage(). As for the immediate use of a cleared new page as instructions (you had SIGILL, right?)... hm-m, something is wrong in the application in that case.

4. do_swap_page()
Well, this may be a case where flush_icache_page() is not used (see below) and the page is read from non-DMA swap. But I also recommend looking into

there is a bug in swap entry number presentation and it may affect your system.

5. do_anonymous_page()
Similar to case (3): a cleared new page. Using it as an instruction page may point to some application problem.

6. do_set_pte()
It also calls flush_icache_page(), which may matter if it is not implemented; see below.

7. handle_pte_fault()
The page is not touched and the cache flush is a no-op.

8. remove_migration_pte()
Well, this one is a place for suspicion. But it should not run in parallel with any running thread: dirtying a page while another thread is running is a recipe for disaster.

So, you see: if I understand it correctly, there is no place for failure... besides application misbehaviour or a potential kernel bug in migration. Of course, I may be missing something, and that is why a stack trace is desirable.

I think the mechanism of the race window is understood and it
depends on the __flush_dcache_page() deciding that the flush should be

As I remember, you said you use the HIGHMEM patch, right? It changes __flush_dcache_page() slightly, and the flush of any mapped page is no longer postponed. So it takes effect immediately for application pages.
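
For readers following the postponed-flush discussion, here is a rough userspace model of the idea, not the kernel code. It assumes that a flush is deferred only for a page-cache page whose file is not yet mapped into user space, and that the deferred flush is completed at fault time. The struct, its fields, and both function names are hypothetical, and the HIGHMEM patch mentioned above is not modeled.

```c
#include <stdbool.h>

/* Hypothetical model of the per-page state that matters here. */
struct page_model {
	bool has_mapping;    /* page belongs to a file's page cache */
	bool mapping_mapped; /* that file is mapped into some user address space */
	bool dcache_dirty;   /* marker: a D-cache flush has been postponed */
	int  flushes;        /* number of D-cache flushes actually performed */
};

/*
 * Models the lazy path: postpone the flush when the page is in the page
 * cache but not yet visible to user space, otherwise flush right away.
 */
void model_flush_dcache_page(struct page_model *pg)
{
	if (pg->has_mapping && !pg->mapping_mapped) {
		pg->dcache_dirty = true;
		return;
	}
	pg->flushes++;
}

/* Models the fault-time step that completes a postponed flush. */
void model_update_cache(struct page_model *pg)
{
	if (pg->dcache_dirty) {
		pg->flushes++;
		pg->dcache_dirty = false;
	}
}
```

In this model, a page with a postponed flush stays stale for the I-cache until model_update_cache() runs, which is the kind of window the thread is about; a page that is already mapped gets flushed immediately.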

- Leonid.

- Lars

On Fri, 2014-12-05 at 03:16 +0100, Leonid Yegoshin wrote:
(repeat mesg, first one went to wrong place)


Do you have a stack trace or similar from when you found the second VPE
between set_pte_at and update_mmu_cache?
It would be interesting to see how it happens - generally, to get a
consistent SIGILL in applications due to misbehaviour of the memory
subsystem, a bug in the FS is not enough.

Hold on - do you use a non-DMA file system?
If so, I advise you to try this simple patch:

Author: Leonid Yegoshin <yegoshin@xxxxxxxx>
Date: Tue Apr 2 14:20:37 2013 -0700

MIPS: (opt) Fix of reading I-pages from non-DMA FS devices for ID cache separation

This optional fix provides a D-cache flush for instruction code pages on
page faults. In case of non-DMA block device a driver doesn't know that it
reads I-page and doesn't flush D-cache generally on systems without
cache aliasing. And that takes toll during page fault of instruction pages.

It is not a perfect fix, it should be considered as a temporary fix.
The permanent fix would track page origin in page cache and flushes
during reception of page from driver only but not at each page fault.
It is not done yet.

Change-Id: I43f5943d6ce0509729179615f6b81e77803a34ac
Author: Leonid Yegoshin <yegoshin@xxxxxxxx>
Signed-off-by: Leonid Yegoshin <yegoshin@xxxxxxxx>
(imported from commit 6ebd22eb7a3d9873582ebe990a77094f971652ee)
(imported from commit


@@ -61,6 +61,9 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
 static inline void flush_icache_page(struct vm_area_struct *vma,
 	struct page *page)
 {
+	if (cpu_has_dc_aliases ||
+	    ((vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc))
+		__flush_dcache_page(page);
 }

 extern void (*flush_icache_range)(unsigned long start, unsigned long end);

It fixed crashes with non-DMA file systems for a couple of our customers.
Without it, crashes with a non-DMA root FS are catastrophic on aliasing
systems; it is still a problem for the I-cache too, though much rarer.
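
The condition added by the patch can be read as a small truth table. The following is a standalone sketch of that predicate only, written for userspace so it can be checked in isolation. The struct cpu_model, MODEL_VM_EXEC, and the function name are hypothetical stand-ins for cpu_has_dc_aliases, cpu_has_ic_fills_f_dc, and VM_EXEC; nothing here is kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's CPU feature tests. */
struct cpu_model {
	bool dc_aliases;    /* models cpu_has_dc_aliases */
	bool ic_fills_f_dc; /* models cpu_has_ic_fills_f_dc */
};

/* Illustrative flag bit standing in for VM_EXEC. */
#define MODEL_VM_EXEC 0x4UL

/*
 * Returns true when the patched flush_icache_page() would call
 * __flush_dcache_page(): always on aliasing D-caches, and for
 * executable mappings when the I-cache does not fill from the D-cache.
 */
bool icache_page_needs_dcache_flush(const struct cpu_model *cpu,
				    unsigned long vm_flags)
{
	return cpu->dc_aliases ||
	       ((vm_flags & MODEL_VM_EXEC) && !cpu->ic_fills_f_dc);
}
```

Read this way, a non-aliasing CPU whose I-cache fills from the D-cache never pays for the extra flush, and data-only mappings are skipped unless the D-cache aliases.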

Unfortunately, it is also a performance hit, although a smaller one than
running a page cache flush at each PTE setup.

On 12/03/2014 06:03 AM, Lars Persson wrote:
It is the flush_dcache_page() that was called from the file-system
reading the page contents into memory.

- Lars
