Re: [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped page migrate

From: Lars Persson
Date: Tue Feb 26 2019 - 04:46:32 EST


On Tue, Feb 26, 2019 at 10:23 AM Anshuman Khandual
<anshuman.khandual@xxxxxxx> wrote:
> On 02/19/2019 06:02 PM, Lars Persson wrote:
> > Our MIPS 1004Kc SoCs were seeing random userspace crashes with SIGILL
> > and SIGSEGV that could not be traced back to a userspace code
> > bug. They had all the magic signs of an I/D cache coherency issue.
> >
> > Now recently we noticed that the /proc/sys/vm/compact_memory interface
> > was quite efficient at provoking this class of userspace crashes.
> >
> > Studying the code in mm/migrate.c there is a distinction made between
> > migrating a page that is mapped at the instant of migration and one
> > that is not mapped. Our problem turned out to be the non-mapped pages.
> >
> > For the non-mapped page the code performs a copy of the page content
> > and all relevant meta-data of the page without doing the required
> > D-cache maintenance. This leaves dirty data in the D-cache of the CPU
> > and on the 1004K cores this data is not visible to the I-cache. A
> > subsequent page-fault that triggers a mapping of the page will happily
> > serve the process with potentially stale code.
>
> Just curious. Is not the code path which tries to map this page should
> do the invalidation just before setting it up in the page table via
> set_pte_at() or other similar variants ? How it maps without doing the
> necessary flush.

In fact this is what happens when the flush_dcache_page API is used
correctly, but it is an arch implementation detail. All kernel code
that writes to a page cache page must also call flush_dcache_page
before the page becomes eligible for mapping. The arch code has the
option to postpone the actual flush until set_pte_at maps the page.
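
To make the deferred flush concrete, here is a toy userspace model in
C. It is only a sketch and not kernel code: every toy_* name, the
two-buffer "cache vs. memory" layout and the dcache_dirty flag are
assumptions made up for illustration. The real arch implementations
differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

/* Hypothetical model of one page on a core without I/D coherency. */
struct toy_page {
	unsigned char dcache[TOY_PAGE_SIZE]; /* where kernel writes land */
	unsigned char memory[TOY_PAGE_SIZE]; /* what an I-cache fill sees */
	bool dcache_dirty;                   /* flush was requested but postponed */
	bool mapped;                         /* page is mapped into userspace */
};

/* A kernel write (e.g. the copy done by migration) only reaches the D-cache. */
static void toy_kernel_write(struct toy_page *p, const unsigned char *src)
{
	memcpy(p->dcache, src, TOY_PAGE_SIZE);
}

/* The actual cache maintenance: push D-cache contents out to memory. */
static void toy_writeback(struct toy_page *p)
{
	memcpy(p->memory, p->dcache, TOY_PAGE_SIZE);
	p->dcache_dirty = false;
}

/*
 * Stand-in for flush_dcache_page: flush now if the page is already
 * mapped, otherwise just record that a flush is owed.
 */
static void toy_flush_dcache_page(struct toy_page *p)
{
	if (p->mapped)
		toy_writeback(p);
	else
		p->dcache_dirty = true;
}

/* Stand-in for set_pte_at: perform any postponed flush, then map. */
static void toy_set_pte_at(struct toy_page *p)
{
	if (p->dcache_dirty)
		toy_writeback(p);
	p->mapped = true;
}

/* Userspace instruction fetch: reads memory, never the D-cache. */
static unsigned char toy_user_ifetch(const struct toy_page *p, int off)
{
	assert(p->mapped);
	return p->memory[off];
}
```

In this model, a page that is written with toy_kernel_write and then
mapped without an intervening toy_flush_dcache_page never has the
dirty flag set, so toy_set_pte_at has nothing to flush and
toy_user_ifetch returns the old memory contents. That is the
non-mapped migration case described above.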