Re: USB mass storage and ARM cache coherency

From: James Bottomley
Date: Sat Mar 06 2010 - 22:37:50 EST

On Sun, 2010-03-07 at 08:03 +1100, Benjamin Herrenschmidt wrote:
> On Sat, 2010-03-06 at 16:17 +0530, James Bottomley wrote:
> > On a fault in of exec data, we first try to get the page out of the page
> > cache. If it's not present, we put the faulting process to sleep and
> > fetch it in from storage. When we do the read, on the PIO path, the
> > kernel alias for the page becomes dirty. Some time later, we place the
> > page into the user space (updating the pte entry that caused a fault).
> > At this point, we'll call both flush_icache_page() and
> > update_mmu_cache() ... this is where the I/D resolution should be done.
> > Since it's after any I/O has occurred, it doesn't matter whether the CPU
> > speculatively moved anything in or not. As long as you flush the kernel
> > alias and invalidate the user I and D aliases, we're good to go. Using
> > the page arch flags is really only to optimise this process (defer
> > kernel D alias flushing).
> Ok, so while flush_icache_page() looks like something we could use
> instead of set_pte_at() for the icache flushing, it doesn't answer all
> the questions. Off the top of my head:

OK, so what I was actually trying to get across is the point that we
don't handle I cache problems in the I/O or page cache code ... we
handle them in the mm code, so the mm piece of the above was
deliberately a bit vague.

> - I see the calls to flush_icache_page() in mm/memory.c but I don't see
> them next to all set_pte_at() that insert a valid PTE. For example, we
> don't flush the icache for anonymous pages. While that might seem like a
> good idea, we have been under pressure to "fix" that on powerpc to make
> sure there is no stale icache content from another process leaking into
> userspace.

I'm not entirely sure what flush_icache_page() is supposed to do. On
parisc it flushes the *kernel* icache ... which has got to be wrong.
According to cachetlb.txt it's an obsolete interface.

> - It needs to be done -before- set_pte_at() but I think the code does it
> right, only your explanation above makes it unclear :-)

Sorry, like I said, I only sketched the mm piece. However, at least on
parisc, there's a technical problem with flushing before we have the
pte: On VIPT systems, we need a mapping before the flush will work. I
was experimenting with a mechanism whereby we set aside in the kernel an
aligned region of our congruence size and simply flushed in that region
with the correct mappings, but we haven't got around to implementing it
in the kernel yet.
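
For illustration only, the address arithmetic behind such a congruent flush window can be sketched in plain C. This is a user-space toy, not kernel code and not what parisc does today: PAGE_SHIFT, CONGRUENCE_SIZE, FLUSH_WINDOW_BASE and both helper names are assumptions invented for the example.

```c
#include <stdint.h>

/* All values below are assumptions for illustration; real hardware differs. */
#define PAGE_SHIFT        12u
#define PAGE_MASK_LOW     (((uintptr_t)1 << PAGE_SHIFT) - 1)
#define CONGRUENCE_SIZE   ((uintptr_t)4 << 20)        /* hypothetical alias stride */
#define FLUSH_WINDOW_BASE ((uintptr_t)0x40000000u)    /* hypothetical, stride-aligned */

/*
 * Page-granular offset of an address within its congruence class.
 * Two virtual addresses with the same offset index the same cache
 * lines on a VIPT cache (under the stride assumed above).
 */
static inline uintptr_t congruence_offset(uintptr_t vaddr)
{
    return vaddr & (CONGRUENCE_SIZE - 1) & ~PAGE_MASK_LOW;
}

/*
 * Pick the address inside the reserved window that is congruent to the
 * user address the page will eventually be mapped at. A temporary
 * mapping at this address would let the flush hit the right lines
 * before the user pte exists.
 */
static inline uintptr_t flush_alias_for(uintptr_t user_vaddr)
{
    return FLUSH_WINDOW_BASE + congruence_offset(user_vaddr);
}

/* Nonzero when two addresses fall in the same congruence class. */
static inline int congruent(uintptr_t a, uintptr_t b)
{
    return congruence_offset(a) == congruence_offset(b);
}
```

The window only needs to be one congruence size long, because every possible user address has exactly one congruent slot inside it.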

> - It doesn't take the PTE pointer as an argument, so here goes our trick
> on powerpc of filtering out exec permission rather than flushing when a
> page is accessed by a read fault
> - We -still- have the problem of tracking whether the icache has been
> flushed or not yet for a given physical page on archs with PIPT (or non
> aliasing VIPT) like powerpc. Without that tracking, we flush a lot more
> than necessary since we'll end up flushing things like glibc text pages
> for every process they are mapped into which is totally wasteful. Thus
> the idea of using a new PG bit to separate D$ from I$ tracking still
> makes sense.

So, assuming full congruence of user space, can't you use the VMA as an
indicator? i.e. if we have no user space mappings, we have to flush the
icache ... if we have one or more, the icache has been flushed and
placing the same page congruently in a different address space benefits
from that prior flush, so consequently there's no need to flush again?
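
To make the question concrete, here is a toy model of that rule in plain C. It is a sketch of the idea only, assuming full congruence of user space; struct sim_page, its fields and both helpers are hypothetical and do not correspond to anything in mm/.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-page state; not the kernel's struct page. */
struct sim_page {
    int user_mappings;            /* how many address spaces map the page */
    unsigned long icache_flushes; /* counts simulated icache flushes */
};

/*
 * Map the page executable into one more address space.
 * Flush the icache only when this is the first user mapping; later
 * congruent mappings rely on that earlier flush.
 * Returns true if a flush was performed.
 */
static inline bool map_page_exec(struct sim_page *page)
{
    bool flushed = false;

    if (page->user_mappings == 0) {
        page->icache_flushes++;   /* stands in for the real flush */
        flushed = true;
    }
    page->user_mappings++;
    return flushed;
}

/* Drop one user mapping; the next mapping after the last one goes flushes again. */
static inline void unmap_page(struct sim_page *page)
{
    if (page->user_mappings > 0)
        page->user_mappings--;
}
```

With this rule a glibc text page mapped into many processes is flushed once rather than once per process, at the cost of flushing again whenever the mapping count drops to zero and the page is mapped anew.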

I also think we've established the relevant facts for the I/O thread
(that we only need to either flush the kernel D cache or mark it as to
be flushed later on PIO reads). We're now into deep technicalities of
how the mm system operates at the architecture level, so perhaps we
should move this to linux-arch?
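
For reference, the two options for the PIO read path (flush now, or mark and flush later) can be modelled like this. It is a simplified user-space sketch: struct pio_page and the three helpers are made-up names, and the boolean merely plays the role that a page arch flag would play in a real implementation.

```c
#include <stdbool.h>

/* Hypothetical per-page state for the sketch. */
struct pio_page {
    bool kernel_alias_dirty;      /* D-cache lines of the kernel alias are dirty */
    unsigned int dcache_flushes;  /* counts simulated kernel D-cache flushes */
};

/* Stand-in for flushing the kernel alias of the page. */
static inline void flush_kernel_alias(struct pio_page *page)
{
    page->dcache_flushes++;
    page->kernel_alias_dirty = false;
}

/*
 * Called when a PIO read has filled the page through the kernel alias.
 * Either flush immediately, or just record that a flush is owed.
 */
static inline void pio_read_done(struct pio_page *page, bool defer)
{
    page->kernel_alias_dirty = true;
    if (!defer)
        flush_kernel_alias(page);
}

/* Called before the page becomes visible in user space: settle any owed flush. */
static inline void before_user_mapping(struct pio_page *page)
{
    if (page->kernel_alias_dirty)
        flush_kernel_alias(page);
}
```

Deferring means a page that is read but never mapped into user space is never flushed at all, which is the optimisation the arch flag buys.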

