Re: USB mass storage and ARM cache coherency

From: Catalin Marinas
Date: Tue Mar 02 2010 - 12:05:38 EST


On Tue, 2010-03-02 at 21:11 +0900, FUJITA Tomonori wrote:
> On Sun, 28 Feb 2010 10:31:03 +0530
> James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > But the point of all of this is that I cache invalidation doesn't appear
> > anywhere in the I/O path ... so if we're getting I/D incoherency,
> > there's some problem in the mm code (or there's a missing arch
> > assumption ... like I cache gets moved in more aggressively than we
> > expect). Parisc is very sensitive to I/D incoherency, so we'd notice if
> > there were a serious generic problem here.
>
> I'm not sure that there are some problems in the mm or common code. Is
> this ARM's implementation issue? (Of course, the usb stack and the
> driver's misuse of the DMA API needs to be fixed too).

Just to summarise - on ARM (PIPT / non-aliasing VIPT) there is I-cache
invalidation for user pages in update_mmu_cache() (it could actually be
in set_pte_at on SMP to avoid a race but that's for another thread). The
D-cache is flushed by this function only if the PG_arch_1 bit is set.
This bit is set in the ARM case by flush_dcache_page(), following the
advice in Documentation/cachetlb.txt.

With some drivers (those doing PIO) or subsystems (SCSI mass storage
over USB HCD), there is no call to flush_dcache_page() for page cache
pages, hence the ARM implementation of update_mmu_cache() doesn't flush
the D-cache (and only invalidating the I-cache doesn't help).

The viable solutions so far:

1. Implement a PIO mapping API similar to the DMA API which takes
care of the D-cache flushing. This means that PIO drivers would
need to be modified to use an API like pio_kmap()/pio_kunmap()
before writing to a page cache page.
2. Invert the meaning of PG_arch_1 to denote a clean page. This
means that by default newly allocated page cache pages are
considered dirty and even if there isn't a call to
flush_dcache_page(), update_mmu_cache() would flush the D-cache.
This is the PowerPC approach.

Option 2 above looks pretty appealing to me since it can be done in the
ARM code exclusively. I've done some tests and it indeed solves the
cache coherency with a rootfs on a USB stick. As Russell suggested, it
can be optimised to mark a page as clean when the DMA API is involved to
avoid duplicate flushing.

It was also suggested to add a PG_arch_2 flag which would keep track of
the I-cache status as well.

I can post a proposal to modify the cachetlb.txt document to reflect the
issues we currently have on ARM.

--
Catalin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/