Re: still nfs problems [Was: Linux 2.6.37-rc8]

From: Russell King - ARM Linux
Date: Wed Jan 05 2011 - 19:00:35 EST


On Wed, Jan 05, 2011 at 03:28:53PM -0800, Linus Torvalds wrote:
> On Wed, Jan 5, 2011 at 3:06 PM, Trond Myklebust
> <Trond.Myklebust@xxxxxxxxxx> wrote:
> >
> > Yes. The fix I sent out was a call to invalidate_kernel_vmap_range(),
> > which takes care of invalidating the cache prior to a virtual address
> > read.
> >
> > My question was specifically about the write through the regular kernel
> > mapping: according to Russell and my reading of the cachetlb.txt
> > documentation, flush_dcache_page() is only guaranteed to have an effect
> > on page cache pages.
>
> I don't think that should ever matter. It's not like the hardware can
> know whether it's a dcache page or not.
>
> And if the sw implementation cares, it's doing something really odd.

>From the hardware perspective you're correct that it doesn't. However,
from the efficient implementation perspective it does matter.

Take for example the read-ahead done on block devices. We don't want to
flush all those pages that were read in when we don't know that they're
ever going to end up in a user mapping. So what's commonly done (as
suggested by DaveM) is that flush_dcache_page() detects that it's a
dcache page, ensures that there's no user mappings, and sets a 'dirty'
flag. This flag is guaranteed to be clear when new, clean, unread
pages enter the page cache.

When the page eventually ends up in a user mapping, that dirty flag is
checked and the necessary cache flushing done at that point.

Note that when there are user mappings, flush_dcache_page() has to flush
those mappings too, otherwise mmap() <-> read()/write() coherency breaks.
I believe this was what flush_dcache_page() was created to resolve.

flush_kernel_dcache_page() was to solve the problem of PIO drivers
writing to dcache pages, so that data written into the kernel mapping
would be visible to subsequent user mappings.

We chose a different overall approach - which had already been adopted by
PPC - where we invert the meaning of this 'dirty' bit to mean that it's
clean. So every new page cache page starts out life as being marked
dirty and so nothing needs to be done at flush_kernel_dcache_page().
We continue to use davem's optimization but with the changed meaning of
the bit, but as we now support SMP we do the flushing at set_pte_at()
time.

This also means that we don't have to rely on the (endlessly) buggy PIO
drivers remembering to add flush_kernel_dcache_page() calls - something
which has been a source of constant never-ending pain for us.

The final piece of the jigsaw is flush_anon_page() which deals with
kernel<->user coherency for anonymous pages by flushing both the user
and kernel sides of the mapping. This was to solve direct-io coherency
problems.

As the users of flush_anon_page() always do:

flush_anon_page(vma, page, addr);
flush_dcache_page(page);

and documentation doesn't appear to imply that this will always be the
case, we restrict flush_dcache_page() to only work on page cache pages,
otherwise we end up flushing the kernel-side mapping multiple time in
succession.

Maybe we should make flush_anon_page() only flush the user mapping,
stipulate that it shall always be followed by flush_dcache_page(),
which shall flush the kernel side mapping even for anonymous pages?
That sounds to me like a recipe for missing flush_dcache_page() calls
causing bugs.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/