Re: still nfs problems [Was: Linux 2.6.37-rc8]

From: Trond Myklebust
Date: Fri Jan 07 2011 - 13:53:37 EST


On Thu, 2011-01-06 at 09:55 -0800, Linus Torvalds wrote:
> On Thu, Jan 6, 2011 at 9:47 AM, Trond Myklebust
> <Trond.Myklebust@xxxxxxxxxx> wrote:
> >
> > Why is this line needed? We're not writing through the virtual mapping.
>
> I haven't looked at the sequence of accesses, but you need to be
> _very_ aware that "write-through" is absolutely NOT sufficient for
> cache coherency.
>
> In cache coherency, you have three options:
>
> - true coherency (eg physically indexed/tagged caches)
>
> - exclusion (eg virtual caches, but with an exclusion guarantee that
> guarantees that aliases cannot happen: either by using physical
> tagging or by not allowing cases that could cause virtual aliases)
>
> - write-through AND non-cached reads (ie "no caching at all").
>
> You seem to be forgetting the "no cached reads" part. It's not
> sufficient to flush after a write - you need to make sure that you
> also don't have a cached copy of the alias for the read.
>
> So "We're not writing through the virtual mapping" is NOT a sufficient
> excuse. If you're reading through the virtual mapping, you need to
> make sure that the virtual mapping is flushed _after_ any writes
> through any other mapping and _before_ any reads through the virtual
> one.

I'm aware of that. That part should be taken care of by the call to
invalidate_kernel_vmap_range(), which was in both James's patch and mine.

There is already code in the SUNRPC layer that calls flush_dcache_page()
after writing (although as Russell pointed out earlier, that is
apparently a no-op for non-page cache pages such as these).
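
To make the ordering concrete, here is a toy userspace model of two
aliases of one physical line. It is only a sketch and not kernel code:
struct alias, alias_read(), alias_write(), alias_invalidate() and
read_after_write() are all hypothetical names made up for illustration,
and the "cache" is a per-alias copy standing in for a virtually indexed
cache. It shows that a write-through store via one alias still leaves a
stale copy behind the other alias unless that copy is invalidated before
the read.

```c
#include <stdbool.h>
#include <string.h>

#define LINE 8  /* toy cache-line size in bytes (arbitrary) */

/* One "physical" line of memory, reachable through two virtual aliases. */
struct phys_line { unsigned char data[LINE]; };

/* Per-alias cached copy, as a virtually indexed cache might hold it. */
struct alias {
    struct phys_line *mem;
    unsigned char cached[LINE];
    bool valid;
};

/* Read through an alias: fill on miss, otherwise return the cached copy. */
static unsigned char alias_read(struct alias *a, int off)
{
    if (!a->valid) {
        memcpy(a->cached, a->mem->data, LINE);
        a->valid = true;
    }
    return a->cached[off];
}

/* Write-through store: updates memory and this alias's own copy only. */
static void alias_write(struct alias *a, int off, unsigned char v)
{
    if (a->valid)
        a->cached[off] = v;
    a->mem->data[off] = v;
}

/* Drop the cached copy so the next read refetches from memory. */
static void alias_invalidate(struct alias *a)
{
    a->valid = false;
}

/*
 * Returns what a reader on the "vmap" alias sees after a write made
 * through the "page" alias, with or without invalidating first.
 */
unsigned char read_after_write(bool invalidate_first)
{
    struct phys_line mem = { {0} };
    struct alias page = { &mem, {0}, false };
    struct alias vmap = { &mem, {0}, false };

    (void)alias_read(&vmap, 0);   /* vmap alias now caches the old value 0 */
    alias_write(&page, 0, 42);    /* write-through via the other alias */
    if (invalidate_first)
        alias_invalidate(&vmap);  /* the step write-through alone omits */
    return alias_read(&vmap, 0);
}
```

In this model read_after_write(false) returns the stale 0, while
read_after_write(true) returns 42. The invalidate step plays the role
that invalidate_kernel_vmap_range() is meant to play before reading
through the vmap alias.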

> This is why you really really really generally don't want to have
> aliasing. Purely virtual caches are pure crap. Really.

Well, it looks as if NOMMU is giving us problems due to the lack of a
vm_map_ram() (see https://bugzilla.kernel.org/show_bug.cgi?id=26262).

I'd still like to keep the existing code for those architectures that
don't have problems, since that allows us to send 32k READDIR requests
instead of being limited to 4k. For large directories, that is a clear
win.

For the NOMMU case we will just go back to using a single page for
storage (and 4k READDIR requests only). Should I just do the same for
architectures like ARM and PARISC?
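
As a rough illustration of why the larger buffer matters, the request
count scales inversely with the buffer size. The helper below is a
hypothetical back-of-the-envelope calculation, not anything from the NFS
client; it assumes every reply is filled completely and ignores
per-entry packing and server-side limits.

```c
#include <stddef.h>

/*
 * Number of READDIR requests needed to fetch dir_bytes of directory
 * data with a reply buffer of buf_bytes (ceiling division).
 * Assumes buf_bytes > 0 and that every reply fills the buffer.
 */
size_t readdir_requests(size_t dir_bytes, size_t buf_bytes)
{
    return (dir_bytes + buf_bytes - 1) / buf_bytes;
}
```

For an illustrative 1 MiB of directory data, readdir_requests(1048576,
4096) gives 256 round trips, against 32 for readdir_requests(1048576,
32768).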

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
