Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

From: Linus Torvalds (torvalds@transmeta.com)
Date: Thu Oct 19 2000 - 19:45:30 EST


On Thu, 19 Oct 2000, Linus Torvalds wrote:
>
> I'm saying that we're much much better off guaranteeing local consistency
> over knowingly breaking local consistency over a uncertain global
> consistency issue. Especially as NFS has never guaranteed global
> consistency in the first place, and was not designed to do that.

Btw, if you really worry about NFS v3 and want to add consistency
guarantees to NFS v3, you can probably do that. It's just that you cannot
do it with a simple cache invalidation - you'd need to be much much more
careful (the same way a simple CPU cache invalidate does not guarantee SMP
cache consistency - it only guarantees data corruption due to missed
writes).

I don't know what the best way to do a true cache consistency protocol on
a page leve would be, especially as you will have a _really_ hard time
getting hold of an exclusive pointer to a page with multiple mmap's, but
you could probably get something that guarantees cache consistency as long
as people do _not_ expect mmap to always be 100% consistent.

(The reason you cannot get mmap consistency is just that mmap doesn't have
any kernel synchronization point unless you're willing to shoot down every
single mapping - which is expensive as hell, but doable).

You'd have to do something like

        LockPage(page); /* Nobody gets to write to this page (except through mmaps, ugh) */
        gather_all_mmap_users(page); /* THIS is the nasty one */
        nfs_wb_page(page); /* force write-back on this page */
        ClearPageUptodate(page); /* mark it not up-to-date to force a read-in next time */
        UnlockPage(page); /* Ok, now the client can go wild */

where everything but the "gather_all_mmap_users()" part is fairly
straightforward. The "gather" phase is nasty - it would need to figure
out every place the page is mapped, make sure those are synchronized (ie
something like marking the page table entry write-protected and causing a
TLB invalidate SMP cross-call - at which point the resulting page fault
and the page lock will catch anybody who tries to write to the page)..

In no case could you do something like what the current
invalidate_inode_pages() does, which is to just try to drop the page from
the cache - that really only works if we're the only user of that page,
which the "page_count != 1" test now enforces.

                Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Oct 23 2000 - 21:00:17 EST