Re: smp race fix between invalidate_inode_pages* and do_no_page

From: Andrew Morton
Date: Tue Dec 13 2005 - 16:02:00 EST


Andrea Arcangeli <andrea@xxxxxxx> wrote:
>
> ...
>
> Clearly taking the page lock in do_no_page would fix it too, but it
> would destroy the scalability of page faults.

It's always bugged me that filemap_nopage() doesn't lock the page. There
might be additional uglies which could be tidied up if we were to do so.

The scalability loss would happen if there are multiple processes/threads
faulting in the same page I guess. I wonder how important that would be.

I suppose that even if we did lock the page in filemap_nopage(), the
coverage isn't sufficient here - it needs to extend into do_no_page()?


> The seqschedlock instead
> is 100% smp scalable in the fast path (exactly like RCU and seqlocks),
> and it's lightweight in the write path too and it doesn't introducing
> latencies in freeing memory.

Why this rather than down_read/down_write? We might even be able to hoist
ext3_inode's i_truncate_sem into the address_space, for zero space cost on
most Linux inodes (dunno).

Is there some way in which we can use mapping->truncate_count to tell
do_no_page() that it raced with invalidate()? By checking it after the pte
has been established?

> My main concern is the yield() instead of a waitqueue but perhaps that
> can be improved somehow (I didn't think much about it yet). Not sure if
> a waitqueue would be better off inside or outside the seqschedlock, but
> I believe these are secondary matters, in practice it should never
> trigger.

yield() is pretty sucky.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/