Re: 2.6.22 -mm merge plans -- vm bugfixes

From: Nick Piggin
Date: Wed May 09 2007 - 10:45:45 EST


Hugh Dickins wrote:
On Wed, 9 May 2007, Nick Piggin wrote:

Hugh Dickins wrote:

On Wed, 2 May 2007, Nick Piggin wrote:

But I'm pretty sure (to use your words!) regular truncate was not racy
before: I believe Andrea's sequence count was handling that case fine,
without a second unmap_mapping_range.

OK, I think you're right. I _think_ it should also be OK with the
lock_page version as well: we should not be able to have any pages
after the first unmap_mapping_range call, because of the i_size
write. So if we have no pages, there is nothing to 'cow' from.

I'd be delighted if you can remove those later unmap_mapping_ranges.
As I recall, the important thing for the copy pages is to be holding
the page lock (or whatever other serialization) on the copied page
still while the copy page is inserted into pagetable: that looks
to be so in your __do_fault.

Hmm, on second thoughts, I think I was right the first time, and do
need the unmap after the pages are truncated. With the lock_page code,
after the first unmap, we can get new ptes mapping pages, and
subsequently they can be COWed and then the original pte zapped before
the truncate loop checks it.


The filesystem (or page cache) allows pages beyond i_size to come
in there? That wasn't a problem before, was it? But now it is?

The filesystem still doesn't, but if i_size is updated after the page
is returned, we can have a problem that was previously taken care of
with the truncate_count but now isn't.

However, I wonder if we can't test mapping_mapped before the spinlock,
which would make most truncates cheaper?


Slightly cheaper, yes, though I doubt it'd be much in comparison with
actually doing any work in unmap_mapping_range or truncate_inode_pages.

But if we're supposing the common case for truncate is unmapped mappings,
then the main cost there will be the locking, which I'm trying to avoid.
Hopefully with this patch, most truncate workloads would get faster, even
though truncate mapped files is going to be unavoidably slower.


Suspect you'd need a barrier of some kind between the i_size_write and
the mapping_mapped test?

The unmap_mapping_range that runs after the truncate_inode_pages should
run in the correct order, I believe.

But that's a change we could have made at
any time if we'd bothered, it's not really the issue here.

I don't see how you could, because you need to increment truncate_count.

But I believe this is fixing the issue, even if it does so in a peripheral
manner, because it avoids the added cost for unmapped files.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/