Re: [patch 9/9] mm: fix pagecache write deadlocks

From: Nick Piggin
Date: Mon Feb 05 2007 - 21:26:17 EST


On Sun, Feb 04, 2007 at 10:36:20AM -0800, Andrew Morton wrote:
> On Sun, 4 Feb 2007 16:10:51 +0100 Nick Piggin <npiggin@xxxxxxx> wrote:
>
> > They're not likely to hit the deadlocks, either. Probability gets more
> > likely after my patch to lock the page in the fault path. But practially,
> > we could live without that too, because the data corruption it fixes is
> > very rare as well. Which is exactly what we've been doing quite happily
> > for most of 2.6, including all distro kernels (I think).
>
> Thing is, an application which is relying on the contents of that page is
> already unreliable (or really peculiar), because it can get indeterminate
> results anyway.

Not necessarily -- they could read from one part of a page and write to
another. I see this as the biggest data corruption problem.

But even in the case where they can get indeterminate results, they can
still determine what the results *won't* be. Eg. they might use a single
byte for a flag or something.

> > ...
> >
> > On a P4 Xeon, SMP kernel, on a tmpfs filesystem, a 1GB dd if=/dev/zero write
> > had the following performance (higher is worse):
> >
> > Orig kernel New kernel
> > new file (no pagecache)
> > 4K blocks 1.280s 1.287s (+0.5%)
> > 64K blocks 1.090s 1.105s (+1.4%)
> > notrunc (uptodate pagecache)
> > 4K blocks 0.976s 1.001s (+0.5%)
> > 64K blocks 0.780s 0.792s (+1.5%)
> >
> > [numbers are better than +/- 0.005]
> >
> > So we lose somewhere between half and one and a half of one percent
> > performance in a pagecache write intensive workload.
>
> That's not too bad - caches are fast. Did you look at optimising the
> handling of that temp page, ensure that we always use the same page? I
> guess the page allocator per-cpu-pages thing is being good here.

Yeah it should be doing a reasonable job.

> I'm not sure how, though. Park a copy in the task_struct, just as an
> experiment. But that'd de-optimise multiple-tasks-writing-on-the-same-cpu.
> Maybe a per-cpu thing? Largely duplicates the page allocator's per-cpu-pages.

Putting a copy in the task_struct won't do much I figure, except saving
a copule of interrupt enable/disable, and being more wasteful of memory
and cache-hotness.

Per-cpu doesn't work because we can't hold preempt off over the usercopy
(well, we *could* do it in a loop together with fault_in_pages, but that
just adds to the icache bloat).

>
> Of course, we're also increasing caceh footprint, which this test won't
> show.

We are indeed. At least we release the hot page back to the allocator
very quickly that it can be reused.

The upshot is that your writev performance will be improved :)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/