Re: Linux 2.6.9-ac2
From: Andrea Arcangeli
Date: Thu Oct 28 2004 - 13:08:27 EST
On Thu, Oct 28, 2004 at 12:59:31PM -0200, Marcelo Tosatti wrote:
> On Thu, Oct 21, 2004 at 09:12:08PM +0100, Alan Cox wrote:
> > On Iau, 2004-10-21 at 20:34, David S. Miller wrote:
> > > 2.4.x will need this one as well, at least the AF_PACKET
> > > case. Would you mind if I pushed that to Marcelo?
> >
> > Not at all. Andrea has proposed fixing it a little differently.
> > For 2.6 making remap_page_range DTRT itself is ok but for 2.4 the
> > vma isn't passed.
>
> get_user_pages() is screwed, I'm just not sure
> about failing get_user_pages() if PageReserved page
> is encountered.
>
> I'm more worried about make_pages_present(), which is
> called by find_extend_vma/do_mmap_pgoff. Is it valid
> to have PageReserved pages on the zones handled
> by these functions anyway?
make_pages_present passes a null pages array, so it won't run the below
code.
> This is equivalent of Andrea's fix for mainline.
yes.
> Andrea, this in SuSE's tree for a while correct?
Yes, in another form but the below is the one against mainline.
In SUSE tree I fixed a COW memory corruption issue (something I should
submit for mainline 2.4 integration ASAP), so the patch looks different
there. pinning the page only at the struct page level and delaying
writepage until the page is mapped, works in all cases, except when
there are threads and a COW is generated while one thread is executing a
rawio/O_DIRECT read and the anon page receiving the I/O gets unmapped by
some memory pressure. The only way to fix it is to prevent the VM unmap
pages that are under rawio, so I had to add a PG_pinned that
unfortunately serializes the rawio page against page, in 2.6 I could fix
it trivially without any serialization by comparing page->count with
page->mapcount (with 2.6.7+ VM, and that's already merged in mainline
since 2.6.7/8 or so). I did fix it in time for 2.6.5 SLES9 too.
the biggest pain to make the PG_pinned work has been to implement a
wait_on_page_pte_pinned asynchronous, otherwise keventd was executing
wait_on_page_pte_pinned it was getting stuck and the I/O wasn't
submitted since keventd itself had other events pending generating a
deadlock. This is all fixed by the async version of
wait_on_page_pte_pinned, and it's all in SLES8 latest, so that even aio
cannot generate memory corruption anymore (most people don't notice this
since they don't use threads and they don't do I/O with anonymous memory
as destination but to close all pratical corruption holes this was
technically required). But 2.4 mainline has no async-io, so adding
PG_pinned there is fairly easy (modulo reject big pain).
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/