Re: open(2) says O_DIRECT works on 512 byte boundaries?

From: Andrea Arcangeli
Date: Mon Feb 02 2009 - 21:32:12 EST


On Tue, Feb 03, 2009 at 10:29:20AM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 2 Feb 2009 23:08:56 +0100
> Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
>
> > Hi Greg!
> >
> > > Thanks for the pointers, I'll go read the thread and follow up there.
> >
> > If you also run into this, the final fix is attached below. Porting
> > it to mainline is a bit hard because of gup-fast... Perhaps we can
> > use mmu notifiers to fix gup-fast... I need to think more about it,
> > then I'll post something.
> >
> > Please help testing the below on pre-gup-fast kernels, thanks!
> >
> I commented in the FJ-Redhat path but it was not forwarded for some
> unknown reason ;) I'll comment again.
>
> 1. Why is TestSetLockPage() necessary?
> It does not seem necessary.

To prevent the VM from removing the page from, or adding it to, the
swapcache and changing page_count/mapcount from under us. This almost
certainly wasn't the cause of the slowdown (the slowdown came from the
false positives generated by pagevec pinning), and removing it was more
intrusive than I wanted.

> 2. This patch doesn't cover HugeTLB.

There's no need to change hugetlb with my approach. I'm not touching
the cow path; I'm addressing the real source of the problem, i.e. the
point where fork tries to mark the child pte readonly and point it at
the shared parent page (same as ksm). While the pte wrprotect + tlb
flush stops the _cpu_, it can't stop any get_user_pages(write=1) user.
Hence we need to pre-cow the child page in fork instead of marking the
child pte readonly, so that the parent doesn't lose writes if, after
the fork, the parent cows and the child doesn't.
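
To illustrate the fork-time decision described above, here is a
minimal userspace sketch. It is not the patch itself: struct toy_page,
its fields, the enum and toy_fork_one_pte() are all hypothetical names
made up for this example, and the exact condition the real patch tests
may differ.

```c
#include <stdbool.h>

/* Toy stand-in for struct page; all names here are illustrative only. */
struct toy_page {
    int count;     /* all references, including get_user_pages pins */
    int mapcount;  /* references held by ptes */
    bool gup;      /* assumed flag: page was pinned via get_user_pages(write=1) */
};

enum fork_action {
    FORK_SHARE_READONLY, /* usual path: wrprotect the pte and share the page */
    FORK_PRECOW          /* copy the page for the child at fork time */
};

/*
 * Decide what fork does for one anonymous page. A pinned page cannot be
 * protected by wrprotect + tlb flush, because the pin holder can still
 * write to it, so the child gets its own copy immediately.
 */
static enum fork_action toy_fork_one_pte(const struct toy_page *page)
{
    if (page->gup && page->count > page->mapcount)
        return FORK_PRECOW;
    return FORK_SHARE_READONLY;
}
```

With a page that has count 2, mapcount 1 and the flag set, the sketch
picks FORK_PRECOW; an unpinned page with count == mapcount is shared
readonly as usual.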

> 3. Why only the "follow_page() successfully finds a page" case?
> Isn't it necessary to insert SetPageGUP() in the following path?
>
> - handle_mm_fault()
> => do_anonymous/swap/wp_page()
> or similar.

No need to change that either: all we need to know is which pages have
a count vs mapcount discrepancy that could have been caused by
get_user_pages. So only follow_page has to set it. More precisely,
FOLL_GET|FOLL_WRITE is the only path we care about there.
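
As a rough userspace model of that marking rule (a sketch only, not
the kernel code): the TOY_FOLL_* values, struct toy_gup_page and
toy_follow_page() below are hypothetical names invented for
illustration, and the real follow_page() does far more than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; not claimed to match the kernel's FOLL_* values. */
#define TOY_FOLL_WRITE 0x01u
#define TOY_FOLL_GET   0x04u

/* Toy stand-in for struct page. */
struct toy_gup_page {
    int count;     /* all references */
    int mapcount;  /* references held by ptes */
    bool gup;      /* assumed equivalent of the flag set by SetPageGUP() */
};

/*
 * Model of a lookup that found a page. Taking a reference creates the
 * count vs mapcount discrepancy; the page is marked only when the caller
 * both takes a reference and intends to write.
 */
static struct toy_gup_page *toy_follow_page(struct toy_gup_page *page,
                                            unsigned int flags)
{
    const unsigned int pin_for_write = TOY_FOLL_GET | TOY_FOLL_WRITE;

    if (page == NULL)
        return NULL; /* nothing found: the fault path runs, nothing to mark */
    if (flags & TOY_FOLL_GET)
        page->count++;
    if ((flags & pin_for_write) == pin_for_write)
        page->gup = true;
    return page;
}
```

A read-only pin raises count without setting the flag; only the
write pin leaves the page marked.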

> BTW, when you write a patch against upstream, please CC me or linux-mm.
> I'll have to add a hook for memory-cgroup.

Sure.

BTW, although I couldn't reproduce the problem here after leaving the
./dma_thread -a 512 -w 40 workload running for half a day, others
reported trouble to me, but on a different kernel codebase. At this
time I'm unsure whether any remaining trouble is caused by some
imperfection in this patch or by something else, so test results would
be interesting. The patch is against rhel-5.2 but should be trivial to
apply to anything pre-get_user_pages_fast.
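
For anyone writing their own test, here is a small sketch of the
alignment side of an O_DIRECT request. It assumes that -a 512 refers
to a 512-byte buffer alignment, as the subject line suggests; the
helper names dio_request_aligned() and dio_alloc() are made up for
this example and are not taken from dma_thread.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* True if buffer address, file offset and length are all multiples of align. */
static bool dio_request_aligned(const void *buf, uint64_t offset,
                                size_t len, size_t align)
{
    return align != 0 &&
           (uintptr_t)buf % align == 0 &&
           offset % align == 0 &&
           len % align == 0;
}

/*
 * Allocate at least len bytes starting on an align-byte boundary.
 * The size is rounded up to a multiple of align, as C11 aligned_alloc()
 * expects. align must be a power of two. The caller releases the buffer
 * with free().
 */
static void *dio_alloc(size_t align, size_t len)
{
    size_t rounded = (len + align - 1) / align * align;

    return aligned_alloc(align, rounded);
}
```

A buffer from dio_alloc(512, 4096) passes the check for offset 1024
and length 4096, and would be what gets handed to read(2)/write(2) on
a file descriptor opened with O_DIRECT; the same buffer shifted by one
byte fails the check.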