Re: open(2) says O_DIRECT works on 512 byte boundaries?

From: KAMEZAWA Hiroyuki
Date: Mon Feb 02 2009 - 21:57:16 EST


On Tue, 3 Feb 2009 03:31:47 +0100
Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:

> On Tue, Feb 03, 2009 at 10:29:20AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Mon, 2 Feb 2009 23:08:56 +0100
> > Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
> >
> > > Hi Greg!
> > >
> > > > Thanks for the pointers, I'll go read the thread and follow up there.
> > >
> > > If you also run into this final fix is attached below. Porting to
> > > mainline is a bit hard because of gup-fast... Perhaps we can use mmu
> > > notifiers to fix gup-fast... need to think more about it then I'll
> > > post something.
> > >
> > > Please help testing the below on pre-gup-fast kernels, thanks!
> > >
> > I commented in FJ-Redhat Path, but it was not forwarded for some unknown reason ;)
> > So I comment again.
> >
> > 1. Why is TestSetLockPage() necessary?
> > It does not seem necessary.
>
> To avoid the VM to remove or add the page from/to swapcache and change
> page_count/mapcount from under us. This most certainly wasn't the
> reason of the slowdown (the slowdown were the false positives
> generated by pagevec pinning) and removing it was more intrusive than
> I wanted.

My point is:
- If TestSetLockPage() fails, force_cow=1.
- If the count/mapcount check fails, force_cow=1.

So, locking the page here seems meaningless. If you consider lock_page() important,
just using lock_page() seems better.
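
To make the two variants concrete, here is a minimal user-space sketch, not
the code from the patch. The struct, its fields, and both function names are
hypothetical, and the check is assumed to be a plain "count != mapcount"
comparison. It only shows that a trylock failure is treated like a failed
count check, while a blocking lock would decide on the counts alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the page state the fork-time check looks at. */
struct page_model {
    bool locked;    /* someone else currently holds the page lock */
    int  count;     /* models page_count() */
    int  mapcount;  /* models page_mapcount() */
};

/*
 * Models the TestSetLockPage() variant: if the lock cannot be taken,
 * the answer is force_cow=1 without looking at the counts at all.
 */
bool force_cow_trylock(const struct page_model *p)
{
    assert(p != NULL);
    if (p->locked)
        return true;                 /* trylock failed -> force_cow=1 */
    return p->count != p->mapcount;  /* assumed form of the check */
}

/*
 * Models a blocking lock_page() variant: the caller waits for the lock,
 * so only the count/mapcount comparison decides.
 */
bool force_cow_blocking(const struct page_model *p)
{
    assert(p != NULL);
    return p->count != p->mapcount;  /* assumed form of the check */
}
```

In this model a contended page with matching counts is copied by the trylock
variant but not by the blocking variant; for an uncontended page the two give
the same answer.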

>
> > 2. This patch doesn't cover HugeTLB.
>
> There's no need to change hugetlb with my approach. I'm not touching
> the cow path, I'm addressing the real source of the problem (i.e. when
> fork pretends to mark the child pte readonly and pointing to the
> shared parent page, same as ksm: while the pte wrprotect + tlb flush
> stops the _cpu_ it can't stop any get_user_pages(write=1) user, hence
> we need to pre-cow the child page in fork instead of marking the child
> pte readonly to avoid the parent to lose writes if post-fork the
> parent cows and the child doesn't cow).
>
Is there no need to make a patch for copy_hugetlb_page_range()?
IMHO, HugeTLB can be write-protected at fork().

> > 3. Why only the "follow_page() successfully finds a page" case?
> > Isn't it necessary to insert SetPageGUP() in the following path?
> >
> > - handle_mm_fault()
> > => do_anonymous/swap/wp_page()
> > or similar.
>
> No need to change that either, all we need to know are the pages whose
> count vs mapcount has a discrepancy that could have been caused by
> get_user_pages. So only follow_page has to set it. More precisely
> FOLL_GET|FOLL_WRITE is the only path we care about there.
>

Assume 3 threads in a process.
==
Thread1 (DIO-Read)                     Thread2                  Thread3
get_user_page()
 => handle_mm_fault().
 => map a page with no-write-protect.
                                       fork()
                                       (write-protect here)
                                                                Copy-On-Write
endio.

pre-cow-at-fork will never happen because PageGUP is not set.
After the READ completes, this process will see a broken page.
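
The interleaving above can be replayed in a small user-space model. This is
only a sketch under stated assumptions: all types and function names are
hypothetical, "gup_flag" stands in for PageGUP, and the fault path is assumed
not to set the flag, which is exactly the behaviour being questioned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct sim_page { char data[8]; bool gup_flag; };       /* gup_flag models PageGUP */
struct sim_mm   { struct sim_page *pte; bool writable; }; /* one-page address space */

/* gup where follow_page() finds the page: the flag gets set. */
struct sim_page *gup_via_follow_page(struct sim_mm *mm)
{
    mm->pte->gup_flag = true;
    return mm->pte;
}

/* gup that has to go through handle_mm_fault(): assumed NOT to set the flag. */
struct sim_page *gup_via_fault(struct sim_mm *mm, struct sim_page *fresh)
{
    mm->pte = fresh;
    mm->writable = true;     /* mapped without write protection */
    return fresh;
}

/* fork(): pre-COW for the child if flagged, else write-protect the parent. */
bool fork_model(struct sim_mm *parent)
{
    if (parent->pte->gup_flag)
        return true;         /* child gets its own copy, parent untouched */
    parent->writable = false;
    return false;
}

/* Write fault in the parent: copy-on-write if the pte is write-protected. */
void write_fault(struct sim_mm *mm, struct sim_page *copy)
{
    if (!mm->writable) {
        *copy = *mm->pte;
        mm->pte = copy;
        mm->writable = true;
    }
}

/* endio: the device fills the page that was pinned at gup time. */
void dio_complete(struct sim_page *pinned, const char *payload)
{
    snprintf(pinned->data, sizeof pinned->data, "%s", payload);
}

/* Replays Thread1/2/3; returns true if the parent ends up seeing the read data. */
bool parent_sees_read_data(bool via_fault)
{
    struct sim_page orig = { .data = "", .gup_flag = false };
    struct sim_page copy = { .data = "", .gup_flag = false };
    struct sim_mm parent = { &orig, true };

    struct sim_page *pinned = via_fault ? gup_via_fault(&parent, &orig)
                                        : gup_via_follow_page(&parent);
    fork_model(&parent);              /* Thread2 */
    write_fault(&parent, &copy);      /* Thread3 */
    dio_complete(pinned, "DIODATA");  /* Thread1 endio */
    return strcmp(parent.pte->data, "DIODATA") == 0;
}
```

In this model, parent_sees_read_data(false) is true because fork pre-cows,
while parent_sees_read_data(true) is false: the parent has been moved to the
COW copy and the DIO data lands in the old page.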

Thanks,
-Kame
