Re: [RFC][PATCH 6/8] mm: handle_speculative_fault()

From: KAMEZAWA Hiroyuki
Date: Tue Jan 05 2010 - 00:34:16 EST


On Mon, 4 Jan 2010 21:10:29 -0800 (PST)
Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

>
>
> On Tue, 5 Jan 2010, KAMEZAWA Hiroyuki wrote:
> >
> > Then, my patch dropped speculative trial of page fault and did synchronous
> > job here. I'm still considering how to insert some barrier to delay calling
> > remove_vma() until all page fault goes. One idea was reference count but
> > it was said not-enough crazy.
>
> What lock would you use to protect the vma lookup (in order to then
> increase the refcount)? A sequence lock with RCU lookup of the vma?
>

Ah, I just used a reference counter to track "how many threads are in
a page fault on this vma now". The code below is from my post.

==
+			rb_node = rcu_dereference(rb_node->rb_left);
+		} else
+			rb_node = rcu_dereference(rb_node->rb_right);
+	}
+	if (vma) {
+		if ((vma->vm_start <= addr) && (addr < vma->vm_end)) {
+			if (!atomic_inc_not_zero(&vma->refcnt))
+				vma = NULL;
+		} else
+			vma = NULL;
+	}
+	rcu_read_unlock();

...
+void vma_put(struct vm_area_struct *vma)
+{
+	if ((atomic_dec_return(&vma->refcnt) == 1) &&
+	    waitqueue_active(&vma->wait_queue))
+		wake_up(&vma->wait_queue);
+	return;
+}
==
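
For readers without the kernel tree at hand, the lookup side can be
sketched in userspace C11. This is only an illustration, not the patch
itself: refcnt_inc_not_zero() is a hypothetical stand-in for the kernel's
atomic_inc_not_zero(), and it assumes a count of zero means the object is
already being torn down.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for atomic_inc_not_zero():
 * take a reference only if the count has not already dropped to zero.
 */
static bool refcnt_inc_not_zero(atomic_int *refcnt)
{
	int old = atomic_load(refcnt);

	/* Retry until we either observe zero or win the compare-and-swap. */
	while (old != 0) {
		if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
			return true;	/* reference taken */
		/* On failure, 'old' now holds the current value; loop again. */
	}
	return false;	/* count is zero: caller must treat the object as gone */
}
```

A caller that gets false back would drop the pointer, as the excerpt
above does with "vma = NULL", and fall back to the normal locked path.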

And wait for this reference count to drop to zero before calling
remove_vma():
==
+/* called when vma is unlinked and wait for all racy access.*/
+static void invalidate_vma_before_free(struct vm_area_struct *vma)
+{
+	atomic_dec(&vma->refcnt);
+	wait_event(vma->wait_queue, !atomic_read(&vma->refcnt));
+}
+
....
 		 * us to remove next before dropping the locks.
 		 */
 		__vma_unlink(mm, next, vma);
+		invalidate_vma_before_free(next);
 		if (file)
 			__remove_shared_vm_struct(next, file, mapping);

etc....
==
The code above is a bit heavy (and buggy). I have some fixes.
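
The teardown side (drop the base reference, then sleep until in-flight
users are gone) can also be sketched in userspace. This is a rough
analogue under stated assumptions, not the fixed patch: struct obj,
obj_put() and obj_invalidate_and_wait() are hypothetical names, a pthread
mutex and condition variable stand in for the kernel wait queue, and the
count is assumed to start at 1 while the object is linked. The sketch
wakes the waiter when the count reaches zero, which is the condition the
waiter sleeps on.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for a vma with a drainable refcount. */
struct obj {
	atomic_int refcnt;		/* 1 while linked, +1 per in-flight user */
	pthread_mutex_t lock;		/* guards only the sleep/wake handshake */
	pthread_cond_t drained;
};

static void obj_init(struct obj *o)
{
	atomic_init(&o->refcnt, 1);
	pthread_mutex_init(&o->lock, NULL);
	pthread_cond_init(&o->drained, NULL);
}

/* Drop one user reference; wake the waiter if the count reached zero. */
static void obj_put(struct obj *o)
{
	/* atomic_fetch_sub() returns the value before the decrement. */
	if (atomic_fetch_sub(&o->refcnt, 1) == 1) {
		pthread_mutex_lock(&o->lock);
		pthread_cond_broadcast(&o->drained);
		pthread_mutex_unlock(&o->lock);
	}
}

/* Drop the base reference, then sleep until every user has called obj_put(). */
static void obj_invalidate_and_wait(struct obj *o)
{
	atomic_fetch_sub(&o->refcnt, 1);

	pthread_mutex_lock(&o->lock);
	/* Re-check under the lock so a concurrent wakeup cannot be missed. */
	while (atomic_load(&o->refcnt) != 0)
		pthread_cond_wait(&o->drained, &o->lock);
	pthread_mutex_unlock(&o->lock);
}
```

One caveat of this sketch: the last obj_put() caller still touches the
mutex after the count hits zero, so the storage must outlive that call.
It also shows the cost being discussed here: every fault pays for an
atomic increment and decrement on a shared cache line.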

> Sounds doable. But it also sounds way more expensive than the current VM
> fault handling, which is pretty close to optimal for single-threaded
> cases.. That RCU lookup might be cheap, but just the refcount is generally
> going to be as expensive as a lock.
>
For single-threaded apps, my patch will have no benefit
(but will not make anything worse).
I'll add a CONFIG option, and I wonder whether I can enable
speculative_vma_lookup only after the mm_struct is shared (but the patch
may get messy...).

> Are there some particular mappings that people care about more than
> others? If we limit the speculative lookup purely to anonymous memory,
> that might simplify the problem space?
>

I think that, for ordinary users who don't write highly optimized programs,
one small benefit of skipping mmap_sem is reduced mmap_sem ping-pong
after fork()->exec(). That ping-pong can cause some jitter in the
application. So I'd be glad if I could help file-backed vmas, too.

> [ From past experiences, I suspect DB people would be upset and really
> want it for the general file mapping case.. But maybe the main usage
> scenario is something else this time? ]
>

I'd like to hear the use cases of really heavy users, too. Christoph?

Thanks,
-Kame
