Re: [-mm][PATCH 4/4] Add memrlimit controller accounting and control (v4)

From: Balbir Singh
Date: Thu May 15 2008 - 03:04:30 EST


* Paul Menage <menage@xxxxxxxxxx> [2008-05-14 23:55:07]:

> On Wed, May 14, 2008 at 11:17 PM, Balbir Singh
> <balbir@xxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > Assuming that we're holding a write lock on mm->mmap_sem here, and we
> > > additionally hold mmap_sem for the whole of mm_update_next_owner(),
> > > then maybe we don't need any extra synchronization here? Something
> > > like simply:
> > >
> > > int memrlimit_cgroup_charge_as(struct mm_struct *mm, unsigned long nr_pages)
> > > {
> > >         struct memrlimit_cgroup *memrcg = memrlimit_cgroup_from_task(mm->owner);
> > >         return res_counter_charge(&memrcg->as_res, (nr_pages << PAGE_SHIFT));
> > > }
> >
> > The charge_as routine is not always called with mmap_sem held, since
> > the undo path gets more complicated under the lock. We already have
> > our own locking mechanism for the counters.
>
> I'm not worried about the counters themselves being inconsistent - I'm
> worried about the case where charge_as() is called in the middle of
> the attach operation, and we account the charge X to the new cgroup's
> res_counter and update mm->total_vm, and then when we do the move, we
> charge the whole of mm->total_vm to the new cgroup even though the
> last charge was already accounted to the new res_counter, not the old
> one.
>
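
To make that interleaving concrete, here is a sketch of the move side
with illustrative names (memrlimit_cgroup_move_charges() is not the
actual patch code):

        /*
         * If charge_as() runs after mm->owner has switched over but
         * before this move, the last charge has already landed in
         * 'new' and is counted a second time below, while 'old' is
         * uncharged for pages it was never charged for.
         */
        static void memrlimit_cgroup_move_charges(struct mm_struct *mm,
                        struct memrlimit_cgroup *old,
                        struct memrlimit_cgroup *new)
        {
                unsigned long bytes = mm->total_vm << PAGE_SHIFT;

                res_counter_uncharge(&old->as_res, bytes);
                res_counter_charge(&new->as_res, bytes);
        }
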
> That's what I'm hoping to address with the idea of splitting the
> attach into one update per subsystem, and letting the subsystems
> control their own synchronization.
>
> > We're not really accessing
> > any member of the mm here except the owner. Do we need to be called
> > with mmap_sem held?
> >
>
> Not necessarily mmap_sem, but there needs to be something to ensure
> that the update to mm->total_vm and the charge/uncharge against the
> res_counter are an atomic pair with respect to the code that shifts an
> mm between two cgroups, either due to mm->owner change or due to an
> attach_task(). Since mmap_sem is held for write on almost all the fast
> path calls to the rlimit_as charge/uncharge functions, using that for
> the synchronization avoids the need for any additional synchronization
> in the fast path.
>
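
For reference, the fast-path shape being relied on here is roughly
the following (a sketch of an mmap()-style caller, not an exact call
site):

        down_write(&mm->mmap_sem);
        /*
         * The charge and the total_vm update form an atomic pair:
         * an attach that also takes mmap_sem for write can never
         * observe one without the other.
         */
        if (memrlimit_cgroup_charge_as(mm, grow)) {
                up_write(&mm->mmap_sem);
                return -ENOMEM;
        }
        mm->total_vm += grow;
        up_write(&mm->mmap_sem);
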
> Can you say more about the complications of holding a write lock on
> mmap_sem in the cleanup calls to uncharge?
>
> > > retry:
> > >         mm = get_task_mm(p);
> > >         if (mm == NULL) {
> > >                 task_lock(p);
> > >                 rcu_assign_pointer(p->cgroups, new_css_set);
> >
> > Will each callback assign p->cgroups to new_css_set?
>
> Yes - but new_css_set will be slightly different for each callback.
> Specifically, it will differ from the existing set pointed to by
> p->cgroups in the pointer for this particular subsystem. So the task
> will move over in a staggered fashion, and each subsystem will get to
> choose its own synchronization.
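
A conceptual sketch of that staggered move (make_css_set_for() is a
hypothetical helper, not a real cgroup API):

        /*
         * Move the task one subsystem at a time: each step publishes
         * a css_set that differs from the previous one only in that
         * subsystem's slot, under that subsystem's own locking.
         */
        for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
                struct css_set *next =
                        make_css_set_for(p->cgroups, i, new_css[i]);

                /* subsystem-specific synchronization goes here,
                 * e.g. down_write(&mm->mmap_sem) for memrlimit */
                task_lock(p);
                rcu_assign_pointer(p->cgroups, next);
                task_unlock(p);
        }
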
>
> > >         task_lock(p);
> > >         if (p->mm != mm) {
> > >                 /* We raced */
> >
> > With exit_mmap() or exec_mmap(), right?
> >
>
> Yes.
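
Putting the two quoted fragments together, the loop under discussion
presumably looks something like this (a reconstruction, since only
pieces are quoted above):

        retry:
                mm = get_task_mm(p);
                if (mm == NULL) {
                        /* no mm to account against: just switch */
                        task_lock(p);
                        rcu_assign_pointer(p->cgroups, new_css_set);
                        task_unlock(p);
                        return 0;
                }

                down_write(&mm->mmap_sem);
                task_lock(p);
                if (p->mm != mm) {
                        /* raced with exit_mmap()/exec_mmap(): retry */
                        task_unlock(p);
                        up_write(&mm->mmap_sem);
                        mmput(mm);
                        goto retry;
                }
                /* safe here: move the mm->total_vm charge, switch
                 * p->cgroups, then drop the locks */
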
>
> > If we agree with the assertion/conclusion above, then a simple lock
> > might be able to protect us, assuming that it does not create an
> > intertwined locking hierarchy.
> >
>
> Right - and if we can make that lock be the mmap_sem of the mm in
> question, we avoid introducing a new lock into the fast path.
>

I want to focus on this conclusion/assertion, since it takes care of
most of the locking-related discussion above, unless I missed
something.

My concerns with using mmap_sem are:

1. It's highly contended (every page fault, vma change, etc.).
2. It's going to make the locking hierarchy deeper and more complex.
3. It's not appropriate to call all the accounting callbacks with
   mmap_sem held, since the undo operations _can get_ complicated
   at the caller.

I would prefer introducing a new lock, so that other subsystems are
not affected.
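
Concretely, something like the following (just a sketch; the
memrlimit_lock field is made up):

        /* hypothetical: a lock private to memrlimit accounting */
        int memrlimit_cgroup_charge_as(struct mm_struct *mm,
                                       unsigned long nr_pages)
        {
                struct memrlimit_cgroup *memrcg;
                int ret;

                spin_lock(&mm->memrlimit_lock);
                memrcg = memrlimit_cgroup_from_task(mm->owner);
                ret = res_counter_charge(&memrcg->as_res,
                                         nr_pages << PAGE_SHIFT);
                spin_unlock(&mm->memrlimit_lock);
                return ret;
        }

The owner-change and attach paths would take the same lock around the
charge move, so the fast path pays for a private spinlock instead of
mmap_sem.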

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL