Re: [PATCH 2/6] mempolicy: remove all mempolicy sharing

From: KOSAKI Motohiro
Date: Tue Jun 12 2012 - 12:46:09 EST


> Your example is missing some important detail. When I was looking at this
> I thought of the same scenario because initially I thought this might be
> the problem Dave's test case was hitting. Obviously I then proceeded to
> mess up anyway so take this with a grain of salt but why is this particular
> situation not prevented by vma_merge? is_mergeable_vma() should have spotted
> that the vm_files differed and mbind_range() should not have tried
> sharing them.

vma1 and vma2 are never merged. But policy_vma() used mpol_get() instead
of mpol_dup(), so vma1 and vma2 ended up using the same mempolicy.

vma merge/split is completely unrelated. Actually, vma1 and vma2 don't even
need to be neighboring vmas. A | vma1 | hole | vma2 | pattern produces the
same scenario.
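
To illustrate the difference, here is a minimal user-space sketch. It is not
kernel code: the toy_* types and helpers are hypothetical stand-ins that only
model "take another reference" versus "make a private copy".

```c
#include <stdlib.h>

/* Toy user-space model; all toy_* names are hypothetical, not kernel APIs. */
struct toy_mempolicy {
    int refcnt;
    int mode;
};

struct toy_vma {
    struct toy_mempolicy *policy;
};

/* Models mpol_get(): same object, one more reference. */
static struct toy_mempolicy *toy_mpol_get(struct toy_mempolicy *pol)
{
    pol->refcnt++;
    return pol;
}

/* Models mpol_dup(): a private copy with its own reference count. */
static struct toy_mempolicy *toy_mpol_dup(const struct toy_mempolicy *pol)
{
    struct toy_mempolicy *copy = malloc(sizeof(*copy));

    if (!copy)
        return NULL;
    *copy = *pol;
    copy->refcnt = 1;
    return copy;
}

/* Drops one reference and frees the object when the last one goes away. */
static void toy_mpol_put(struct toy_mempolicy *pol)
{
    if (--pol->refcnt == 0)
        free(pol);
}

/*
 * Returns 1 if vma1 and vma2 end up pointing at the same policy object,
 * 0 if they do not, and -1 on allocation failure.
 */
static int toy_vmas_share_policy(int use_dup)
{
    struct toy_mempolicy *pol = malloc(sizeof(*pol));
    struct toy_vma vma1, vma2;
    int shared;

    if (!pol)
        return -1;
    pol->refcnt = 1;
    pol->mode = 0;

    vma1.policy = pol; /* vma1 owns the initial reference */
    vma2.policy = use_dup ? toy_mpol_dup(pol) : toy_mpol_get(pol);
    if (!vma2.policy) {
        toy_mpol_put(pol);
        return -1;
    }

    shared = (vma1.policy == vma2.policy);

    toy_mpol_put(vma2.policy);
    toy_mpol_put(vma1.policy);
    return shared;
}
```

In this model, toy_vmas_share_policy(0) reports sharing and
toy_vmas_share_policy(1) does not. Nothing in it depends on the two vmas being
adjacent.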


>> Look at alloc_pages_vma(), it uses get_vma_policy() and mpol_cond_put() pair
>> for maintaining mempolicy refcount. The current rule is, get_vma_policy() does
>> NOT increase a refcount if the policy is not attached shmem vma and mpol_cond_put()
>> DOES decrease a refcount if mpol has MPOL_F_SHARED.
>
> The rules about refcounting are indeed annoying. It would be a lot easier
> to understand if the reference counting was unconditional but then every
> page allocation in a large VMA would also bounce the cacheline storing
> the count which would just generate a new bug later.

Yes. Regular task/vma policies shouldn't take a refcount in the fast path. On
the other hand, shmem policies can't avoid the refcount game because we have
to avoid a race where another thread frees the policy at the same time.
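
The asymmetric rule quoted above can be sketched in user space as follows.
This is only a toy model: the rc_* names and RC_MPOL_F_SHARED are hypothetical
stand-ins, and the counter is allowed to go negative so the imbalance is
visible instead of causing a free.

```c
/* Toy user-space model; all rc_* names are hypothetical, not kernel APIs. */
#define RC_MPOL_F_SHARED 0x1u

struct rc_policy {
    int refcnt;
    unsigned int flags;
};

struct rc_vma {
    struct rc_policy *pol;
    int is_shmem; /* nonzero if the policy is attached via shmem */
};

/* Models the "get" side: only the shmem path takes a reference. */
static struct rc_policy *rc_get_vma_policy(struct rc_vma *vma)
{
    if (vma->is_shmem)
        vma->pol->refcnt++;
    return vma->pol;
}

/* Models the "cond put" side: drops a reference only for shared policies. */
static void rc_cond_put(struct rc_policy *pol)
{
    if (pol->flags & RC_MPOL_F_SHARED)
        pol->refcnt--;
}

/*
 * Simulates nr_allocs page allocations against one vma, starting from a
 * reference count of 1, and returns the final count.
 */
static int rc_refcount_after_allocs(int is_shmem, unsigned int flags,
                                    int nr_allocs)
{
    struct rc_policy pol = { 1, flags };
    struct rc_vma vma = { &pol, is_shmem };
    int i;

    for (i = 0; i < nr_allocs; i++) {
        struct rc_policy *p = rc_get_vma_policy(&vma);

        rc_cond_put(p);
    }
    return pol.refcnt;
}
```

In this model the count stays balanced for a shmem vma with a shared policy
and for a regular vma with an unshared one. It drops on every allocation once
a policy flagged as shared sits on a non-shmem vma, which is my understanding
of why sharing one policy between vma1 and vma2 is dangerous.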


> I suspect these bugs were not noticed because the shmem policies are
> typically large and very long lived without much use of mbind() but
> that's not an excuse.

I agree with your suspicion. I haven't heard of this issue before.



>> -/* Apply policy to a single VMA */
>> -static int policy_vma(struct vm_area_struct *vma, struct mempolicy *new)
>> +/*
>> + * Apply policy to a single VMA
>> + * This must be called with the mmap_sem held for writing.
>> + */
>> +static int policy_vma(struct vm_area_struct *vma, struct mempolicy *pol)
>
> If we're going to change this, change the policy_vma() name as well to
> set_vma_policy. We currently have policy_vma() and vma_policy() which mean
> totally different things which is partially why I deleted it entirely the
> first time around. It's a small issue but it might make mempolicy.c 0.0001%
> easier to follow.

100% agree. I'll make a simple renaming patch.