On Mon, Mar 26, 2018 at 11:37:25AM -0700, Matthew Wilcox wrote:
On Tue, Mar 27, 2018 at 02:20:39AM +0800, Yang Shi wrote:Say we've two syscalls running prctl_set_mm_map in parallel, and imagine
+++ b/kernel/sys.cI see the argument for the change to a write lock was because of a BUG
@@ -1959,7 +1959,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
return error;
}
- down_write(&mm->mmap_sem);
+ down_read(&mm->mmap_sem);
/*
* We don't validate if these members are pointing to
@@ -1980,10 +1980,13 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
mm->start_brk = prctl_map.start_brk;
mm->brk = prctl_map.brk;
mm->start_stack = prctl_map.start_stack;
+
+ spin_lock(&mm->arg_lock);
mm->arg_start = prctl_map.arg_start;
mm->arg_end = prctl_map.arg_end;
mm->env_start = prctl_map.env_start;
mm->env_end = prctl_map.env_end;
+ spin_unlock(&mm->arg_lock);
/*
* Note this update of @saved_auxv is lockless thus
validating arg_start and arg_end, but more generally, we are updating these
values, so a write-lock is probably a good idea, and this is a very rare
operation to do, so we don't care about making this more parallel. I would
not make this change (but if other more knowledgable people in this area
disagree with me, I will withdraw my objection to this part).
one have @start_brk = 20 @brk = 10 and second caller has @start_brk = 30
and @brk = 20. Since now the call is guarded by _read_ the both calls
unlocked and due to OO engine it may happen then when both finish
we have @start_brk = 30 and @brk = 10. In turn "write" semaphore
has been take to have consistent data on exit, either you have [20;10]
or [30;20] assigned not something mixed.
That said I think using read-lock here would be a bug.
Cyrill