Re: [PATCH 2/2] CRED: Fix __task_cred()'s lockdep check and banner comment

From: Eric W. Biederman
Date: Thu Aug 05 2010 - 03:19:43 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> Added Oleg and Thomas to the participants.
>
> Oleg/Thomas: the whole thread is on lkml, but I'm quoting most of the
> relevant parts..
>
> On Tue, Aug 3, 2010 at 2:34 AM, David Howells <dhowells@xxxxxxxxxx> wrote:
>>
>> A previous patch:
>>
>> Â Â Â Âcommit 8f92054e7ca1d3a3ae50fb42d2253ac8730d9b2a
>> Â Â Â ÂAuthor: David Howells <dhowells@xxxxxxxxxx>
>> Â Â Â ÂDate: Â Thu Jul 29 12:45:55 2010 +0100
>> Â Â Â ÂSubject: CRED: Fix __task_cred()'s lockdep check and banner comment
>>
>> fixed the lockdep checks on __task_cred(). ÂThis has shown up a place in the
>> signalling code where a lock should be held - namely that
>> check_kill_permission() requires its callers to hold the RCU lock.
>
> It's not just check_kill_permission(), is it? I thought we could do
> the "for_each_process()" loops with just RCU, rather than holding the
> whole tasklist_lock? So I _think_ that getting the RCU read-lock would
> make it possible to get rid of the tasklist_lock in there too? At
> least in kill_something_info().
>
> Yes/no? What am I missing? This is an Oleg question, mainly.

No. When we send a signal to multiple processes it needs to be an
atomic operation so that kill -KILL -pgrp won't let processes escape.
It is what posix specifies, it is what real programs expect, and it
is the useful semantic in userspace.

Just using rcu_read_lock() breaks that atomicity of sending a signal
to a process group, which means we can have new processes that escape
the signal. Even with the tasklist_lock we have to have a special
case in fork to ensure we don't add a process to a process group
while a signal is being delivered to it.

I have scratched my head a few times wondering if there is a way to
preserve the atomic guarantee without taking a global lock, but I
haven't found one I can wrap my head around yet.

>> Â Â Â Âsuccess = 0;
>> Â Â Â Âretval = -ESRCH;
>> + Â Â Â rcu_read_lock();
>> Â Â Â Âdo_each_pid_task(pgrp, PIDTYPE_PGID, p) {
>> Â Â Â Â Â Â Â Âint err = group_send_sig_info(sig, info, p);
>> Â Â Â Â Â Â Â Âsuccess |= !err;
>> Â Â Â Â Â Â Â Âretval = err;
>> Â Â Â Â} while_each_pid_task(pgrp, PIDTYPE_PGID, p);
>> + Â Â Â rcu_read_unlock();
>
> Ok, makes sense. At the same time, I do wonder if we shouldn't perhaps
> do this in the caller. The callers would seem to all want it - look at
> kill_something_info(), for example, where it really looks like it
> would be nicer to put that _whole_ function under rcu_read_lock() (in
> the caller itself). Or in kernel/exit.c, we have two calls
> back-to-back, so doing it in the caller would clean that up and do it
> just once.

Likely. At one point read_lock(&tasklist_lock) implied having the
rcu_read_lock(). As it no longer does in some cases we have a small
pile of places with outdated assumptions. I know I have been guilty a
few times of forgetting about the new rule that we have to take rcu_read_lock()
everywhere.

> Look here:
>
>> @@ -1261,6 +1263,7 @@ static int kill_something_info(int sig, struct siginfo *info, pid_t pid)
>> Â Â Â Â Â Â Â Âint retval = 0, count = 0;
>> Â Â Â Â Â Â Â Âstruct task_struct * p;
>>
>> + Â Â Â Â Â Â Â rcu_read_lock();
>> Â Â Â Â Â Â Â Âfor_each_process(p) {
>> Â Â Â Â Â Â Â Â Â Â Â Âif (task_pid_vnr(p) > 1 &&
>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â!same_thread_group(p, current)) {
>> @@ -1270,6 +1273,7 @@ static int kill_something_info(int sig, struct siginfo *info, pid_t pid)
>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Âretval = err;
>> Â Â Â Â Â Â Â Â Â Â Â Â}
>> Â Â Â Â Â Â Â Â}
>> + Â Â Â Â Â Â Â rcu_read_unlock();
>> Â Â Â Â Â Â Â Âret = count ? retval : -ESRCH;
>> Â Â Â Â}
>> Â Â Â Âread_unlock(&tasklist_lock);
>
> with those fragments, we now end up having rcu_read_lock() for _all_
> the three cases in kill_something_info(), but rather than do it once,
> we do it explicitly, and we do it in two different ways (two cases do
> it explicitly, the middle one does it implicitly when it calls
> __kill_pgrp_info().
>
> So I think the patch is correct, but at the same time I do get the
> feeling that we should just do it slightly differently.
>
> Comments?

With the tasklist_lock the rule in these functions is that the caller
will take the lock, so we probably make the rule the caller should
take the lock in the same scenarios for the rcu_read_lock(). Aka just
say:

read_lock(&tasklist_lock);
rcu_read_lock();

everywhere, that today we say just:

read_lock(&tasklist_lock);

It is what was meant when the code was written and the rcu-ized, and
it will certainly preserve my human intuition and general practical
reality that when you have the tasklist_lock you have the
rcu_read_lock.

*Shrug* I don't think it matters a whole lot.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/