Proper locking for find_task_by_pid_ns?

From: Grzegorz Nosek
Date: Mon May 07 2012 - 06:06:05 EST


Hi,

TL;DR: my question is whether this:

read_lock(&tasklist_lock);
rcu_read_lock();
p = find_task_by_pid_ns(pid, &init_pid_ns);
rcu_read_unlock();
read_unlock(&tasklist_lock);

is a valid sequence, or is it deadlockable. My Google-fu says _either_ rcu_read_lock or tasklist_lock is required to traverse the task list[1] but I'm not sure about both at the same time.

The whole story:

I encountered a machine crash with the following output on the console (that's all I got):

http://imgur.com/DzksH

The kernel is 3.2.15 plus Linux-VServer (patch-3.2.11-vs2.3.2.8 [2])

I'm guessing (I admit, I didn't disassemble copy_process) the write lock in question is tasklist_lock (the only clearly visible write_lock_irq() call in this function). So to my untrained eye that would look like some process stuck with tasklist_lock held.

The crash happened while shutting down a vserver, which basically looks like this:

long vs_reboot(unsigned int cmd, void __user *arg)
{
struct vx_info *vxi = current_vx_info();
long ret = 0;

vxdprintk(VXD_CBIT(misc, 5),
"vs_reboot(%p[#%d],%u)",
vxi, vxi ? vxi->vx_id : 0, cmd);

ret = vs_reboot_helper(vxi, cmd, arg);
if (ret)
return ret;

vxi->reboot_cmd = cmd;
if (vx_info_flags(vxi, VXF_REBOOT_KILL, 0)) {
switch (cmd) {
case LINUX_REBOOT_CMD_RESTART:
case LINUX_REBOOT_CMD_HALT:
case LINUX_REBOOT_CMD_POWER_OFF:
vx_info_kill(vxi, 0, SIGKILL);
vx_info_kill(vxi, 1, SIGKILL);
default:
break;
}
}
return 0;
}

/* ... */

int vx_info_kill(struct vx_info *vxi, int pid, int sig)
{
int retval, count = 0;
struct task_struct *p;
struct siginfo *sip = SEND_SIG_PRIV;

retval = -ESRCH;
vxdprintk(VXD_CBIT(misc, 4),
"vx_info_kill(%p[#%d],%d,%d)*",
vxi, vxi->vx_id, pid, sig);
read_lock(&tasklist_lock);
switch (pid) {
case 0:
case -1:
for_each_process(p) {
int err = 0;

if (vx_task_xid(p) != vxi->vx_id || p->pid <= 1 ||
(pid && vxi->vx_initpid == p->pid))
continue;

err = group_send_sig_info(sig, sip, p);
++count;
if (err != -EPERM)
retval = err;
}
break;
case 1:
if (vxi->vx_initpid) {
pid = vxi->vx_initpid;
/* for now, only SIGINT to private init ... */
if (!vx_info_flags(vxi, VXF_STATE_ADMIN, 0) &&
/* ... as long as there are tasks left */
(atomic_read(&vxi->vx_tasks) > 1))
sig = SIGINT;
}
/* fallthrough */
default:
rcu_read_lock();
p = find_task_by_real_pid(pid);
rcu_read_unlock();
if (p) {
if (vx_task_xid(p) == vxi->vx_id)
retval = group_send_sig_info(sig, sip, p);
}
break;
}
read_unlock(&tasklist_lock);
vxdprintk(VXD_CBIT(misc, 4),
"vx_info_kill(%p[#%d],%d,%d,%ld) = %d",
vxi, vxi->vx_id, pid, sig, (long)sip, retval);
return retval;
}

vx_info_kill() is the only place that touches tasklist_lock that does not exist in mainline kernel. find_task_by_real_pid is just find_task_by_pid_ns(pid, &init_pid_ns).

The crash only happened once so far and AFAIK is not easily reproducible. Both the host and the guest were basically completely idle at that time.

So, am I on the right track or should I look elsewhere?

Best regards,
Grzegorz Nosek

1. http://www.mail-archive.com/linuxppc-dev@xxxxxxxxxxxxxxxx/msg57795.html
2. http://vserver.13thfloor.at/Experimental/patch-3.2.11-vs2.3.2.8.diff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/