[PATCH 0/2] suck some poison out of cgroups' linked lists

From: Phil Carmody
Date: Tue Mar 15 2011 - 09:14:40 EST



I recently saw cgroup_attach_task drop this bomb:
[ 46.045806] Unable to handle kernel paging request at virtual address 00200200
Which is clearly linked-list poison.

Dereferencing 00100104 has also been seen nearby according to a quick
web-search that I did.

Apparently, whether nodes are on a list is being checked with list_empty(),
and if they are, they're list_del()ed. A subsequent list_empty() check
still reports them as being on a list, because list_del() doesn't turn
the node into a singleton list; it simply poisons both of its pointers,
and merry poison dereferencing may ensue. Oops.
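
To make the failure mode concrete, here is a minimal userspace sketch of
the pattern described above. It is not the cgroups code: the struct and
helpers are simplified stand-ins for the ones in include/linux/list.h, the
poison constants are assumed to be the 32-bit values with no pointer delta
(which matches the addresses in the oops), and
node_looks_linked_after_del() is a hypothetical name used only for this
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Assumed poison values; they are only compared here, never dereferenced. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* True only if the node points at itself, i.e. is a singleton. */
static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

/* Unlink the entry and poison its pointers, as list_del() does. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

/*
 * Hypothetical demo: after list_del(), the node is off the list, yet
 * list_empty(&node) is false because node.next is poison, not &node.
 */
static bool node_looks_linked_after_del(void)
{
	struct list_head head, node;

	INIT_LIST_HEAD(&head);
	INIT_LIST_HEAD(&node);
	list_add(&node, &head);
	list_del(&node);

	assert(list_empty(&head));	/* the list itself really is empty */
	return !list_empty(&node);	/* but the node still "looks" linked */
}
```

In this sketch, a second "if (!list_empty(&node)) list_del(&node);" would
write through node.prev (00200200) and then through node.next->prev
(00100100 + 4 on a 32-bit build), which would be consistent with the two
addresses mentioned above.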

There are at least 2 ways to address this matter; I've gone for the latter:

1) Do not use list_empty() to check if a node is on a list or not. Have
an additional new function that checks to see whether the node is either
a singleton or is poisoned. Something like list_node_{on,off}_list()?

2) Ensure that you never leave poison anywhere where you might want
to use list_empty().
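
The two options can be sketched in userspace as follows. This is an
illustration under assumptions, not the contents of the patches: the
helpers are simplified stand-ins for include/linux/list.h,
list_node_off_list() is only the hypothetical helper floated in option 1,
and using list_del_init() is one possible way of realising option 2. The
two demo functions at the end are likewise made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Assumed 32-bit poison values; compared only, never dereferenced. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

/* Option 1 (hypothetical helper): treat singleton or poisoned as "off". */
static bool list_node_off_list(const struct list_head *node)
{
	return node->next == node || node->next == LIST_POISON1;
}

/* Option 2: unlink and reinitialise, so list_empty() stays meaningful. */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}

/* Demo: the check-then-delete pattern can now run twice without harm. */
static bool detach_twice_is_safe(void)
{
	struct list_head head, node;

	INIT_LIST_HEAD(&head);
	INIT_LIST_HEAD(&node);
	list_add(&node, &head);

	for (int i = 0; i < 2; i++)
		if (!list_empty(&node))
			list_del_init(&node);	/* second pass is skipped */

	return list_empty(&node) && list_empty(&head);
}

/* Demo: the option-1 helper recognises a poisoned node as off-list. */
static bool poisoned_node_is_off_list(void)
{
	struct list_head node = { LIST_POISON1, LIST_POISON2 };

	assert(!list_empty(&node));	/* list_empty() is fooled by poison */
	return list_node_off_list(&node);
}
```

The trade-off, as far as this sketch goes, is that option 2 gives up the
poison that would otherwise catch a genuine use-after-delete, while
option 1 keeps the poison but requires every such check to use the new
helper instead of list_empty().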

It might be that these oopses are seen only because there's a marginal
race in the cgroups code, as they seem to be very rare. In that case
this patchset might not fix the core problem, but might simply hide it.
Someone with more cgroups expertise might want to investigate that
possibility.

Patch 1 is the "hindsight is 20/20" patch which would have made
identifying the issue trivial.

Cheers,
Phil