Re: 3.10.16 cgroup_mutex deadlock

From: Michal Hocko
Date: Tue Nov 12 2013 - 09:31:54 EST


On Tue 12-11-13 18:17:20, Li Zefan wrote:
> Cc more people
>
> On 2013/11/12 6:06, Shawn Bohrer wrote:
> > Hello,
> >
> > This morning I had a machine running 3.10.16 go unresponsive but
> > before we killed it we were able to get the information below. I'm
> > not an expert here but it looks like most of the tasks below are
> > blocking waiting on the cgroup_mutex. You can see that the
> > resource_alloca:16502 task is holding the cgroup_mutex and that task
> > appears to be waiting on a lru_add_drain_all() to complete.

Do you have sysrq+l output as well, by any chance? That would tell
us what the CPUs are currently doing. Dumping all kworker stacks
might be helpful as well. We know that lru_add_drain_all waits for
schedule_on_each_cpu to return, so it is waiting for the per-CPU work
items to finish. I would be really curious why some of the
lru_add_drain_cpu calls cannot finish properly. The only reasons I can
think of are that some work item(s) do not get CPU time or that
somebody is holding lru_lock.
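
A minimal sketch of how that information could be collected, assuming
root access, a kernel built with CONFIG_MAGIC_SYSRQ and CONFIG_STACKTRACE,
and a machine that is still responsive enough to run a shell. The helper
names and the optional path arguments are hypothetical and exist only to
keep the sketch easy to try out; the /proc/sysrq-trigger and
/proc/<pid>/stack interfaces are the standard ones.

```sh
#!/bin/sh
# Sketch only: function names and the optional path arguments are
# illustrative assumptions, not part of any existing tool.

# Equivalent of sysrq+l: ask the kernel to log a backtrace of all
# active CPUs. The output lands in the kernel log, not on stdout.
sysrq_show_cpus() {
    trigger="${1:-/proc/sysrq-trigger}"
    echo l > "$trigger"
}

# Print the kernel stack of every thread whose name starts with "kworker".
# Reading /proc/<pid>/stack normally requires root.
dump_kworker_stacks() {
    proc_root="${1:-/proc}"
    for dir in "$proc_root"/[0-9]*; do
        [ -r "$dir/comm" ] || continue
        comm=$(cat "$dir/comm" 2>/dev/null) || continue
        case "$comm" in
            kworker*)
                echo "== ${dir##*/} $comm =="
                cat "$dir/stack" 2>/dev/null || echo "(stack not readable)"
                ;;
        esac
    done
}
```

Called without arguments, sysrq_show_cpus followed by dmesg shows the
per-CPU backtraces, and dump_kworker_stacks prints one block per kworker
thread, which should show whether a drain work item is stuck or never ran.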

--
Michal Hocko
SUSE Labs