Re: [regression] 3.0-rc boot failure -- bisected to cd4ea6ae3982

From: Anton Blanchard
Date: Wed Jul 13 2011 - 20:34:24 EST



Hi Peter,

> Surely this isn't the first multi-node P7 to boot a kernel with this
> patch? If my git foo is any good it hit -next on 23rd of May.
>
> I guess I'm asking is, do smaller P7 machines boot? And if so, is
> there any difference except size?
>
> How many nodes does the thing have anyway, 28? Hmm, that could mean
> its the first machine with >16 nodes to boot this, which would make it
> trigger the magic ALL_NODES crap.

We haven't tested a box with more than 16 nodes in quite a while, so it
may be this.

I took a quick look and we are stuck in update_group_power:

	do {
		power += group->cpu_power;
		group = group->next;
	} while (group != child->groups);

I looked at the linked list:

child->groups = c000007b2f74ff00

and dumping group as we go:

c000007b2f74ff00 c000007b2f760000 c000007b2fb60000 c000007b2ff60000

At this point we end up in a cycle that does not include child->groups,
so the loop above never terminates:

c000008b2e68ff00 c000008b2e6a0000 c000008b2eaa0000 c000008b2eea0000
c000009aee77ff00 c000009aee790000 c000009aeeb90000 c000009aeef90000
c00000bafde91800 c00000dafdf81800 c00000fafce81800 c000011afdf71800
c00001226e70ff00 c00001226e720000 c00001226eb20000 c00001226ef20000
c000008b2e68ff00
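
The dump shows why the walk hangs: the list is not a ring through
child->groups but a tail leading into a separate cycle. A minimal userspace
sketch of the same walk with a Floyd (tortoise and hare) check is below. It
is an illustration under assumptions, not the kernel code: struct group and
sum_group_power are hypothetical stand-ins, and only the cpu_power and next
fields are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the scheduler group; only the fields used here. */
struct group {
	unsigned long cpu_power;
	struct group *next;
};

/*
 * Sum cpu_power over the list starting at head, as the loop in
 * update_group_power does, but give up if the walk is trapped in a
 * cycle that does not contain head (or if the list is not circular).
 *
 * slow advances one node per iteration, fast advances two. In a
 * well-formed ring of n nodes they first coincide after n steps, when
 * slow is back at head. If they coincide anywhere else, the cycle
 * excludes head and the plain loop would spin forever.
 *
 * Returns true and stores the sum in *power only for a well-formed ring.
 */
static bool sum_group_power(const struct group *head, unsigned long *power)
{
	const struct group *slow = head;
	const struct group *fast = head;
	unsigned long sum = 0;

	assert(head != NULL);

	for (;;) {
		sum += slow->cpu_power;
		slow = slow->next;

		/* A NULL link means the list is not circular at all. */
		if (!slow || !fast->next || !fast->next->next)
			return false;
		fast = fast->next->next;

		if (slow == head)
			break;		/* back at the start: ring is intact */
		if (slow == fast)
			return false;	/* met inside a cycle that skips head */
	}

	*power = sum;
	return true;
}
```

With a check like this the walk returns false on the list above instead of
spinning, at the cost of one extra pointer chase per node. It only detects
the corruption; it says nothing about how the groups got linked that way.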

Still investigating.

Anton
