Re: [crash, bisected] Re: [PATCH 3/4] x86_64: Fold pda into per cpu area

From: Eric W. Biederman
Date: Wed Jul 09 2008 - 20:12:12 EST


Mike Travis <travis@xxxxxxx> writes:

> What I meant was using early_printk in place of printk, which seems to stuff
> the messages into the log buf until the serial console is set up fairly late
> in start_kernel.
> I did this by removing printk() and renaming early_printk() to be printk (and
> a couple other things like #define early_printk printk ...

Last I looked, after the magic early_printk setup, printk calls early_printk
and stuffs messages in the log buffer.

It matters little though, as long as you get the print messages. Weird
cases where you don't get into C code worry me much more.

Once you get into C things are much easier to track.

>> Is stack overflow the only problem you are seeing or are there still other
>> mysteries?
>
> I'm not entirely sure it's a stack overflow; the fault has a NULL dereference
> and then the stack overflow message.

Ok. Interesting.

>>> Only a few of these though I would think might get called early in
>>> the boot, that might also be contributing to the stack overflow.
>>
>> Still the call chain depth shouldn't really be changing. So why should it
>> matter? Ah. The high cpu count is growing cpumask_t, so it matters when
>> you put it on the stack. That makes sense. So what starts out as a 4 byte
>> variable on the stack in a normal setup winds up being a 1k variable
>> with 4k cpus.
>
> Yes, it's definitely the three related:
>
> NR_CPUS  Patch_Applied  THREAD_ORDER  Results
> 256      NO             1             works (obviously ;-)
> 256      YES            1             works
> 4096     NO             1             works
> 4096     YES            1             panics
> 4096     YES            3             works (just happened to pick 3,
>                                       2 probably will work as well.)
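
To put rough numbers on the NR_CPUS / THREAD_ORDER interaction in the table
above, here is a minimal user-space sketch. It assumes cpumask_t is a plain
bitmap of NR_CPUS bits rounded up to whole unsigned longs, that pages are
4 KiB, and that the kernel stack is PAGE_SIZE << THREAD_ORDER bytes. The
helper names (cpumask_bytes, thread_size, print_stack_budget) are made up
for illustration and are not kernel functions. Under these assumptions a
4096-CPU mask comes to 512 bytes, the same order of magnitude as the
estimate quoted above.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

#define PAGE_SIZE 4096UL /* assumption: 4 KiB pages */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bytes taken by a bitmap with one bit per CPU, rounded up to whole longs. */
static unsigned long cpumask_bytes(unsigned long nr_cpus)
{
	unsigned long longs;

	assert(nr_cpus > 0);
	longs = (nr_cpus + BITS_PER_LONG - 1) / BITS_PER_LONG;
	return longs * sizeof(unsigned long);
}

/* Stack size for a given THREAD_ORDER (assumed PAGE_SIZE << order). */
static unsigned long thread_size(unsigned int thread_order)
{
	return PAGE_SIZE << thread_order;
}

/* Print how many on-stack masks it takes to fill the whole stack. */
static void print_stack_budget(unsigned long nr_cpus, unsigned int thread_order)
{
	unsigned long mask = cpumask_bytes(nr_cpus);
	unsigned long stack = thread_size(thread_order);

	printf("NR_CPUS=%lu THREAD_ORDER=%u: mask=%lu bytes, stack=%lu bytes, "
	       "%lu masks fill the stack\n",
	       nr_cpus, thread_order, mask, stack, stack / mask);
}
```

Calling print_stack_budget(4096, 1) and print_stack_budget(4096, 3) from a
main() shows the difference between the panicking and the working row: with
these assumptions an 8 KiB stack is filled by 16 such masks, a 32 KiB stack
by 64. Real call chains also carry return addresses and other locals, so
the practical limit is lower.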

> I've been testing NR_CPUS=4096 for quite a while and it's been very
> reliable. It's just weird that this config fails with this new patch
> applied. (default configs and some fairly normal distro configs also
> work fine.) And with the zillion config straws we now have, spotting
> the arbitrary needle is proving difficult. ;-)

Right. Just please split your patch up. It would be good to see
if simply changing the per cpu segment address to 0 is related
to your problem, or if it is the other logic changes necessary to
use the pda as a per cpu variable.

I just noticed that we always allocate the pda in the per cpu section.

> One reason I've been sticking with 4.2.4.

Eric