Re: [crash, bisected] Re: [PATCH 3/4] x86_64: Fold pda into per cpu area

From: Mike Travis
Date: Wed Jul 09 2008 - 19:30:30 EST


Eric W. Biederman wrote:
> Mike Travis <travis@xxxxxxx> writes:
>
>> ... (I have been using the trick
>> to replace printk with early_printk so messages come out immediately instead
>> of from the log buf.)
>
> Just passing early_printk=xxx on the command line should have that effect.
> Although I do admit you have to be a little bit into the boot before early_printk
> is setup.

What I meant was using early_printk in place of printk; printk stuffs the
messages into the log buf until the serial console is set up fairly late in
start_kernel. I did this by removing printk() and renaming early_printk() to
be printk (and a couple of other things, like #define early_printk printk ...).
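
Roughly, the hack is shaped like the sketch below -- reconstructed here rather
than copied from my tree, so treat the details as approximate; the body is
basically what the stock early_printk() already does with early_console and
vscnprintf:

/*
 * Debugging hack (sketch): make printk() take the synchronous
 * early-console path so messages appear immediately instead of
 * sitting in the log buffer until the real serial console is
 * registered late in start_kernel.
 */

/* in arch/x86/kernel/early_printk.c: early_printk() renamed */
asmlinkage int printk(const char *fmt, ...)
{
        char buf[512];
        int n;
        va_list ap;

        va_start(ap, fmt);
        n = vscnprintf(buf, sizeof(buf), fmt, ap);
        early_console->write(early_console, buf, n);
        va_end(ap);
        return n;
}

/* keep existing early_printk() callers compiling */
#define early_printk printk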

>
>> I've been able to make some more progress. I've gotten to a point where it
>> panics from stack overflow. I've verified this by bumping THREAD_ORDER and
>> it boots fine. Now tracking down stack usage. (I have found a couple of new
>> functions using set_cpus_allowed(..., CPU_MASK_ALL) instead of
>> set_cpus_allowed_ptr(..., CPU_MASK_ALL_PTR), but these are not in the calling
>> sequence and so are not the cause.)
>
> Is stack overflow the only problem you are seeing or are there still other mysteries?

I'm not entirely sure it's a stack overflow; the fault shows a NULL dereference
and then the stack overflow message.

>
>> One weird thing is early_idt_handler seems to have been called, and that's
>> one thing our simulator does not mimic for standard Intel FSB systems -
>> early pending interrupts. (It's designed after all to mimic our h/w, and of
>> course it's been booting fine under that environment.)
>
> That usually indicates you are taking an exception during boot, not that you
> have received an external interrupt. Something like a page fault or a
> division by 0 error.

I was thinking maybe an RTC interrupt? But a fault does sound more likely.

>
>> Only a few of these, though, I would think might get called early in the
>> boot; that might also be contributing to the stack overflow.
>
> Still the call chain depth shouldn't really be changing. So why should it
> matter? Ah. The high cpu count is growing cpumask_t, so it hurts when you
> put it on the stack. That makes sense. So what starts out as a 4 byte
> variable on the stack in a normal setup winds up being a 1k variable
> with 4k cpus.
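
That is exactly what the set_cpus_allowed_ptr() conversion I mentioned is
about. A rough sketch of the difference (illustration only, not a patch;
sizes assume NR_CPUS=4096):

/* cpumask_t is a bitmap sized by the config -- roughly, from
 * linux/cpumask.h:
 *     typedef struct { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;
 * so NR_CPUS=256 is 32 bytes per mask and NR_CPUS=4096 is 512 bytes.
 */
static void example(void)        /* illustration only */
{
        /* old style: the mask is passed by value, so the call (plus any
         * on-stack temporaries) costs sizeof(cpumask_t) bytes of stack */
        set_cpus_allowed(current, CPU_MASK_ALL);

        /* new style: only a pointer is passed; for big NR_CPUS,
         * CPU_MASK_ALL_PTR points at a shared all-set mask, so the
         * stack cost is one pointer */
        set_cpus_allowed_ptr(current, CPU_MASK_ALL_PTR);
}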

Yes, it's definitely the combination of the three:

  NR_CPUS   Patch applied   THREAD_ORDER   Result
  -------   -------------   ------------   ------
      256   no              1              works (obviously ;-)
      256   yes             1              works
     4096   no              1              works
     4096   yes             1              panics
     4096   yes             3              works (just happened to pick 3;
                                           2 probably would work as well)
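
For reference, the arithmetic behind that table, assuming the stock x86_64
definition THREAD_SIZE = PAGE_SIZE << THREAD_ORDER with 4K pages:

    THREAD_ORDER 1  ->   8K kernel stack   (the default; this is the one that panics)
    THREAD_ORDER 2  ->  16K kernel stack
    THREAD_ORDER 3  ->  32K kernel stack   (the one that boots)

With cpumask_t at 512 bytes apiece for NR_CPUS=4096, a call chain only needs
a handful of masks on the stack to make the difference between fitting in 8K
and overflowing it.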

> Reasonable. The practical problem is you are mixing a lot of changes
> simultaneously and it confuses things: compiling with NR_CPUS=4096
> and working out the bugs from a growing cpumask_t, putting the per cpu
> area in a zero based segment, and putting the pda into the
> per cpu area all at the same time.

I've been testing NR_CPUS=4096 for quite a while and it's been very
reliable. It's just weird that this config fails with this new patch
applied. (Default configs and some fairly normal distro configs also
work fine.) And with the zillion config straws we now have, spotting
the arbitrary needle is proving difficult. ;-)

> Who knows, maybe the only difference between 4.2.0 and 4.2.4 is that
> 4.2.4 optimizes its stack usage a little better and you don't see
> a stack overflow.

I haven't tried THREAD_ORDER=3 (or 2) under 4.2.0, but that would
seem to indicate this may be true.

> It would be very, very good if we could separate out these issues,
> especially the segment for the per cpu variables. We need something
> like that.

That's one reason I've been sticking with 4.2.4.

Thanks again for your help.

Mike