Re: perf sched record hangs machine

From: Cyrill Gorcunov
Date: Wed Sep 23 2009 - 05:48:59 EST


On 9/23/09, Ingo Molnar <mingo@xxxxxxx> wrote:
>
> * Chris Malley <mail@xxxxxxxxxxxxxxxxx> wrote:
>
>> 2009/9/23 Cyrill Gorcunov <gorcunov@xxxxxxxxx>:
>> >
>> > Btw, meanwhile Chris may try passing the lapic boot option in an attempt
>> > to re-enable the apic via msr registers. Also (iirc) I feel we may be
>> > hiding errors if a complete noop apic is used, since I believe we need
>> > to check under which conditions a particular operation is called; when
>> > the apic is disabled it means we've switched to UP mode, and
>> > inter-cpu interrupts are under suspicion too. Will take a look in
>> > ~6 hours ;)
>> >
>>
>> Hi Cyrill
>>
>> Heh, yes that just occurred to me as well. With the lapic boot option
>> I can't reproduce the problem, and get a good recording every time.
>> Don't know why the BIOS had disabled it (can't see any specific
>> option).
>
> Would still be important to fix the crash - there are boxes where lapics
> are disabled permanently and cannot be re-enabled. (plus most people
> don't touch their defaults and don't add funky boot options - so crashing
> is not an option)
>

Ingo, Chris, could you try Peter's patch? It seems like what we need.

(Peter, self-IPI shouldn't be treated separately from the other IPIs:
yes, it may not issue any cycle on the FSB, but iirc it uses the same
logic as the other IPIs do.)
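
To make the "noop apic may hide errors" concern concrete, here is a rough user-space sketch. It is not Peter's patch and not the kernel's real apic driver interface: `struct sketch_apic_ops`, `raise_pending_work()` and the other names are made up for illustration only. The assumption is that a dummy driver which reports "nothing was sent" lets the caller fall back to running the work directly, instead of silently losing it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ops table for illustration; NOT the kernel's struct apic. */
struct sketch_apic_ops {
	const char *name;
	/* Returns true only if a self-IPI was actually issued. */
	bool (*send_ipi_self)(int vector);
};

/* Counts how many times the pending-work handler has run. */
static int delivered;

static void pending_work_handler(void)
{
	delivered++;
}

/* Stand-in for a working lapic: the "interrupt" runs the handler.
 * Real hardware would deliver it asynchronously. */
static bool real_send_ipi_self(int vector)
{
	(void)vector;
	pending_work_handler();
	return true;
}

/* Stand-in for a disabled lapic: nothing is sent, and we say so
 * instead of pretending that the IPI went out. */
static bool noop_send_ipi_self(int vector)
{
	(void)vector;
	return false;
}

static const struct sketch_apic_ops sketch_real_apic = {
	.name = "real",
	.send_ipi_self = real_send_ipi_self,
};

static const struct sketch_apic_ops sketch_noop_apic = {
	.name = "noop",
	.send_ipi_self = noop_send_ipi_self,
};

/* Caller that does not trust the self-IPI blindly: if the driver
 * could not send it, run the pending work synchronously. */
static void raise_pending_work(const struct sketch_apic_ops *ops, int vector)
{
	if (!ops->send_ipi_self(vector))
		pending_work_handler();
}
```

With either table the handler runs exactly once per raise_pending_work() call; a noop op that silently returned success would leave the work pending forever, which is the kind of hidden error I mean.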

> I have such a test-box:
>
> [ 0.000000] Using APIC driver default
> [ 0.000000] ACPI: PM-Timer IO Port: 0x8008
> [ 0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
> [ 0.000000] Local APIC disabled by BIOS -- reenabling.
> [ 0.000000] Could not enable APIC!
> [ 0.000000] APIC: disable apic facility
>
> Btw., perf events can work even without a lapic (albeit without NMI
> driven sampling):
>
> [ 0.052051] Performance Events:
> [ 0.055138] no APIC, boot with the "lapic" boot parameter to force-enable it.
> [ 0.056014] no hardware sampling interrupt available.
> [ 0.060014] p6 PMU driver.
> [ 0.062955] ... version: 0
> [ 0.064014] ... bit width: 32
> [ 0.068014] ... generic registers: 2
> [ 0.072015] ... value mask: 00000000ffffffff
> [ 0.076014] ... max period: 000000007fffffff
> [ 0.080014] ... fixed-purpose events: 0
> [ 0.084014] ... event mask: 0000000000000003
>
> That's what it did on your box too:
>
> [ 0.013679] Performance Events:
> [ 0.013705] no APIC, boot with the "lapic" boot parameter to force-enable it.
> [ 0.013783] no hardware sampling interrupt available.
> [ 0.013826] p6 PMU driver.
> [ 0.013882] ... version: 0
> [ 0.013922] ... bit width: 32
> [ 0.013962] ... generic registers: 2
> [ 0.014002] ... value mask: 00000000ffffffff
> [ 0.014045] ... max period: 000000007fffffff
> [ 0.014088] ... fixed-purpose events: 0
> [ 0.014128] ... event mask: 0000000000000003
>
> Unfortunately i cannot reproduce the crash you've been seeing. (but i'm
> quite sure it's due to self-IPI not working fine with dummy lapic.)
>
> Ingo
>