Re: [patch] x86: 2.6.31-rc7 crash due to buggy flat_phys_pkg_id

From: Ingo Molnar
Date: Tue Aug 25 2009 - 15:24:55 EST



* Yinghai Lu <yinghai@xxxxxxxxxx> wrote:

> Ingo Molnar wrote:
> > * Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> >> On Tue, 25 Aug 2009, Yinghai Lu wrote:
> >>> initial apic id and apic id could be different.
> >>>
> >>> and we should use initial apic id to get correct phys pkg id in
> >>> case BIOS set crazy apic id.
> >> Yinghai - I think you missed Cyrills' point. Let me repeat it:
> >>
> >> "cpu_has_apic bit turned off"
> >>
> >> there's no apic. No "initial apic id". No "phys pkg id". No
> >> nothing.
> >>
> >> Discussions about "correct phys pkg id" are pointless.
> >
> > that's not the case here though:
> >
> > [ 8.713916] Total of 32 processors activated (162314.96 BogoMIPS).
> >
> > so APICs are active. The real difference is i think this aspect of
> > commit 2759c3287:
> >
> > static int flat_phys_pkg_id(int initial_apic_id, int index_msb)
> > {
> > - return hard_smp_processor_id() >> index_msb;
> > + return initial_apic_id >> index_msb;
> > }
> >
> > We need to revert back to .30 behavior here. (In case of which
> > environment to trust we almost always trust whatever booted millions
> > of Linux boxes in the past already.)
> >
> > Furthermore, commit 2759c3287 did not declare any side-effects and
> > clearly causes a side-effect on vSMP which apparently has an
> > overlapping set of initial APIC ids.
> >
> > Ravikiran, your patch does not do a clear revert of this bit though.
> > If you do a plain revert of the line above alone, does that fix the
> > problem too?
>
> how about patch phys_pkg_id for vsmp?
>
> diff --git a/arch/x86/kernel/apic/probe_64.c b/arch/x86/kernel/apic/probe_64.c
> index f3b1037..65edc18 100644
> --- a/arch/x86/kernel/apic/probe_64.c
> +++ b/arch/x86/kernel/apic/probe_64.c
> @@ -44,6 +44,11 @@ static struct apic *apic_probe[] __initdata = {
> NULL,
> };
>
> +static int apicid_phys_pkg_id(int initial_apic_id, int index_msb)
> +{
> + return hard_smp_processor_id() >> index_msb;
> +}
> +
> /*
> * Check the APIC IDs in bios_cpu_apicid and choose the APIC mode.
> */
> @@ -69,6 +74,11 @@ void __init default_setup_apic_routing(void)
> printk(KERN_INFO "Setting APIC routing to %s\n", apic->name);
> }
>
> + if (is_vsmp_box()) {
> + /* need to update phys_pkg_id */
> + apic->phys_pkg_id = apicid_phys_pkg_id;
> + }

Hm, this is rather tempting simply because it only affects vSMP
systems and we are in late -rc's. Ravikiran, does Yinghai's patch
solve the crash for you too?

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/