Re: [PATCH 45/79] [PATCH] fix apic acking of irqs
From: Glauber Costa
Date: Mon Mar 24 2008 - 10:52:46 EST
Maciej W. Rozycki wrote:
On Thu, 20 Mar 2008, Glauber Costa wrote:
Are you sure this actually triggers for APIC chips affected by the erratum
in question? And please note that for them the effect of two consecutive
writes will be much more disastrous than setting a bit in the ESR register.
I'm not _sure_, but I can't find anything in the errata list that states
otherwise. It would be great that anyone has such a system to test it. But
with the current conditions, it will break bootup code. In case it is really a
problem, we'd need to make a special case for that.
I have dug out the relevant erratum -- it is the 11AP one as referred to
from arch/x86/kernel/smp_32.c and the text even mentions the EOI register
explicitly:
"This problem affects systems that use HOLD/HLDA or BOFF# and enable the
local APIC of the CPU. If the second APIC write cycle is an EOI (End of
Interrupt) cycle, the CPU will stop servicing subsequent interrupts of
equal or less priority. This may cause the system to hang. If the second
APIC write cycle is not an EOI, the failure mode would depend on the
particular APIC register that is not updated correctly."
But on this occasion I took the opportunity to refresh my memory on the
ESR register and there is apparently no bit there, at least up to
Pentium4, that would signify an error resulting from an incorrect access
type -- only accesses to invalid register indices are marked as errors.
Which bit of the ESR can you see set as a result of using an RMW cycle to
the EOI register and with what kind of CPU/APIC? And why wouldn't it have
affected older kernels? -- the error interrupt has been kept enabled by
Linux for ages and writes to the EOI register are frequent enough it would
be hard to miss the resulting flood of errors. Hmm...
I see bit 7 - Illegal Register Address being set.
I believe the reason we never saw it, is that the ESR register is not
checked that often when interrupts are enabled. In the new bootup state
machine, that is inherited from x86_64, we call do_boot_cpu with irqs
clearly enabled, and check esr in the process.
But I can understand from the spec you posted that this is clearly an
error. So I'd have better come up with a new solution from this
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/