Re: [PATCH] acpi/pm: If failed at validating ACPI PM timer,inhibit future reads.

From: Konrad Rzeszutek Wilk
Date: Fri Jan 14 2011 - 09:11:54 EST


On Thu, Jan 13, 2011 at 11:15:16PM +0100, Thomas Gleixner wrote:
> On Thu, 13 Jan 2011, Konrad Rzeszutek Wilk wrote:
>
> > tgl, John,
> >
> > Should I push this to Linus or are you guys going to push
> > this patch during this merge window?
>
> Wait a moment. This patch is fresh of the press and not that urgent,
> really.

It is a regression compared to 2.6.37 kernel. I don't know the
urgency requirements for regressions but I figured the earlier the
better.

>
> > I've traced it down to the fact that when we boot under Xen we do
> > not have the HPET enabled nor the ACPI PM timer setup. The
>
> Crap. If Xen would not have setup the pm timer then it would not even
> reach the consistency check. It would simply bail out via

Keep in mind that Linux (under Xen) does see the ACPI PM-Timer at bootup
(it parses the ACPI tables), and when it tries to actually read the
values, so past this point:

>
> if (!pmtmr_ioport)
> return -ENODEV;
>

.. it fails at:
if (i == ACPI_PM_READ_CHECKS) {

and returns -ENODEV. So pmtmr_ioport was still valid at that time.

> and the whole misery would not have happened at all. Though it's a
> Good Thing that Xen is so screwed as it points to a real flaw which
> might happen on real hardware as well. See below
>
> > hpet_enable() is never called (b/c xen_time_init is called), and
> > for calibration of tsc_khz (calibrate_tsc == xen_tsc_khz) we
> > get a valid value.
> >
> > So 'tsc_read_refs' tries to read the ACPI PM timer (acpi_pm_read_early),
> > however that is disabled under Xen:
> >
> > [ 1.099272] calling init_acpi_pm_clocksource+0x0/0xdc @ 1
> > [ 1.140186] PM-Timer failed consistency check (0x0xffffff) - aborting.
> >
> > So the tsc_calibrate_check gets called, it can't do HPET, and reading
> > from ACPI PM timer results in getting 0xffffff.. .. and
> > (0xffff..-0xffff..)/some other value results in div_zero.
>
> Nonsense. 0/(some other value) does not result in a divide by zero
> except "some other value" is zero.

<scratches his head> You are right.
>
> > There is a check in 'tsc_refine_calibration_work' for invalid
> > values:
> >
> > /* hpet or pmtimer available ? */
> > if (!hpet && !ref_start && !ref_stop)
> > goto out;
> >
> > But since ref_start and ref_stop have 0xffffff it does not trigger.
> >
> > This little fix makes the read to be 0 and the check triggers.
>
> First of all the patch disables the pm_timer completely, which happens
> to results in a 0 read as a side effect. But the main point of this

I does not look like a side-effect. Specifically:

static inline u32 acpi_pm_read_early(void)
{
if (!pmtmr_ioport)
return 0;

return acpi_pm_read_verified() & ACPI_PM_MASK;
}

.. ends up taking the !pmtmr_ioport path which is what
tsc_refine_calibration_work has a check for.

> fix is to disable pmtimer in case of failure in the init function
> completely.
>
> Further there are several error conditions in this init function and
> we really need to disable pmtimer for all of them not just for the
> case you encountered.

Good point. What about this patch? John, is it OK if I carry
your Ack-by on this modified patch?

commit 7b530022a59683b3164404ee8ee2d5750748b58b
Author: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Thu Jan 13 12:56:51 2011 -0500

acpi/pm: If failed at validating ACPI PM timer, inhibit future reads.

Without this patch we get:
[ 4.628136] divide error: 0000 [#1] SMP
.. snip..
Pid: 441, comm: kworker/1:1 Not tainted 2.6.37test-05765-gda43a99-dirty #14 N61PB-M2S/N61PB-M2S
[ 4.628150] RIP: e030:[<ffffffff8101109b>] [<ffffffff8101109b>] tsc_refine_calibration_work+0x149/0x1bb
[ 4.628157] RSP: e02b:ffff88008f0d1df0 EFLAGS: 00010246
.. when running Linux under Xen.

I've traced it down to the fact that when we boot under Xen we do
not have the HPET enabled nor the ACPI PM timer setup properly
(it is however detected earlier, so pmtmr_ioport is correct).
The hpet_enable() is never called (b/c xen_time_init is called), and
for calibration of tsc_khz (calibrate_tsc == xen_tsc_khz) we
get a valid value.

So 'tsc_read_refs' tries to read the ACPI PM timer (acpi_pm_read_early),
however that is disabled under Xen:

[ 1.099272] calling init_acpi_pm_clocksource+0x0/0xdc @ 1
[ 1.140186] PM-Timer failed consistency check (0x0xffffff) - aborting.

So the tsc_calibrate_check gets called, it can't do HPET, and reading
from ACPI PM timer results in getting 0xffffff.. which is invalid.

There is a check in 'tsc_refine_calibration_work' for invalid
values:

/* hpet or pmtimer available ? */
if (!hpet && !ref_start && !ref_stop)
goto out;

But since ref_start and ref_stop have 0xffffff it does not trigger.

This little fix does it however.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Acked-by: John Stultz <johnstul@xxxxxxxxxx>

diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c
index cfb0f52..02e0f28 100644
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -207,12 +207,15 @@ static int __init init_acpi_pm_clocksource(void)
if (i == ACPI_PM_READ_CHECKS) {
printk(KERN_INFO "PM-Timer failed consistency check "
" (0x%#llx) - aborting.\n", value1);
+ pmtmr_ioport = 0;
return -ENODEV;
}
}

- if (verify_pmtmr_rate() != 0)
+ if (verify_pmtmr_rate() != 0) {
+ pmtmr_ioport = 0;
return -ENODEV;
+ }

return clocksource_register_hz(&clocksource_acpi_pm,
PMTMR_TICKS_PER_SEC);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/