Re: perf hw in kexeced kernel broken in tip

From: Yinghai Lu
Date: Tue Dec 07 2010 - 19:27:24 EST


On 12/07/2010 01:16 PM, Don Zickus wrote:
> On Thu, Dec 02, 2010 at 08:34:30AM +0100, Peter Zijlstra wrote:
>>> void __init lockup_detector_init(void)
>>> {
>>> void *cpu = (void *)(long)smp_processor_id();
>>> @@ -563,6 +576,7 @@ void __init lockup_detector_init(void)
>>>
>>> cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
>>> register_cpu_notifier(&cpu_nfb);
>>> + register_reboot_notifier(&reboot_nfb);
>>>
>>> return;
>>> }
>>
>> We'd really want a perf_event.c callback there to do as the hot-unplug
>> code does and detach all running counters from the cpu.
>
> Ok, here is a simpler patch for now.
>
> --------------------------------8<--------
> From: Don Zickus <dzickus@xxxxxxxxxx>
> Date: Tue, 7 Dec 2010 16:06:59 -0500
> Subject: [PATCH] perf: Use event select bits for hardware check
>
> The counter registers can continue to increment if left enabled
> across a kexec or a kdump. This makes the perf hardware check
> accidentally return false when the hardware really does exist.
>
> Change the check to use the first bits of event selection. Those
> bits should be safe as they are used to program the type of events
> to use. And more importantly, they won't increment across kexec/kdump.
>
> Signed-off-by: Don Zickus <dzickus@xxxxxxxxxx>
> ---
> arch/x86/kernel/cpu/perf_event.c | 8 ++++----
> 1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index 7b91396..7d869c0 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -377,10 +377,10 @@ static bool check_hw_exists(void)
> u64 val, val_new = 0;
> int ret = 0;
>
> - val = 0xabcdUL;
> - ret |= checking_wrmsrl(x86_pmu.perfctr, val);
> - ret |= rdmsrl_safe(x86_pmu.perfctr, &val_new);
> - if (ret || val != val_new)
> + val = 0xabUL;
> + ret |= checking_wrmsrl(x86_pmu.eventsel, val);
> + ret |= rdmsrl_safe(x86_pmu.eventsel, &val_new);
> + if (ret || val != (val_new & 0xFF))
> return false;
>
> return true;

Thanks, it fixes the problem.
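
For anyone following along, here is a rough userspace model of why the old check failed after kexec and why the new one does not. It is only a sketch, not kernel code: struct fake_pmu, fake_read_perfctr() and the two check_via_*() helpers are made-up names, and the "counter ticks by one between write and read" behaviour is an assumption standing in for a counter left enabled by the previous kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one counter / event-select pair (not a kernel type). */
struct fake_pmu {
	uint64_t perfctr;   /* counter register */
	uint64_t eventsel;  /* event select register */
	bool counting;      /* assumed: counter left enabled across kexec */
};

/* Model a counter read: an enabled counter has moved on since the write. */
static uint64_t fake_read_perfctr(struct fake_pmu *pmu)
{
	if (pmu->counting)
		pmu->perfctr += 1;
	return pmu->perfctr;
}

/* Old check: write a pattern to the counter and expect to read it back. */
static bool check_via_perfctr(struct fake_pmu *pmu)
{
	uint64_t val = 0xabcdULL;

	pmu->perfctr = val;
	return fake_read_perfctr(pmu) == val;
}

/* New check: write the low event select bits, which do not increment. */
static bool check_via_eventsel(struct fake_pmu *pmu)
{
	uint64_t val = 0xabULL;

	pmu->eventsel = val;
	/* Compare only the low 8 bits, mirroring the mask in the patch. */
	return (pmu->eventsel & 0xFF) == val;
}
```

With counting set, check_via_perfctr() reports the hardware as missing while check_via_eventsel() still finds it, which matches what I saw before and after the patch.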

Yinghai