Re: [RFC] perf/x86/intel: Disable check_msr for real hw

From: Vaden, Tom (HPE Server OS Architecture)
Date: Sun Jun 23 2019 - 22:10:20 EST




On 6/23/19 6:40 PM, Jiri Olsa wrote:
> On Fri, Jun 21, 2019 at 10:48:25AM -0700, Andi Kleen wrote:
>> On Fri, Jun 14, 2019 at 01:28:53PM +0200, Jiri Olsa wrote:
>>> hi,
>>> the HPE server can do POST tracing and have enabled LBR
>>> tracing during the boot, which makes check_msr fail falsly.
>>>
>>> It looks like check_msr code was added only to check on guests
>>> MSR access, would it be then ok to disable check_msr for real
>>> hardware? (as in patch below)
>>>
>>> We could also check if LBR tracing is enabled and make
>>> appropriate checks, but this change is simpler ;-)
>>>
>>> ideas? thanks,
>>> jirka
>>
>> Sorry for the late comment. I see this patch has been merged now.
>>
>> Unfortunately I don't think it's a good idea. The problem
>> is that the hypervisor flags are only set for a few hypervisors
>> that Linux knows about. But in practice there are many more
>> Hypervisors around that will not cause these flags to be set.
>> But these are still likely to miss MSRs.
>>
>> The other hypervisors are relatively obscure, but eventually
>> someone will hit problems.
>
> any idea if there's any other flag/way we could use to detect those?
>
> adding few virtualization folks to the loop
> and attaching the original patch
>
> thanks,
> jirka
>
Would it be reasonable to change the sense of the original patch in
commit 4550931 to only enable the check for the set of "certain hardware
emulators" and leave the check otherwise disabled by default? I'm
assuming that set is known (and small)?

>
> ---
> Tom Vaden reported false failure of check_msr function, because
> some servers can do POST tracing and enable LBR tracing during
> the boot.
>
> Kan confirmed that check_msr patch was to fix a bug report in
> guest, so it's ok to disable it for real HW.
>
> Cc: Kan Liang <kan.liang@xxxxxxxxx>
> Reported-by: Tom Vaden <tom.vaden@xxxxxxx>
> Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
> ---
> arch/x86/events/intel/core.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 71001f005bfe..1194ae7e1992 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -20,6 +20,7 @@
> #include <asm/intel-family.h>
> #include <asm/apic.h>
> #include <asm/cpu_device_id.h>
> +#include <asm/hypervisor.h>
>
> #include "../perf_event.h"
>
> @@ -4050,6 +4051,13 @@ static bool check_msr(unsigned long msr, u64 mask)
> {
> u64 val_old, val_new, val_tmp;
>
> + /*
> + * Disable the check for real HW, so we don't
> + * mess up with potentionaly enabled regs.
> + */
> + if (hypervisor_is_type(X86_HYPER_NATIVE))
> + return true;
> +
> /*
> * Read the current value, change it and read it back to see if it
> * matches, this is needed to detect certain hardware emulators
>