Re: [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later.

From: Liang, Kan
Date: Mon Jan 24 2022 - 11:01:39 EST




On 1/24/2022 7:21 AM, Peter Zijlstra wrote:
On Fri, Jan 21, 2022 at 11:26:44PM -0800, Kyle Huey wrote:
Beginning in Comet Lake, Intel extended the concept of privilege rings to
SMM.[0] A side effect of this is that events caused by execution of code
in SMM are now visible to performance counters with IA32_PERFEVTSELx.USR
set.

rr[1] depends on exact counts of performance events for the user space
tracee, so this change in behavior is fatal for us. It is, however, easily
corrected by setting IA32_DEBUGCTL.FREEZE_WHILE_SMM to 1 (visible in sysfs
as /sys/devices/cpu/freeze_on_smi). While we can and will tell our users to
set freeze_on_smi manually when appropriate, because observing events in
SMM is rarely useful to anyone, we propose to change the default value of
this switch.

+ Andi

From what we have heard many times from sophisticated customers, they really hate blind spots. They want to see everything. That's why we set freeze_on_smi to 0 by default. I think the patch breaks that principle.

I don't think there is a way to notify all users that the kernel's default value will change. (Yes, the end user can always check /sys/devices/cpu/freeze_on_smi to get the current value. But in practice, no one checks it unless some errors are found.) I think it may cause trouble for users who rely on the counts in SMM.

The patch only changes the default value for some platforms, not all of them. The default value would no longer be consistent across platforms, which can cause confusion.

All in all, we have already exposed an interface for end users to change the value. If some apps, e.g., rr, don't want the default value, I think they can always change it in the app for all platforms.
We should still keep freeze_on_smi at 0 by default, which should benefit more users.
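
As a rough illustration of an application changing the setting itself, here is a minimal Python sketch. It assumes the sysfs path mentioned in this thread; the function names are hypothetical, and writing the file typically requires root. It is not how rr does it, only one way an app could check and set the switch before relying on exact counts.

```python
# Path taken from this thread; other systems may expose the PMU elsewhere.
FREEZE_ON_SMI_PATH = "/sys/devices/cpu/freeze_on_smi"


def read_freeze_on_smi(path=FREEZE_ON_SMI_PATH):
    """Return the current freeze_on_smi setting as an int (0 or 1)."""
    with open(path) as f:
        return int(f.read().strip())


def ensure_freeze_on_smi(path=FREEZE_ON_SMI_PATH):
    """Set freeze_on_smi to 1 if it is not already set.

    Returns the previous value so the caller can restore it later.
    Raises OSError (e.g. PermissionError) if the file cannot be written.
    """
    previous = read_freeze_on_smi(path)
    if previous != 1:
        with open(path, "w") as f:
            f.write("1")
    return previous
```

A caller would invoke ensure_freeze_on_smi() once at startup and, if it wants to leave the system as it found it, write the returned value back on exit.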



In this patch I have assumed that all non-Atom Intel microarchitectures
starting with Comet Lake behave like this but it would be good for someone
at Intel to verify that.


Kan, can you look at that?


I'm asking internally.

Thanks,
Kan

[0] See the Intel white paper "Trustworthy SMM on the Intel vPro Platform"
at https://bugzilla.kernel.org/attachment.cgi?id=300300, particularly the
end of page 5.

[1] https://rr-project.org/

Signed-off-by: Kyle Huey <khuey@xxxxxxxxxxxx>

Patch seems sensible enough; I'll go queue it up unless Kan comes back
with anything troublesome.