Re: [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later.

From: Liang, Kan
Date: Tue Jan 25 2022 - 09:06:16 EST




On 1/24/2022 9:59 PM, Kyle Huey wrote:
On Mon, Jan 24, 2022 at 8:01 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:



On 1/24/2022 7:21 AM, Peter Zijlstra wrote:
On Fri, Jan 21, 2022 at 11:26:44PM -0800, Kyle Huey wrote:
Beginning in Comet Lake, Intel extended the concept of privilege rings to
SMM.[0] A side effect of this is that events caused by execution of code
in SMM are now visible to performance counters with IA32_PERFEVTSELx.USR
set.
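
For illustration only (not part of the patch): a minimal Python sketch of
how the USR and OS bits sit in an IA32_PERFEVTSELx value. The bit positions
follow the architectural perfmon layout in the Intel SDM. The helper names
are hypothetical, and event 0xC4 with umask 0x00 (branch instructions
retired) is just an example encoding.

```python
# Sketch only: helper names are hypothetical, not kernel or rr code.
EVTSEL_USR = 1 << 16  # count while the CPU runs at CPL > 0
EVTSEL_OS = 1 << 17   # count while the CPU runs at CPL == 0
EVTSEL_EN = 1 << 22   # enable the counter


def make_evtsel(event, umask, user=True, kernel=False):
    """Build a raw IA32_PERFEVTSELx value from its main fields."""
    value = (event & 0xFF) | ((umask & 0xFF) << 8) | EVTSEL_EN
    if user:
        value |= EVTSEL_USR
    if kernel:
        value |= EVTSEL_OS
    return value


def counts_user_only(evtsel):
    """True if the selector has USR set and OS clear."""
    return bool(evtsel & EVTSEL_USR) and not (evtsel & EVTSEL_OS)


if __name__ == "__main__":
    sel = make_evtsel(0xC4, 0x00)  # example: branch instructions retired
    print(hex(sel), "user-only:", counts_user_only(sel))
```

A selector built this way is the kind that, per the description above, now
also counts events from SMM code on the affected parts.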

rr[1] depends on exact counts of performance events for the user space
tracee, so this change in behavior is fatal for us. It is, however, easily
corrected by setting IA32_DEBUGCTL.FREEZE_WHILE_SMM to 1 (visible in sysfs
as /sys/devices/cpu/freeze_on_smi). While we can and will tell our users to
set freeze_on_smi manually when appropriate, because observing events in
SMM is rarely useful to anyone, we propose to change the default value of
this switch.
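
For illustration only (not part of the patch): a minimal Python sketch of
how a tool could check the current setting before relying on exact counts.
The sysfs path is the one named above. The function names are hypothetical,
and the sketch assumes the file holds a single "0" or "1".

```python
# Sketch only: function names are hypothetical.
FREEZE_ON_SMI_PATH = "/sys/devices/cpu/freeze_on_smi"


def parse_flag(text):
    """Interpret the sysfs content: "1" means counters freeze in SMM."""
    return text.strip() == "1"


def read_freeze_on_smi(path=FREEZE_ON_SMI_PATH):
    """Return True/False for the current setting, or None if unreadable."""
    try:
        with open(path) as f:
            return parse_flag(f.read())
    except OSError:
        # No such attribute (e.g. non-Intel PMU) or no permission to read.
        return None


if __name__ == "__main__":
    state = read_freeze_on_smi()
    if state is None:
        print("freeze_on_smi not available on this system")
    elif state:
        print("counters are frozen while in SMM")
    else:
        print("counters keep running in SMM; exact user counts may be off")
```

Changing the value means writing 0 or 1 to the same file, which normally
requires root.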

+ Andi

From what we have heard many times from sophisticated customers, they
really hate blind spots. They want to see everything. That's why we set
freeze_on_smi to 0 by default. I think the patch breaks that principle.

The default kernel settings for perf events prioritize preventing
information leaks to less privileged code. perf_event_paranoid
defaults to 2, preventing unprivileged users from observing kernel
space. If "sophisticated customers" want to see everything they have
already needed privileges (or an explicit opt-in through decreasing
perf_event_paranoid) for some time.
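
For illustration only: a minimal Python sketch of the check described
above. It assumes the documented sysctl semantics, where a
perf_event_paranoid value of 2 or higher disallows kernel profiling for
users without the needed capability. The function names are hypothetical.

```python
# Sketch only: function names are hypothetical.
PARANOID_PATH = "/proc/sys/kernel/perf_event_paranoid"


def unprivileged_can_profile_kernel(level):
    """Kernel profiling by unprivileged users is refused at level >= 2."""
    return level < 2


def read_paranoid_level(path=PARANOID_PATH):
    """Return the sysctl value as an int, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_paranoid_level()
    if level is None:
        print("perf_event_paranoid not readable on this system")
    else:
        allowed = unprivileged_can_profile_kernel(level)
        print("perf_event_paranoid =", level,
              "; unprivileged kernel profiling allowed:", allowed)
```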

The current situation on Comet Lake+ where an unprivileged user
*cannot* observe kernel code due to security concerns but
simultaneously *must* observe SMM code seems rather absurd.


I see. I had thought that an unprivileged user could already observe SMM code on previous platforms, and that the CML+ change only made part of the SMM code CPL0. It seems I was wrong. The change looks like it turns what was previously CPL0 code into CPL3 code. If so, yes, I think we should prevent the information leak to unprivileged users.

I don't think there is a way to notify all the users that the default
kernel value will be changed. (Yes, the end user can always check
/sys/devices/cpu/freeze_on_smi to get the latest value. But in practice,
no one checks it unless some errors are found.) I think it may cause
trouble for users who rely on the counts in SMM.

Unfortunately the new hardware has already changed the behavior
without notifying users, no matter what we do here.

The patch only changes the default value for some platforms, not all
platforms. The default value is no longer consistent among platforms,
which can cause confusion.

I don't personally object to changing freeze_on_smi for all platforms
:) I was merely trying to limit the changes.


Changing it on all platforms seems too big a hammer. I agree we should limit it to the impacted platforms.

I've contacted the author of the white paper. I was told that the change is for the client vPro platforms. They are not sure whether it impacts server or Atom platforms. I'm still working on it. I will let you and Peter know once I get more information.

Thanks,
Kan