Re: [PATCH] x86/perf: Default freeze_on_smi on for Comet Lake and later.

From: Andrew Cooper
Date: Wed Jan 26 2022 - 21:29:42 EST


On 22/01/2022 07:26, Kyle Huey wrote:
> Beginning in Comet Lake, Intel extended the concept of privilege rings to
> SMM.[0]

SMM has always had full access to all 4 rings of protection.

Blame anyone who uses the terms "Ring -1/-2", because they are horribly
misleading terms and show a fundamental misunderstanding of how this works.

On entry to SMM, the processor is in Real Mode, which is CPL0.  Pretty
much every handler switches to Protected Mode as soon as possible
(because programming for Unreal Mode is evil).  UEFI systems with 64-bit
firmware will set up pagetables and switch into Long Mode.

But from a code organisation point of view, SMM has traditionally been a
RWX free-for-all with more-than-kernel privileges and all the input
handling gotchas/etc.  It's no wonder that SMM is a fertile source of
security issues.

> A side effect of this is that events caused by execution of code
> in SMM are now visible to performance counters with IA32_PERFEVTSELx.USR
> set.

What is new in Comet Lake is a kernel running at CPL0 in SMM which sets
up user mode (CPL3) to run the main logic.

However, if e.g. an enterprising Coreboot developer were to decide that
this CPL3 SMM plan might be a good idea, older CPUs would start
manifesting the same behaviour.

> rr[1] depends on exact counts of performance events for the user space
> tracee, so this change in behavior is fatal for us. It is, however, easily
> corrected by setting IA32_DEBUGCTL.FREEZE_WHILE_SMM to 1 (visible in sysfs
> as /sys/devices/cpu/freeze_on_smi). While we can and will tell our users to
> set freeze_on_smi manually when appropriate, because observing events in
> SMM is rarely useful to anyone, we propose to change the default value of
> this switch.

Frankly, it is an error that FREEZE_WHILE_SMM is under the kernel's
control, and not SMM's control.  After all, it's SMM handling all the
UEFI secrets/etc.
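To make the "kernel's control" point concrete: FREEZE_WHILE_SMM is bit 14
of the IA32_DEBUGCTL MSR (index 0x1D9), and MSRs can only be written at
CPL0.  A rough sketch for inspecting the bit from userspace, assuming the
Linux msr driver (/dev/cpu/N/msr) is loaded and you are root; the helper
names are made up for illustration:

```python
import os

# IA32_DEBUGCTL MSR index and the FREEZE_WHILE_SMM bit within it
# (per the Intel SDM; Linux calls the bit DEBUGCTLMSR_FREEZE_IN_SMM_BIT).
IA32_DEBUGCTL = 0x1D9
FREEZE_WHILE_SMM_BIT = 14
FREEZE_WHILE_SMM = 1 << FREEZE_WHILE_SMM_BIT


def freeze_while_smm_enabled(debugctl: int) -> bool:
    """Return True if FREEZE_WHILE_SMM is set in a DEBUGCTL value."""
    return bool(debugctl & FREEZE_WHILE_SMM)


def set_freeze_while_smm(debugctl: int, enabled: bool) -> int:
    """Return a DEBUGCTL value with FREEZE_WHILE_SMM set or cleared,
    leaving every other bit untouched."""
    if enabled:
        return debugctl | FREEZE_WHILE_SMM
    return debugctl & ~FREEZE_WHILE_SMM


def read_msr(cpu_msr_path: str, msr: int) -> int:
    """Read a 64-bit MSR via the msr driver, which exposes each MSR as
    8 little-endian bytes at file offset == MSR index."""
    fd = os.open(cpu_msr_path, os.O_RDONLY)
    try:
        data = os.pread(fd, 8, msr)
    finally:
        os.close(fd)
    return int.from_bytes(data, "little")
```

For example, freeze_while_smm_enabled(read_msr("/dev/cpu/0/msr",
IA32_DEBUGCTL)) checks CPU 0.  DEBUGCTL is per logical CPU, so each
/dev/cpu/N/msr would need checking separately.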

Linux ought to set FREEZE_WHILE_SMM unilaterally, because most kernel
profiling probably won't want interference from SMM.  Root can always
disable FREEZE_WHILE_SMM if profiling is really wanted.
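Opting back out would then be a one-line write to the sysfs knob quoted
above.  A minimal sketch, assuming that path exists on the running kernel
and CPU (it may be absent, or sit under a different PMU directory on some
systems); the helper names are hypothetical:

```python
from pathlib import Path

# sysfs knob quoted earlier in the thread; may not exist on every system.
FREEZE_ON_SMI = Path("/sys/devices/cpu/freeze_on_smi")


def get_freeze_on_smi(path: Path = FREEZE_ON_SMI) -> bool:
    """Return True if counters are frozen while the CPU is in SMM."""
    return path.read_text().strip() == "1"


def set_freeze_on_smi(enabled: bool, path: Path = FREEZE_ON_SMI) -> None:
    """Write 1 (freeze in SMM) or 0 (count SMM events).
    Writing the real sysfs file requires root."""
    path.write_text("1" if enabled else "0")
```

A profiler that really does want to see SMM would call
set_freeze_on_smi(False) before starting, and restore the old value after.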

I'm not sure if anything can be done on pre-FREEZE_WHILE_SMM CPUs, nor on
AMD CPUs, which are also gaining CPL3 SMM logic and don't appear to have
any equivalent functionality.

~Andrew