Re: [RFC PATCH] x86: Filter MSR writes from luserspace

From: Andi Kleen
Date: Tue Nov 26 2019 - 15:15:17 EST


Borislav Petkov <bp@xxxxxxx> writes:
> +int msr_filter_write(u32 reg)
> +{
> + /* all perf counter MSRs */
> + switch (reg) {
> + case MSR_K7_EVNTSEL0:
> + case MSR_K7_PERFCTR0:
> + case MSR_F15H_PERF_CTL:
> + case MSR_F15H_PERF_CTR:
> + case MSR_AMD64_IBSBRTARGET:
> + case MSR_AMD64_IBSCTL:
> + case MSR_AMD64_IBSFETCHCTL:
> + case MSR_AMD64_IBSOPCTL:
> + case MSR_AMD64_IBSOPDATA4:
> + case MSR_AMD64_IBSOP_REG_COUNT:
> + case MSR_AMD64_IBSOP_REG_MASK:
> + case MSR_AMD64_IBS_REG_COUNT_MAX:
> + case MSR_ARCH_PERFMON_EVENTSEL0:
> + case MSR_ARCH_PERFMON_FIXED_CTR0:
> + case MSR_ARCH_PERFMON_FIXED_CTR_CTRL:
> + case MSR_ARCH_PERFMON_PERFCTR0:
> + case MSR_CORE_C1_RES:
> + case MSR_CORE_C3_RESIDENCY:
> + case MSR_CORE_C6_RESIDENCY:
> + case MSR_CORE_C7_RESIDENCY:
> + case MSR_CORE_PERF_GLOBAL_CTRL:
> + case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> + case MSR_CORE_PERF_GLOBAL_STATUS:
> + case MSR_DRAM_ENERGY_STATUS:
> + case MSR_F15H_CU_MAX_PWR_ACCUMULATOR:
> + case MSR_F15H_CU_PWR_ACCUMULATOR:
> + case MSR_F15H_NB_PERF_CTL:
> + case MSR_F15H_PTSC:
> + case MSR_F16H_L2I_PERF_CTL:
> + case MSR_F17H_IRPERF:
> + case MSR_IA32_APERF:
> + case MSR_IA32_DEBUGCTLMSR:
> + case MSR_IA32_DS_AREA:
> + case MSR_IA32_MISC_ENABLE:
> + case MSR_IA32_MPERF:
> + case MSR_IA32_PEBS_ENABLE:
> + case MSR_IA32_PERF_CAPABILITIES:
> + case MSR_IA32_PMC0:
> + case MSR_IA32_RTIT_ADDR0_A:
> + case MSR_IA32_RTIT_ADDR0_B:
> + case MSR_IA32_RTIT_ADDR1_A:
> + case MSR_IA32_RTIT_ADDR1_B:
> + case MSR_IA32_RTIT_ADDR2_A:
> + case MSR_IA32_RTIT_ADDR2_B:
> + case MSR_IA32_RTIT_ADDR3_A:
> + case MSR_IA32_RTIT_ADDR3_B:
> + case MSR_IA32_RTIT_CTL:
> + case MSR_IA32_RTIT_OUTPUT_BASE:
> + case MSR_IA32_RTIT_OUTPUT_MASK:
> + case MSR_IA32_RTIT_STATUS:
> + case MSR_IA32_THERM_STATUS:
> + case MSR_IA32_VMX_MISC:
> + case MSR_KNC_EVNTSEL0:
> + case MSR_KNC_IA32_PERF_GLOBAL_CTRL:
> + case MSR_KNC_IA32_PERF_GLOBAL_OVF_CONTROL:
> + case MSR_KNC_IA32_PERF_GLOBAL_STATUS:
> + case MSR_KNC_PERFCTR0:
> + case MSR_KNL_CORE_C6_RESIDENCY:
> + case MSR_LBR_CORE_FROM:
> + case MSR_LBR_CORE_TO:
> + case MSR_LBR_INFO_0:
> + case MSR_LBR_NHM_FROM:
> + case MSR_LBR_NHM_TO:
> + case MSR_LBR_SELECT:
> + case MSR_LBR_TOS:
> + case MSR_OFFCORE_RSP_0:
> + case MSR_OFFCORE_RSP_1:
> + case MSR_P4_ALF_ESCR0:
> + case MSR_P4_ALF_ESCR1:
> + case MSR_P4_BPU_CCCR0:
> + case MSR_P4_BPU_ESCR0:
> + case MSR_P4_BPU_ESCR1:
> + case MSR_P4_BPU_PERFCTR0:
> + case MSR_P4_BSU_ESCR0:
> + case MSR_P4_BSU_ESCR1:
> + case MSR_P4_CRU_ESCR0:
> + case MSR_P4_CRU_ESCR1:
> + case MSR_P4_CRU_ESCR2:
> + case MSR_P4_CRU_ESCR3:
> + case MSR_P4_CRU_ESCR4:
> + case MSR_P4_CRU_ESCR5:
> + case MSR_P4_DAC_ESCR0:
> + case MSR_P4_DAC_ESCR1:
> + case MSR_P4_FIRM_ESCR0:
> + case MSR_P4_FIRM_ESCR1:
> + case MSR_P4_FLAME_ESCR0:
> + case MSR_P4_FLAME_ESCR1:
> + case MSR_P4_FSB_ESCR0:
> + case MSR_P4_FSB_ESCR1:
> + case MSR_P4_IQ_ESCR0:
> + case MSR_P4_IQ_ESCR1:
> + case MSR_P4_IS_ESCR0:
> + case MSR_P4_IS_ESCR1:
> + case MSR_P4_ITLB_ESCR0:
> + case MSR_P4_ITLB_ESCR1:
> + case MSR_P4_IX_ESCR0:
> + case MSR_P4_IX_ESCR1:
> + case MSR_P4_MOB_ESCR0:
> + case MSR_P4_MOB_ESCR1:
> + case MSR_P4_MS_ESCR0:
> + case MSR_P4_MS_ESCR1:
> + case MSR_P4_PEBS_MATRIX_VERT:
> + case MSR_P4_PMH_ESCR0:
> + case MSR_P4_PMH_ESCR1:
> + case MSR_P4_RAT_ESCR0:
> + case MSR_P4_RAT_ESCR1:
> + case MSR_P4_SAAT_ESCR0:
> + case MSR_P4_SAAT_ESCR1:
> + case MSR_P4_SSU_ESCR0:
> + case MSR_P4_SSU_ESCR1:
> + case MSR_P4_TBPU_ESCR0:
> + case MSR_P4_TBPU_ESCR1:
> + case MSR_P4_TC_ESCR0:
> + case MSR_P4_TC_ESCR1:
> + case MSR_P4_U2L_ESCR0:
> + case MSR_P4_U2L_ESCR1:
> + case MSR_PEBS_FRONTEND:
> + case MSR_PEBS_LD_LAT_THRESHOLD:
> + case MSR_PKG_C10_RESIDENCY:
> + case MSR_PKG_C2_RESIDENCY:
> + case MSR_PKG_C3_RESIDENCY:
> + case MSR_PKG_C6_RESIDENCY:
> + case MSR_PKG_C7_RESIDENCY:
> + case MSR_PKG_C8_RESIDENCY:
> + case MSR_PKG_C9_RESIDENCY:
> + case MSR_PKG_ENERGY_STATUS:
> + case MSR_PLATFORM_ENERGY_STATUS:
> + case MSR_PLATFORM_INFO:
> + case MSR_PP0_ENERGY_STATUS:
> + case MSR_PP1_ENERGY_STATUS:
> + case MSR_PPERF:
> + case MSR_RAPL_POWER_UNIT:
> + case MSR_SMI_COUNT:
> + return -EPERM;


This will break a bazillion of tools that rely on programing many of
those MSRs from user space. Just do a search on github for all the MSR
names you're blocking. PMU use from ring 3 is a fairly common use case.

just from the top of my head:

https://github.com/opcm/pcm
https://github.com/RRZE-HPC/likwid
... likely many more ...

There were patches in the past to allow a user configurable list.
Something like that can make sense for some use models.

But a hard coded list is just a terrible bad idea.

For a "locked down" kernel the only case that can make sense
is to block write to MSRs that control memory writes. But that's only
a small fraction of the MSRs (afaik only DS_BASE and RTIT_OUTPUT*
from your list, but there are more I believe)

Also if there are blocks they would need to be model specific,
e.g. the P4_* MSRs often mean something different on other parts.

-Andi