Re: [RFC 00/10] Introduce In Field Scan driver
From: Andy Lutomirski
Date: Wed Mar 02 2022 - 09:00:32 EST
On Tue, Mar 1, 2022, at 11:54 AM, Jithu Joseph wrote:
> Note to Maintainers:
> Requesting x86 Maintainers to take a look at patch01 as it
> touches arch/x86 portion of the kernel. Also would like to guide them
> to patch07 which sets up hotplug notifiers and creates kthreads.
>
> Patch 2/10 - Adds Documentation. Requesting Documentation maintainer to
> review it.
>
> Requesting Greg KH to review the sysfs changes added by patch08.
>
> Patch10 adds tracing support, requesting Steven Rostedt to review that.
>
> The rest of the patches add the IFS platform driver; requesting the
> platform driver maintainers to review them.
>
>
> In Field Scan (IFS) is a hardware feature to run circuit level tests on
> a CPU core to detect problems that are not caught by parity or ECC checks.
>
> Intel will provide a firmware file containing the scan tests. Similar to
> microcode there is a separate file for each family-model-stepping. The
> tests in the file are divided into some number of "chunks" that can be
> run individually.
>
> The driver loads the tests into memory reserved by BIOS local to each CPU
> socket in a two step process, using writes to MSRs to first load the
> SHA hashes for the tests and then the tests themselves. Status MSRs provide
> feedback on the success/failure of these steps.
>
> Tests are run by synchronizing execution of all threads on a core and
> then writing to the ACTIVATE_SCAN MSR on all threads. Instruction
> execution continues when:
>
> 1) all tests have completed
> 2) execution was interrupted
> 3) a test detected a problem
>
> In all cases reading the SCAN_STATUS MSR provides details on what
> happened. Interrupted tests may be restarted.
>
> The IFS driver provides interfaces from /sys to reload tests and to
> control execution:
>
> /sys/devices/system/cpu/ifs/reload
> Writing "1" to this file will reload the tests from
> /lib/firmware/intel/ifs/{ff-mm-ss}.scan
IMO this interface is wrong. /lib/firmware is for firmware (or ucode, etc.) files that should be provided by a distribution and loaded, as needed, by a driver so the hardware can function. This is not at all what IFS does. For IFS, an administrator wants to run a specific test, and the test blob is part of the instruction to run the test. The distribution should not be involved, and this should work even on systems where /lib/firmware is immutable.
So either the blob should be written to a file in sysfs or it should be supplied by write or ioctl to a device node.
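To make the second option concrete, here is a rough userspace sketch. It assumes a hypothetical node that accepts the blob in a single write; the function name and whatever path is passed in are invented for illustration, and nothing in this series defines such a node.

```python
import os


def submit_test_blob(node_path: str, blob: bytes) -> int:
    """Hand a scan-test blob to the driver with one write().

    node_path is hypothetical: it stands in for a sysfs binary
    attribute or a character device that the driver would expose.
    """
    fd = os.open(node_path, os.O_WRONLY)
    try:
        written = os.write(fd, blob)
    finally:
        os.close(fd)
    # A partial blob is useless to the driver, so treat it as an error.
    if written != len(blob):
        raise OSError(f"short write: {written} of {len(blob)} bytes")
    return written
```

With something like this the administrator picks the blob explicitly, and nothing has to be installed under /lib/firmware.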
>
> /sys/devices/system/cpu/ifs/run_test
> Writing "1" to this file will trigger a scan on each core
> sequentially by logical CPU number (when HT is enabled this only
> runs the tests once for each core)
>
> /sys/devices/system/cpu/cpu#/ifs/run_test
> Writing "1" to one of these files will trigger a scan on just
> that core.
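For reference, the "once for each core, by logical CPU number" ordering described above could be sketched as follows. This is only an illustration of the selection rule: the helper name is made up, and it assumes the sibling information is available as a map (for example parsed from /sys/devices/system/cpu/cpu#/topology/thread_siblings_list), which is not necessarily how the driver does it.

```python
from typing import Dict, List, Set


def first_thread_per_core(siblings: Dict[int, Set[int]]) -> List[int]:
    """Return one logical CPU per core, in ascending logical CPU order.

    siblings maps each logical CPU to the set of logical CPUs that
    share its core (itself included).
    """
    seen: Set[int] = set()
    picked: List[int] = []
    for cpu in sorted(siblings):
        if cpu in seen:
            continue  # an HT sibling of this CPU was already picked
        picked.append(cpu)
        seen |= siblings[cpu]
    return picked
```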
>
> Results of the tests are also provided in /sys:
>
> /sys/devices/system/cpu/ifs/status
> Global status. Will show the most serious status across
> all cores (fail > untested > pass)
>
> /sys/devices/system/cpu/ifs/cpu_fail_list
> /sys/devices/system/cpu/ifs/cpu_pass_list
> /sys/devices/system/cpu/ifs/cpu_untested_list
> CPU lists showing which CPUs have which test status
>
> /sys/devices/system/cpu/cpu#/ifs/status
> Status (pass/fail/untested) of each core
>
> /sys/devices/system/cpu/cpu#/ifs/details
> Hex value of the SCAN_STATUS MSR for the most recent test on
> this core. Note that the error_code field may contain a driver-defined
> software code that is not defined in the Intel SDM.
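The global status and the three CPU lists above are all derived from the per-core status. A minimal sketch of that reduction, assuming the per-core results are available as a plain map (the names here are illustrative, not taken from the driver):

```python
from typing import Dict, List, Tuple

# Ordering taken from the description above: fail > untested > pass.
SEVERITY = {"pass": 0, "untested": 1, "fail": 2}


def summarize(per_cpu: Dict[int, str]) -> Tuple[str, Dict[str, List[int]]]:
    """Return (global status, {status: sorted CPU list})."""
    # Reporting "untested" when there are no results at all is an
    # assumption of this sketch, not something the cover letter states.
    worst = max(per_cpu.values(), key=SEVERITY.__getitem__, default="untested")
    lists = {
        name: sorted(cpu for cpu, status in per_cpu.items() if status == name)
        for name in SEVERITY
    }
    return worst, lists
```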
>
> Current driver limitations:
>
> 1) The ACTIVATE_SCAN MSR allows for running any consecutive subrange of
> available tests. But the driver always tries to run all tests and only
> 
> uses the subrange feature to restart an interrupted test.
>
> 2) Hardware allows for some number of cores to be tested in parallel.
> The driver does not make use of this; it only tests one core at a time.
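As I read limitation 1, the driver asks for the full range and only narrows it when a run is interrupted. A sketch of that control flow, under loudly stated assumptions: activate() is a hypothetical stand-in for writing ACTIVATE_SCAN and decoding SCAN_STATUS, the state names are invented, and the retry limit is my own addition.

```python
from typing import Callable, Tuple


def run_all_chunks(
    activate: Callable[[int, int], Tuple[int, str]],
    num_chunks: int,
    max_stalls: int = 5,
) -> str:
    """Run chunks 0..num_chunks-1, resuming after interruptions.

    activate(start, stop) is hypothetical: it returns (next_chunk, state)
    where state is "done", "interrupted" or "fail", and next_chunk is the
    first chunk that has not completed yet.
    """
    if num_chunks <= 0:
        return "untested"
    start, stalls = 0, 0
    while start < num_chunks:
        next_chunk, state = activate(start, num_chunks - 1)
        if state == "fail":
            return "fail"
        if state == "done":
            break
        # Interrupted: resume from the first chunk that did not finish.
        if next_chunk == start:
            stalls += 1  # no forward progress on this attempt
            if stalls > max_stalls:
                return "untested"
        else:
            stalls = 0
        start = next_chunk
    return "pass"
```

The return values reuse the pass/fail/untested names from the sysfs status files; how the real driver maps a test that never completes is not something the cover letter spells out.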
>
>
> Jithu Joseph (8):
> x86/microcode/intel: expose collect_cpu_info_early() for IFS
> platform/x86/intel/ifs: Add driver for In-Field Scan
> platform/x86/intel/ifs: Load IFS Image
> platform/x86/intel/ifs: Check IFS Image sanity
> platform/x86/intel/ifs: Authenticate and copy to secured memory
> platform/x86/intel/ifs: Create kthreads for online cpus for scan test
> platform/x86/intel/ifs: Add IFS sysfs interface
> platform/x86/intel/ifs: add ABI documentation for IFS
>
> Tony Luck (2):
> Documentation: In-Field Scan
> trace: platform/x86/intel/ifs: Add trace point to track Intel IFS
> operations
>
> Documentation/ABI/stable/sysfs-driver-ifs | 85 +++++
> Documentation/x86/ifs.rst | 108 ++++++
> Documentation/x86/index.rst | 1 +
> MAINTAINERS | 7 +
> arch/x86/include/asm/microcode_intel.h | 6 +
> arch/x86/kernel/cpu/microcode/intel.c | 8 +-
> drivers/platform/x86/intel/Kconfig | 1 +
> drivers/platform/x86/intel/Makefile | 1 +
> drivers/platform/x86/intel/ifs/Kconfig | 9 +
> drivers/platform/x86/intel/ifs/Makefile | 7 +
> drivers/platform/x86/intel/ifs/core.c | 387 +++++++++++++++++++++
> drivers/platform/x86/intel/ifs/ifs.h | 155 +++++++++
> drivers/platform/x86/intel/ifs/load.c | 299 ++++++++++++++++
> drivers/platform/x86/intel/ifs/sysfs.c | 394 ++++++++++++++++++++++
> include/trace/events/ifs.h | 38 +++
> 15 files changed, 1503 insertions(+), 3 deletions(-)
> create mode 100644 Documentation/ABI/stable/sysfs-driver-ifs
> create mode 100644 Documentation/x86/ifs.rst
> create mode 100644 drivers/platform/x86/intel/ifs/Kconfig
> create mode 100644 drivers/platform/x86/intel/ifs/Makefile
> create mode 100644 drivers/platform/x86/intel/ifs/core.c
> create mode 100644 drivers/platform/x86/intel/ifs/ifs.h
> create mode 100644 drivers/platform/x86/intel/ifs/load.c
> create mode 100644 drivers/platform/x86/intel/ifs/sysfs.c
> create mode 100644 include/trace/events/ifs.h
>
> --
> 2.17.1