[RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors

From: Kohei Tarumizu
Date: Thu Nov 04 2021 - 01:29:21 EST


This patch series adds register/unregister functions for a hardware
prefetch driver. The purpose of this driver is to provide an interface
to control the hardware prefetch mechanism depending on the application
characteristics.

In an earlier RFC[1], it was suggested that we create a hardware
prefetch directory under /sys/devices/system/cpu/[CPUNUM]/cache.
Hardware prefetch is a cache-related feature, but it does not depend on
the cache sysfs feature. Therefore, we decided to keep the code separate
and create the directory directly under cpu/[CPUNUM] instead.

[1]https://lore.kernel.org/lkml/OSBPR01MB2037D114B11153F00F233F8780389@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
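To illustrate how an application might use a per-CPU interface like this, here is a minimal userspace sketch in C. It assumes a directory named "hwpf" under cpu[CPUNUM] with one subdirectory per cache level (l1, l2, ...) holding the attribute files, following the l1/enable naming used in the RFC section of this letter. The directory name and the helpers hwpf_attr_path()/hwpf_write_attr() are hypothetical and only for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the path of a per-CPU prefetch attribute.
 * ASSUMPTION: the "hwpf" directory name and "l<level>" subdirectories
 * are illustrative only. Returns 0, or -1 if the buffer is too small.
 */
int hwpf_attr_path(char *buf, size_t len, unsigned int cpu,
                   unsigned int level, const char *attr)
{
    int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/hwpf/l%u/%s",
                     cpu, level, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a decimal value (e.g. 1 or 0 for "enable") to the attribute. */
int hwpf_write_attr(unsigned int cpu, unsigned int level,
                    const char *attr, long val)
{
    char path[128];
    if (hwpf_attr_path(path, sizeof(path), cpu, level, attr) < 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1; /* no such CPU/attribute, or no permission */
    int ok = fprintf(f, "%ld\n", val) > 0;
    /* sysfs reports store errors when the buffer is flushed on close */
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

Under these assumptions, hwpf_write_attr(0, 2, "enable", 0) would disable L2 prefetch on CPU 0, given sufficient privileges.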

Changes since v1:
- Add Intel hardware prefetch support
- Fix typo

This version adds Intel hardware prefetch support based on Proposal A,
which was proposed in the v1 RFC PATCH[2] and is also described in the
[RFC & Future plan] section of this letter.
As this is the first step toward supporting Intel processors, we add
support only for INTEL_FAM6_BROADWELL_X.

[2]https://lore.kernel.org/lkml/20211011043952.995856-1-tarumizu.kohei@xxxxxxxxxxx/
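In the kernel, INTEL_FAM6_BROADWELL_X denotes family 6, model 0x4F. As a sketch of what restricting support to that model means, the C code below decodes family and model from the processor signature (EAX of CPUID leaf 1) using the usual rule from the Intel SDM, then checks for Broadwell-X. The function names are hypothetical; a real driver would also check the vendor and rely on the kernel's own CPU matching helpers instead of this code.

```c
#include <stdbool.h>
#include <stdint.h>

#define BROADWELL_X_MODEL 0x4F /* model number of INTEL_FAM6_BROADWELL_X */

/* Family: base family, plus extended family when the base is 0xF. */
static unsigned int sig_family(uint32_t eax)
{
    unsigned int fam = (eax >> 8) & 0xf;
    if (fam == 0xf)
        fam += (eax >> 20) & 0xff;
    return fam;
}

/* Model: base model, plus extended model << 4 for family 6 and above. */
static unsigned int sig_model(uint32_t eax)
{
    unsigned int model = (eax >> 4) & 0xf;
    if (sig_family(eax) >= 0x6)
        model += ((eax >> 16) & 0xf) << 4;
    return model;
}

/* True if the signature is family 6, model 0x4F (vendor not checked). */
bool is_broadwell_x(uint32_t eax)
{
    return sig_family(eax) == 6 && sig_model(eax) == BROADWELL_X_MODEL;
}
```

For example, the signature 0x406F1 decodes to family 6, model 0x4F.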

The patches are organized as follows:

- patch1: Add hardware prefetch core driver
This adds register/unregister functions that create the sysfs interface
with the attributes "enable", "dist", and "strong". Detailed descriptions
of these are in Documentation/ABI/testing/sysfs-devices-system-cpu.
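As a rough picture of the register/unregister flow, here is a userspace model in C. The names hwpf_driver, hwpf_register_driver(), hwpf_unregister_driver() and hwpf_store(), the single read/write callback pair, and the one-backend-at-a-time rule are assumptions made for this sketch, not the actual contents of include/linux/hwpf.h. A real core driver would create and remove the sysfs files where the comments indicate. A demo backend that keeps values in memory is included so that the flow can be exercised.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical attribute identifiers matching the sysfs files. */
enum hwpf_attr { HWPF_ENABLE, HWPF_DIST, HWPF_STRONG };

/* Hypothetical ops table a platform module (e.g. A64FX or Intel) fills in. */
struct hwpf_driver {
    const char *name;
    int (*read)(unsigned int cpu, unsigned int level, enum hwpf_attr a, long *v);
    int (*write)(unsigned int cpu, unsigned int level, enum hwpf_attr a, long v);
};

static const struct hwpf_driver *hwpf_drv; /* at most one backend */

int hwpf_register_driver(const struct hwpf_driver *drv)
{
    if (!drv || !drv->read || !drv->write)
        return -EINVAL;
    if (hwpf_drv)
        return -EBUSY;
    hwpf_drv = drv;
    /* A real core would create the enable/dist/strong files here. */
    return 0;
}

int hwpf_unregister_driver(const struct hwpf_driver *drv)
{
    if (!drv || hwpf_drv != drv)
        return -EINVAL;
    /* A real core would remove the sysfs files here. */
    hwpf_drv = NULL;
    return 0;
}

/* What a sysfs store handler in the core would do: forward to the backend. */
int hwpf_store(unsigned int cpu, unsigned int level, enum hwpf_attr a, long v)
{
    if (!hwpf_drv)
        return -ENODEV;
    return hwpf_drv->write(cpu, level, a, v);
}

/* Demo backend: ignores cpu/level and stores values in memory. */
static long demo_state[3];

static int demo_read(unsigned int cpu, unsigned int level, enum hwpf_attr a, long *v)
{
    (void)cpu;
    (void)level;
    *v = demo_state[a];
    return 0;
}

static int demo_write(unsigned int cpu, unsigned int level, enum hwpf_attr a, long v)
{
    (void)cpu;
    (void)level;
    if (a == HWPF_ENABLE && v != 0 && v != 1)
        return -EINVAL; /* "enable" only accepts 0 or 1 */
    demo_state[a] = v;
    return 0;
}

const struct hwpf_driver demo_hwpf_driver = {
    .name = "demo",
    .read = demo_read,
    .write = demo_write,
};
```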

- patch2: Add support for A64FX
This adds module init/exit code for A64FX.

- patch3: Add support for Intel

- patch4: Add Kconfig/Makefile to build module

- patch5: Add documentation for the new sysfs interface

We tested this driver and measured its performance with the STREAM
benchmark on our x86 machine. The results are as follows:

| Hardware Prefetch status | Triad (MB/s) |
|--------------------------|--------------|
| Enabled                  | 40300.4600   |
| Disabled                 | 31694.6333   |

Performance is better with hardware prefetch enabled (the Triad result
is about 27% higher), which is the expected result. We also measured
the performance on our A64FX machine and reported the results in the v1
RFC PATCH.

[RFC & Future plan]
We plan to support Intel processors that have MSR 0x1A4 (1A4H)[3].
We would appreciate any comments on how we should handle multiple
hardware prefetch types in the enable attribute file for Intel
processors. The details are described below.

[3]https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html (Volume 4)

The specification of MSR 0x1A4 differs depending on the processor
model. One such specification is as follows, bit by bit:

[0] L2 Hardware Prefetcher Disable (R/W)
[1] L2 Adjacent Cache Line Prefetcher Disable (R/W)
[2] DCU Hardware Prefetcher Disable (R/W)
[3] DCU IP Prefetcher Disable (R/W)
[63:4] Reserved
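Because every defined bit of this layout is a "Disable" bit, a set bit means the prefetcher is off. The C sketch below shows how software could test and update a single prefetcher while leaving the reserved bits [63:4] untouched. The macro and function names are hypothetical, and the layout applies only to models that use this specification of MSR 0x1A4.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the bit layout listed above. */
#define PF_L2_HW_DISABLE       (1ULL << 0) /* L2 Hardware Prefetcher */
#define PF_L2_ADJACENT_DISABLE (1ULL << 1) /* L2 Adjacent Cache Line Prefetcher */
#define PF_DCU_HW_DISABLE      (1ULL << 2) /* DCU Hardware Prefetcher */
#define PF_DCU_IP_DISABLE      (1ULL << 3) /* DCU IP Prefetcher */
#define PF_DEFINED_MASK        0xfULL      /* bits [63:4] are reserved */

/* A set bit disables the prefetcher, so "enabled" is the bit being clear. */
bool pf_enabled(uint64_t msr, uint64_t disable_bit)
{
    return (msr & disable_bit) == 0;
}

/* Return the new MSR value with one prefetcher enabled or disabled. */
uint64_t pf_update(uint64_t msr, uint64_t disable_bit, bool enable)
{
    if (disable_bit & ~PF_DEFINED_MASK)
        return msr; /* never touch reserved bits */
    return enable ? (msr & ~disable_bit) : (msr | disable_bit);
}
```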

If a processor supports two types of hardware prefetchers for each
cache level, as in the specification above, we need to decide how to
handle them.

We would like to map these features to the enable attribute files
(i.e. map l1/enable to bit[2:3] and l2/enable to bit[0:1]), and we are
considering two proposals:

A) The enable file accepts a single bit, and a change affects all
hardware prefetch types at that cache level.

B) The enable file accepts one or more bits, and each bit controls the
corresponding single hardware prefetch type.

For each proposal, the table below shows the resulting bits of MSR
0x1A4 after writing to the enable file, when all bits of MSR 0x1A4 are
initially 0.

| Value to write | bit[0] | bit[1] | bit[2] | bit[3] |
|-------------------------|--------|--------|--------|--------|
| A) write 1 to l1/enable | 0 | 0 | 1 | 1 |
| A) write 1 to l2/enable | 1 | 1 | 0 | 0 |
| B) write 1 to l1/enable | 0 | 0 | 1 | 0 |
| B) write 2 to l1/enable | 0 | 0 | 0 | 1 |
| B) write 3 to l2/enable | 1 | 1 | 0 | 0 |
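The two proposals can be expressed as small mapping functions. The C sketch below reproduces the table above: it returns the MSR 0x1A4 bits that a write to l1/enable or l2/enable produces when all bits start at 0, or -EINVAL for an unsupported level or value. The function names are hypothetical, and, like the table, the sketch models only which bits each write sets.

```c
#include <errno.h>
#include <stdint.h>

/* From the letter: l1/enable -> bit[2:3], l2/enable -> bit[0:1]. */
static int level_shift(unsigned int level)
{
    switch (level) {
    case 1:
        return 2;
    case 2:
        return 0;
    default:
        return -1;
    }
}

/* Proposal A: accept 0 or 1 and apply it to both bits of the level. */
int64_t hwpf_bits_proposal_a(unsigned int level, unsigned int val)
{
    int shift = level_shift(level);
    if (shift < 0 || val > 1)
        return -EINVAL;
    return (int64_t)(val ? 0x3 : 0x0) << shift;
}

/* Proposal B: accept a 2-bit value; each bit selects one prefetcher type. */
int64_t hwpf_bits_proposal_b(unsigned int level, unsigned int val)
{
    int shift = level_shift(level);
    if (shift < 0 || val > 0x3)
        return -EINVAL;
    return (int64_t)val << shift;
}
```

The only difference between the two functions is how the written value is spread over the two bits of a level: A replicates a single bit, while B passes each bit through unchanged.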

Proposal A is simple: it uniformly controls all hardware prefetch types
at a given cache level. This makes it easy to provide the same
interface as on A64FX. However, it does not allow fine-grained tuning
(e.g. writing 1 to only bit[1]).

Proposal B allows the same tuning as direct register access. However,
users need to know the hardware specification (e.g. the number of
prefetch types that can be controlled via the register) to use the
interface.
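For comparison, direct register access from userspace is already possible through the Linux msr driver: /dev/cpu/N/msr is read or written in 8-byte chunks at a file offset equal to the MSR number. This requires the msr module and root privileges, and writes may be further restricted by the kernel. The helpers read_msr()/write_msr() below are hypothetical; the sketch only illustrates the register-level knowledge that Proposal B would expose to users.

```c
#define _XOPEN_SOURCE 700 /* for pread/pwrite */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Open /dev/cpu/<cpu>/msr with the given flags; returns an fd or -1. */
static int msr_open(int cpu, int flags)
{
    char path[64];
    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
    return open(path, flags);
}

/* Read one 64-bit MSR; the MSR number is the file offset. Returns 0 or -1. */
int read_msr(int cpu, uint32_t reg, uint64_t *val)
{
    int fd = msr_open(cpu, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, val, sizeof(*val), (off_t)reg);
    close(fd);
    return n == (ssize_t)sizeof(*val) ? 0 : -1;
}

/* Write one 64-bit MSR. Returns 0 or -1. */
int write_msr(int cpu, uint32_t reg, uint64_t val)
{
    int fd = msr_open(cpu, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pwrite(fd, &val, sizeof(val), (off_t)reg);
    close(fd);
    return n == (ssize_t)sizeof(val) ? 0 : -1;
}
```

For example, calling read_msr(0, 0x1A4, &val) and inspecting bits [3:0] would show which prefetchers are disabled on CPU 0, on a model that uses the layout above.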

We think Proposal A is better for providing a standard interface, but
we are concerned that it cannot expose all the features of the
register. Do you have any comments on these proposals?

Best regards,
Kohei Tarumizu

Kohei Tarumizu (5):
driver: hwpf: Add hardware prefetch core driver register/unregister
functions
driver: hwpf: Add support for A64FX to hardware prefetch driver
driver: hwpf: Add support for Intel to hardware prefetch driver
driver: hwpf: Add Kconfig/Makefile to build hardware prefetch driver
docs: ABI: Add sysfs documentation interface of hardware prefetch
driver

.../ABI/testing/sysfs-devices-system-cpu | 58 +++
MAINTAINERS | 7 +
arch/arm64/Kconfig.platforms | 6 +
arch/x86/Kconfig | 12 +
drivers/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/hwpf/Kconfig | 24 +
drivers/hwpf/Makefile | 9 +
drivers/hwpf/fujitsu_hwpf.c | 460 ++++++++++++++++++
drivers/hwpf/hwpf.c | 452 +++++++++++++++++
drivers/hwpf/intel_hwpf.c | 219 +++++++++
include/linux/hwpf.h | 38 ++
12 files changed, 1288 insertions(+)
create mode 100644 drivers/hwpf/Kconfig
create mode 100644 drivers/hwpf/Makefile
create mode 100644 drivers/hwpf/fujitsu_hwpf.c
create mode 100644 drivers/hwpf/hwpf.c
create mode 100644 drivers/hwpf/intel_hwpf.c
create mode 100644 include/linux/hwpf.h

--
2.27.0