RE: [RFC] Adding A64FX hardware prefetch sysfs interface

From: tarumizu.kohei@xxxxxxxxxxx
Date: Thu Jun 17 2021 - 21:33:10 EST


Hi, James.

Thank you for your comment.

> While this is initially about sysfs, don't you need the 'HPC tag address override'
> to be enabled for this to be useful? I don't think that feature can be managed by
> a driver:

Enabling 'HPC tag address override' is certainly useful for finer-grained control.
However, as you pointed out, enabling it raises several challenges.
We have also verified that performance can be improved via IMP_PF_STREAM_DETECT_CTRL_EL0 alone, without using 'HPC tag address override'.
Therefore, as a first step, we would like to implement a sysfs interface that controls only IMP_PF_STREAM_DETECT_CTRL_EL0.

At this time, we do not intend to enable 'HPC tag address override', but we are willing to consider it if it turns out to be necessary.

> 'HPC tag address override' changes the top byte of all user-space pointers from
> being ignored (as they have been since day-1 on arm64) to having implications
> for the hardware.
> If I've read the document correctly this affects the prefetch mode and where in
> the L1/L2 such accesses will be allocated.

Your understanding of 'HPC tag address override' is correct.
If it is enabled, prefetch behavior can be tuned to the characteristics of each individual load/store instruction.
On the other hand, even without it, we can still change the system-wide settings 'Prefetch Enablement' (bits [59] and [58]), 'Prefetch Distance' (bits [27:24] and [19:16]), and 'Prefetch Reliableness' (bits [55] and [54]) via IMP_PF_STREAM_DETECT_CTRL_EL0.
This approach does not allow per-instruction tuning, but it does allow per-application tuning.
For now, we assume that each application is bound to one core.
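
To make the bit layout concrete, here is a minimal userspace C sketch of how the fields listed above could be read and modified in a 64-bit register value. It only manipulates a plain integer and does not touch the real register. All macro and function names are hypothetical, and the sketch assumes nothing about which bit of each pair belongs to L1 or L2, or about the polarity of each bit; please refer to the A64FX specification for those details.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as listed above. All names here are hypothetical. */
#define PF_ENABLEMENT_BIT59   (UINT64_C(1) << 59)
#define PF_ENABLEMENT_BIT58   (UINT64_C(1) << 58)
#define PF_RELIABLENESS_BIT55 (UINT64_C(1) << 55)
#define PF_RELIABLENESS_BIT54 (UINT64_C(1) << 54)
#define PF_DIST_HI_SHIFT      24              /* bits [27:24] */
#define PF_DIST_LO_SHIFT      16              /* bits [19:16] */
#define PF_DIST_MASK          UINT64_C(0xF)   /* each distance field is 4 bits wide */

/* Return reg with the field at (mask << shift) replaced by val. */
static inline uint64_t pf_set_field(uint64_t reg, unsigned int shift,
                                    uint64_t mask, uint64_t val)
{
    assert((val & ~mask) == 0);   /* val must fit in the field */
    return (reg & ~(mask << shift)) | (val << shift);
}

/* Extract the field at (mask << shift) from reg. */
static inline uint64_t pf_get_field(uint64_t reg, unsigned int shift,
                                    uint64_t mask)
{
    return (reg >> shift) & mask;
}
```

For example, pf_set_field(reg, PF_DIST_HI_SHIFT, PF_DIST_MASK, 5) changes only bits [27:24] and leaves the other fields as they were.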

> This would impact user-space that is using the top-byte for their own purposes.
> For example hwasan uses this field as a tag it allocates itself:
> https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
> Enabling 'HPC tag address override' for all user-space is going to have weird
> performance effects.
>
> To make this work, I think you'd need a per-process opt-in, and __switch_to()
> would need to toggle your IMP_FJ_TAG_ADDRESS_CTRL_EL1.TBOx bits.
> Because its an implementation-defined feature, but the controls can't be
> confined to a driver, I don't think enabling 'HPC tag address override' is viable.

We understand that these challenges arise if we try to enable 'HPC tag address override'.
However, if we do not enable it, these considerations are probably unnecessary, because settings made via IMP_PF_STREAM_DETECT_CTRL_EL0 are treated as system-wide settings.

> Is the sysfs information useful without it?

We think that, in most cases, it is enough to tune the system-wide settings 'Prefetch Enablement', 'Prefetch Distance', and 'Prefetch Reliableness' via IMP_PF_STREAM_DETECT_CTRL_EL0.
Therefore, we think a sysfs interface that operates only on IMP_PF_STREAM_DETECT_CTRL_EL0 is useful.
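
As a rough illustration of what writing to such a sysfs file could involve, the following userspace C sketch validates a string such as "7\n" as a 4-bit 'Prefetch Distance' value. This is not the proposed driver code: the function name, the decimal input format, and the error codes are assumptions made for this example, and an in-kernel store handler would use kernel helpers instead of strtoul().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define PF_DIST_MAX 15UL   /* largest value that fits in a 4-bit field */

/*
 * Parse a decimal string with an optional trailing newline.
 * Returns 0 and stores the value in *out on success,
 * -EINVAL for malformed input, -ERANGE for values above PF_DIST_MAX.
 * *out is left unchanged on failure.
 */
static int pf_parse_distance(const char *buf, unsigned int *out)
{
    char *end;
    unsigned long v;

    assert(buf != NULL && out != NULL);

    errno = 0;
    v = strtoul(buf, &end, 10);
    if (end == buf || errno != 0)
        return -EINVAL;            /* no digits, or overflow in strtoul */
    if (*end == '\n')
        end++;                     /* tolerate the newline added by echo */
    if (*end != '\0')
        return -EINVAL;            /* trailing garbage */
    if (v > PF_DIST_MAX)
        return -ERANGE;

    *out = (unsigned int)v;
    return 0;
}
```

Rejecting out-of-range values before touching the register value keeps a bad write from spilling into neighbouring fields.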

Best regards.