RE: About add an A64FX cache control function into resctrl

From: tan.shaopeng@xxxxxxxxxxx
Date: Wed Apr 28 2021 - 04:16:57 EST


Hi Reinette,

> On 4/21/2021 1:37 AM, tan.shaopeng@xxxxxxxxxxx wrote:
> > Hi,
> >
> > Ping... any comments&advice about add an A64FX cache control function
> into resctrl?
>
> My apologies for the delay.
>
> >
> > Best regards
> > Tan Shaopeng
> >
> >> Hello
> >>
> >>
> >> I'm Tan Shaopeng from Fujitsu Limited.
> >>
> >> I’m trying to implement Fujitsu A64FX’s cache related features.
> >> It is a cache partitioning function we called sector cache function
> >> that using the value of the tag that is upper 8 bits of the 64bit
> >> address and the value of the sector cache register to control virtual cache
> capacity of the L1D&L2 cache.
> >>
> >> A few days ago, when I sent a driver that realizes this function to
> >> ARM64 kernel community, Will Deacon and Arnd Bergmann suggested an
> >> idea to add the sector cache function of A64FX into resctrl.
> >>
> https://lore.kernel.org/linux-arm-kernel/CAK8P3a2pFcNTw9NpRtQfYr7A5Oc
> >> Z=As2kM0D_sbfFcGQ_J2Q+Q@xxxxxxxxxxxxxx/
> >>
> >> Based on my study, I think the sector cache function of A64FX can be
> >> added into the allocation features of resctrl after James' resctrl rework has
> finished.
> >> But, in order to implement this function, more interfaces for resctrl are
> need.
> >> The details are as follow, and could you give me some advice?
> >>
> >> [Sector cache function]
> >> The sector cache function split cache into multiple sectors and
> >> control them separately. It is implemented on the L1D cache and
> >> L2 cache in the A64FX processor and can be controlled individually
> >> for L1D cache and L2 cache. A64FX has no L3 cache. Each L1D cache and
> >> L2 cache has 4 sectors. Which L1D sector is used is specified by the
> >> value of [57:56] bits of address, how many ways of sector are
> >> specified by the value of register (IMP_SCCR_L1_EL0).
> >> Which L2 sector is used is specified by the value of [56] bits of
> >> address, and how many ways of sector are specified by value of
> >> register (IMP_SCCR_ASSIGN_EL1, IMP_SCCR_SET0_L2_EL1,
> >> IMP_SCCR_SET1_L2_EL1).
> >>
> >> For more details of sector cache function, see A64FX HPC extension
> >> specification (1.2. Sector cache) in https://github.com/fujitsu/A64FX
>
> The overview in section 12 was informative but very high level.

I'm considering how to answer your questions from your email which
I received before, when I check the email again, I am sorry that
the information I provided before are insufficient.

To understand the sector cache function of A64FX, could you please see
A64FX_Microarchitecture_Manual - section 12. Sector Cache
https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.4.pdf
and,
A64FX_Specification_HPC_Extension ? section 1.2. Sector Cache
https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Specification_HPC_Extension_v1_EN.pdf

In addition, Japan will be on a long holiday about one week from
April 29th, I will answer your other questions after the holidays.

> I was not able to find any instance of "IMP_SCCR" in this document to explore
> how this cache allocation works.
>
> Are these cache sectors exposed to the OS in any way? For example, when the
> OS discovers the cache, does it learn about these sectors and expose the
> details to user space (/sys/devices/system/cpuX/cache)?
>
> The overview of Sector Cache in that document provides details of how the size
> of the sector itself is dynamically adjusted to usage. That description is quite
> cryptic but it seems like a sector, since the number of ways associated with it
> can dynamically change, is more equivalent to a class of service or resource
> group in the resctrl environment.
>
> I really may be interpreting things wrong here, could you perhaps point me to
> where I can obtain more details?
>
>
> >> [Difference between resctrl(CAT) and this sector cache function]
> >> L2/L3 CAT (Cache Allocation Technology) enables the user to specify
> >> some physical partition of cache space that an application can fill.
> >> A64FX's L1D/L2 cache has 4 sectors and 16ways. This sector function
> >> enables a user to specify number of ways each sector uses.
> >> Therefore, for CAT it is enough to specify a cache portion for each
> >> cache_id (socket). On the other hand, sector cache needs to specify
> >> cache portion of each sector for each cache_id, and following
> >> extension to resctrl interface is needed to support sector cache.
> >>
> >> [Idear for A64FX sector cache function control interface (schemata
> >> file details)]
> >>
> L1:<cache_id0>=<cwbm>,<cwbm>,<cwbm>,<cwbm>;<cache_id1>=<cw
> >> bm>,<cwbm>,<cwbm>,<cwbm>;…
> >>
> L2:<cache_id0>=>=<cwbm>,<cwbm>,<cwbm>,<cwbm>;<cache_id1>=
> >> <cwbm>,<cwbm>,<cwbm>,<cwbm>;…
> >>
> >> ・L1: Add a new interface to control the L1D cache.
> >> ・<cwbm>,<cwbm>,<cwbm>,<cwbm>:Specify the number of ways for
> each
> >> sector.
> >> ・cwbm:Specify the number of ways in each sector as a bitmap
> (percentage),
> >> but the bitmap does not indicate the location of the cache.
> >> * In the sector cache function, L2 sector cache way setting register is
> >> shared among PEs (Processor Element) in shared domain. If two PEs
> >> which share L2 cache belongs to different resource groups, one
> resource
> >> group's L2 setting will affect to other resource group's L2 setting.
>
> In resctrl a "resource group" can be viewed as a class of service.
>
> >> * Since A64FX does not support MPAM, it is not necessary to consider
> >> how to switch between MPAM and sector cache function now.
> >>
> >> Some questions:
> >> 1.I'm still studying about RDT, could you tell me whether RDT has
> >> the similar mechanism with sector cache function?
>
> This is not clear to me yet. One thing to keep in mind is that a bit in the capacity
> bitmask could correspond to some number of ways in a cache, but it does not
> have to. It is essentially a hint to hardware on how much cache space needs to
> be allocated while also indicating overlap and isolation from other allocations.
>
> resctrl already supports the bitmask being interpreted differently between
> architectures and with the MPAM support there will be even more support for
> different interpretations.
>
> >> 2.In RDT, L3 cache is shared among cores in socket. If two cores which
> >> share L3 cache belongs to different resource groups, one resource
> >> group's L3 setting will affect to other resource group's L3 setting?
>
> This question is not entirely clear to me. Are you referring to the hardware layout
> or configuration changes via the resctrl "cpus" file?
>
> Each resource group is a class of service (CLOS) that is supported by all cache
> instances. By default each resource group would thus contain all cache
> instances on the system (even if some cache instances do not support the
> same number of CLOS resctrl would only support the CLOS supported by all
> resources).

Best regards
Tan Shaopeng