Hi Reinette,
Sorry, I have not yet explained A64FX's sector cache function well.
I think I need to explain this function from a different perspective.
On 5/17/2021 1:31 AM, tan.shaopeng@xxxxxxxxxxx wrote:
--------
A64FX NUMA-PE-Cache Architecture:
NUMA0:
PE0:
L1sector0,L1sector1,L1sector2,L1sector3
PE1:
L1sector0,L1sector1,L1sector2,L1sector3
...
PE11:
L1sector0,L1sector1,L1sector2,L1sector3
L2sector0,1/L2sector2,3
NUMA1:
PE0:
L1sector0,L1sector1,L1sector2,L1sector3
...
PE11:
L1sector0,L1sector1,L1sector2,L1sector3
L2sector0,1/L2sector2,3
NUMA2:
...
NUMA3:
...
--------
In the A64FX processor, each L1 sector cache capacity setting register
belongs to one PE only and is not shared among PEs. The L2 sector cache
maximum capacity setting registers are shared among the PEs in the same
NUMA node, so note that changing these registers from one PE affects the
other PEs. The number of ways for the L2 sector IDs (0,1 or 2,3) can be
set from any PE in the same NUMA node. Sector IDs 0,1 and 2,3 are not
available at the same time in the same NUMA node.
I think that, in your idea, a resource group would be created for each sector ID.
(> "sectors" could be considered the same as the resctrl "classes of service")
In that case, resource groups would be created as in the following example.
・ L1: NUMAX-PEY-L1sector0 (X = 0,1,2,3. Y = 0,1,2...11)
・ L2: NUMAX-L2sector0 (X = 0,1,2,3)
In this example, the sectors with the same ID (0) on all PEs are allocated
to the resource group. The L1D caches are numbered from
NUMA0_PE0-L1sector0(0) to NUMA3_PE11-L1sector0(47), and the L2 caches are
numbered from NUMA0-L2sector0(0) to NUMA3-L2sector0(3).
(NUMA number X is from 0-3, PE number Y is from 0-11)
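As a rough illustration of this numbering, here is a minimal sketch. It assumes 4 NUMA nodes with 12 PEs each, as in the diagram above; the function names are hypothetical and are not part of resctrl or any A64FX interface.

```python
NUM_NUMA = 4        # NUMA0..NUMA3, as in the diagram
PES_PER_NUMA = 12   # PE0..PE11 in each NUMA node


def l1_cache_index(numa: int, pe: int) -> int:
    """Flat index of the L1D cache of (numa, pe), counting node by node."""
    if not 0 <= numa < NUM_NUMA:
        raise ValueError(f"NUMA number out of range: {numa}")
    if not 0 <= pe < PES_PER_NUMA:
        raise ValueError(f"PE number out of range: {pe}")
    return numa * PES_PER_NUMA + pe


def l2_cache_index(numa: int) -> int:
    """There is one L2 cache per NUMA node, so its index is the node number."""
    if not 0 <= numa < NUM_NUMA:
        raise ValueError(f"NUMA number out of range: {numa}")
    return numa


if __name__ == "__main__":
    print(l1_cache_index(0, 0))    # first L1D cache
    print(l1_cache_index(3, 11))   # last L1D cache
    print(l2_cache_index(3))       # last L2 cache
```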
(1) The number of ways of NUMAX-PEY-L1sector0 can be set independently
for each PE (0-47). When a task runs in this resource group,
we cannot control which PE the task runs on or how many
cache ways the task uses.
(2) Since L2 can only use 2 sectors at a time, when more than
2 resource groups are created, L2sector0 will have to be allocated
to more than one resource group. If L2sector0 is shared by different
resource groups, the L2 sector settings of those resource groups will
influence each other.
And so on. There are various problems, and there would be no merit in
using resctrl.
In my idea, in order to allocate the L1 and L2 caches to a resource
group, NUMA nodes are allocated to the resource group.
An example of a resource group is as follows.
・ NUMA0-PEY-L1sectorZ (Y = 0,1,2...11. Z = 0,1,2,3)
・ NUMA0-L2sectorZZ (ZZ = 0,1,2,3)
#cat /sys/fs/resctrl/p0/cpus
0-11 *1
#cat /sys/fs/resctrl/p0/schemata
L1:0=0xF,0x3,0x1,0x0 *2
L2:0=0xFFF,0xF,0,0 *3
*1: The PEs belong to one NUMA node. (Of course, multiple NUMA nodes can
also be specified in one resource group.)
*2: The number of ways for L1sector0,1,2,3. In this resource group,
the number of ways of sector0 is the same (0xF) on all PEs. If 0 ways
are specified for a sector, that sector cannot be used. If 4 (0xF)
ways are specified for a sector, that sector can use the whole cache.
If 4 ways are specified for every sector, there is no
restriction on cache use.
*3: The number of ways for L2sector0,1. If L2sector0,1 are used,
the number of ways of L2sector2,3 must be set to 0.
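To make the intended reading of these schemata lines concrete, here is a small sketch. It assumes the proposed format "resource:domain=mask,mask,mask,mask" with one way bitmask per sector, where the number of set bits is the number of ways (so 0xF means 4 ways). This is only the format proposed in this mail, not the existing resctrl schemata format, and the function names are hypothetical.

```python
def parse_schemata_line(line: str):
    """Split e.g. 'L1:0=0xF,0x3,0x1,0x0' into (resource, domain, masks)."""
    resource, rest = line.strip().split(":", 1)
    domain, masks = rest.split("=", 1)
    # int(text, 0) accepts both '0xF' and a plain '0'.
    return resource, int(domain), [int(m, 0) for m in masks.split(",")]


def way_counts(masks):
    """Number of ways per sector = number of set bits in each bitmask."""
    return [bin(m).count("1") for m in masks]


def l2_pairs_ok(masks):
    """L2 sector pairs (0,1) and (2,3) cannot be used at the same time."""
    uses_01 = any(masks[0:2])
    uses_23 = any(masks[2:4])
    return not (uses_01 and uses_23)


if __name__ == "__main__":
    _, _, l1 = parse_schemata_line("L1:0=0xF,0x3,0x1,0x0")
    _, _, l2 = parse_schemata_line("L2:0=0xFFF,0xF,0,0")
    print(way_counts(l1))   # ways for L1sector0..3
    print(way_counts(l2))   # ways for L2sector0..3
    print(l2_pairs_ok(l2))  # sectors 2,3 are 0, so the setting is valid
```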
All sectors with the same ID in the same resource group are set to
the same number of ways, and when a task runs on A64FX, the sector
ID used by the task is determined by bits [56:57] of the virtual address.
By writing the PID to /sys/fs/resctrl/tasks, the task is bound
to the resource group, and after that, the cache size used by the task
never changes.
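For reference, selecting a sector through the virtual address could look like the sketch below. It assumes that the sector ID is the 2-bit field in bits 56 and 57 of the address, with bit 56 as the low bit; the bit order and the helper names are assumptions for illustration only.

```python
SECTOR_ID_SHIFT = 56   # assumed: low bit of the sector ID field
SECTOR_ID_MASK = 0x3   # 2 bits, so sector IDs 0..3


def sector_id(vaddr: int) -> int:
    """Extract the sector ID carried in bits 56 and 57 of a virtual address."""
    return (vaddr >> SECTOR_ID_SHIFT) & SECTOR_ID_MASK


def with_sector_id(vaddr: int, sector: int) -> int:
    """Return vaddr with the sector ID field replaced by 'sector'."""
    if not 0 <= sector <= SECTOR_ID_MASK:
        raise ValueError(f"sector ID out of range: {sector}")
    cleared = vaddr & ~(SECTOR_ID_MASK << SECTOR_ID_SHIFT)
    return cleared | (sector << SECTOR_ID_SHIFT)


if __name__ == "__main__":
    addr = 0x0000_FFFF_1234_5000
    tagged = with_sector_id(addr, 2)
    print(hex(tagged))
    print(sector_id(tagged))
```

With such a scheme the task picks the sector per access, while the number of ways behind each sector ID comes from the resource group it is bound to.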