Re: [RFC PATCH] x86/cpufeatures: Enumerate new AVX512 bfloat16 instructions
From: Fenghua Yu
Date: Tue Jun 11 2019 - 18:42:18 EST
On Tue, Jun 11, 2019 at 09:47:02PM +0200, Borislav Petkov wrote:
> On Tue, Jun 11, 2019 at 11:19:20AM -0700, Fenghua Yu wrote:
> > So can I re-organize word 11 and 12 as follows?
> > 1. Change word 11 to host scattered features.
> > 2. Move the previous features in word 11 and word 12 to word 11:
> > /*
> > * Extended auxiliary flags: Linux defined - For features scattered in various
> > * CPUID levels and sub-leaves like CPUID level 7 and sub-leaf 1, etc, word 11.
> > */
> > #define X86_FEATURE_CQM_LLC (11*32+ 0) /* LLC QoS if 1 */
> > #define X86_FEATURE_CQM_OCCUP_LLC (11*32+ 1) /* LLC occupancy monitoring */
> > #define X86_FEATURE_CQM_MBM_TOTAL (11*32+ 2) /* LLC Total MBM monitoring */
> > #define X86_FEATURE_CQM_MBM_LOCAL (11*32+ 3) /* LLC Local MBM monitoring */
> > 3. Change word 12 to host CPUID.(EAX=7,ECX=1):EAX:
> > /* Intel-defined CPU features, CPUID level 0x7:1 (EAX), word 12 */
> > #define X86_FEATURE_AVX512_BF16 (12*32+ 0) /* BFLOAT16 instructions */
> This needs to be (12*32+ 5) if word 12 is going to map leaf
> CPUID.(EAX=7,ECX=1):EAX.
> At least judging from the arch extensions doc which lists EAX as:
> Bits 04-00: Reserved.
> Bit 05: AVX512_BF16. Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.
> Bits 31-06: Reserved.
Yes, you are absolutely right. I'll define it as (12*32+ 5).
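
To spell out why the bit offset matters: when a capability word is a raw
copy of one CPUID register, a feature test indexes that word by the
hardware bit position, so the define has to use bit 5 rather than bit 0.
A minimal sketch follows. FEATURE_BIT(), FEATURE_WORD(), FEATURE_POS()
and cap_has() are hypothetical names for illustration only, not the
kernel's own macros or helpers.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only (not the kernel's macros). */
#define FEATURE_BIT(word, bit)	((word) * 32 + (bit))
#define FEATURE_WORD(f)		((f) / 32)
#define FEATURE_POS(f)		((f) % 32)

/* Word 12 mirrors CPUID.(EAX=7,ECX=1):EAX, where AVX512_BF16 is bit 5. */
#define X86_FEATURE_AVX512_BF16	FEATURE_BIT(12, 5)

/*
 * caps[] holds one 32-bit word per feature word. Word 12 is assumed to
 * be an unmodified copy of the EAX value returned by CPUID leaf 7,
 * sub-leaf 1, so the bit index must match the hardware bit.
 */
static inline int cap_has(const uint32_t *caps, unsigned int feature)
{
	return (caps[FEATURE_WORD(feature)] >> FEATURE_POS(feature)) & 1u;
}
```

With (12*32+ 0) the same test would look at a reserved bit of EAX and
never report the feature.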
> > 4. Do other necessary changes to match the new word 11 and word 12.
> But split in two patches: first does steps 1+2, second patch adds the
> new leaf to word 12.
There are two variables defined in cpuinfo_x86: x86_cache_max_rmid and
x86_cache_occ_scale. c->x86_cache_max_rmid is read from CPUID.0xf.1:ECX
and c->x86_cache_occ_scale is read from CPUID.0xf.1:EBX.
After getting X86_FEATURE_CQM_* from the scattered features, the two
variables need to be read from CPUID again. So the code that reads the two
variables needs to be moved from before init_scattered_cpuid_features(c) to
after the function. This makes the get_cpu_cap() code awkward.
And the two variables are ONLY used in resctrl monitoring configuration.
There is no need to store them in cpuinfo_x86 on each CPU.
I'm thinking of simplifying and cleaning up this part of the code:
1. In patch #1:
- remove the definitions of x86_cache_max_rmid and x86_cache_occ_scale
- remove assignment of c->x86_cache_max_rmid and c->x86_cache_occ_scale
- get r->mon_scale and r->num_rmid in rdt_get_mon_l3_config(r) directly
from CPUID.0xf.1:EBX and CPUID.0xf.1:ECX.
2. In patch #2: do steps 1+2 to recycle word 11. After patch #1, I can
totally remove the code to get c->x86_cache_max_rmid and
c->x86_cache_occ_scale in get_cpu_cap(c). And patch #2 is cleaner.
3. In patch #3: add new word 12 to host CPUID.7.1:EAX
Do you think patch #1 is necessary, and is this the right patch set?
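
For reference, the direct read proposed for patch #1 could look roughly
like the sketch below. It is written as user-space C so it can be tried
outside the kernel. struct cqm_l3_info, cqm_l3_from_regs() and
cqm_l3_read() are hypothetical names, and the "+ 1" assumes that ECX
reports the highest RMID as a zero-based value. The real change would
fill r->mon_scale and r->num_rmid inside rdt_get_mon_l3_config(r).

```c
#include <stdint.h>

/* Hypothetical container standing in for r->mon_scale and r->num_rmid. */
struct cqm_l3_info {
	uint32_t mon_scale;	/* from CPUID.0xf.1:EBX */
	uint32_t num_rmid;	/* from CPUID.0xf.1:ECX, plus one */
};

/* Pure decode step: takes the raw register values, touches no hardware. */
static inline struct cqm_l3_info cqm_l3_from_regs(uint32_t ebx, uint32_t ecx)
{
	struct cqm_l3_info info;

	info.mon_scale = ebx;		/* occupancy scale factor */
	info.num_rmid = ecx + 1;	/* assumption: ECX is a zero-based max RMID */
	return info;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/*
 * Reads leaf 0xf, sub-leaf 1 directly. Returns 0 on success, or -1 if
 * the CPU does not report leaf 0xf at all. The caller is assumed to
 * have already confirmed that L3 monitoring is supported.
 */
static inline int cqm_l3_read(struct cqm_l3_info *out)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(0xf, 1, &eax, &ebx, &ecx, &edx))
		return -1;

	*out = cqm_l3_from_regs(ebx, ecx);
	return 0;
}
#endif
```

Keeping the decode step separate from the CPUID read means nothing needs
to be cached in cpuinfo_x86 on each CPU, which is the point of patch #1.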