Re: [PATCH v8 1/1] selftests/x86/xstate: Add xstate test cases for XSAVE feature

From: Dave Hansen
Date: Mon Apr 11 2022 - 10:42:54 EST


On 4/11/22 03:14, Pengfei Xu wrote:
> On 2022-04-08 at 09:58:50 -0700, Dave Hansen wrote:
>> On 3/16/22 05:40, Pengfei Xu wrote:
>>> +#ifndef __cpuid_count
>>> +#define __cpuid_count(level, count, a, b, c, d) ({ \
>>> + __asm__ __volatile__ ("cpuid\n\t" \
>>> + : "=a" (a), "=b" (b), "=c" (c), "=d" (d) \
>>> + : "0" (level), "2" (count)); \
>>> +})
>>> +#endif
>>
>>
>> By the time you post the next revision, I hope these are merged:
>>
>>> https://lore.kernel.org/all/cover.1647360971.git.reinette.chatre@xxxxxxxxx/
>>
>> Could you rebase on top of those to avoid duplication, please?
>>
> Yes, thanks for remind. I would like to use the helper __cpuid_count() based
> on above fixed patch.
> And I have a concern with stdlib.h with "-mno-sse" issue, it's a GCC bug,
> and it will be fixed in GCC11. And I could not use kselftest.h, because
> kselftest.h used stdlib.h also...

Ugh. What a mess.

> Thanks for the link! From above "id=27600"link, Lu Hongjiu mentioned that:
> It's a "GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99652";
> And id=99652 shows that it's fixed in GCC 11.
> I tried "-mgeneral-regs-only"(it includes "-mno-sse"), it also gcc failed
> with same error as "-mno-sse":
> "
> /usr/include/bits/stdlib-float.h: In function ‘atof’:
> /usr/include/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled
> "
> And the error shows that: it's related to "stdlib-float.h", seems I didn't
> use stdlib-float.h related function.
>
> In order for this test code to support GCC versions before GCC11, such as
> GCC8, etc., I used this workaround "avoid stdlib.h" for GCC bug, and add
> *aligned_alloc() declaration above.
>
> Because kselftest.h uses stdlib.h also, so I could not use kselftest.h with
> "-mno-see" special GCC requirement, and seems I could not use __cpuid_count()
> patch in kselftest.h from Reinette Chatre.
> Thanks!

Can you break this test up into two pieces which are spread across two
.c files? One that *just* does the sequences that need XSAVE and to
preserve FPU state with -mno-sse, and then a separate one that uses
stdlib.h and also the kselftest infrastructure?

For instance, all of the detection and enumeration can be in the nornal
.c file.

...
>>> +/* Fill FP/XMM/YMM/OPMASK and PKRU xstates into buffer. */
>>> +static void fill_xstates_buf(struct xsave_buffer *buf, uint32_t xsave_mask)
>>> +{
>>> + uint32_t i;
>>> + /* The data of FP x87 state are as follows. */
>>
>> OK, but what *is* this? Random data? Or did you put some data in that
>> means something?
>>
> I learned from filling the fp data by follow functions: fill FPU xstate
> by fldl instructions, push the source operand onto the FPU register stack
> and recorded the fp xstate, then used buffer filling way
> to fill the fpu xstates:
> Some fp data could be set as random value under the "FP in SDM rules".
> Shall I add the comment for the fpu data filling like as follow:
> "
> /*
> * The data of FP x87 state has the same effect as pushing source operand
> * 0x1f2f3f4f1f2f3f4f onto the FPU stack ST0-7.
> */
> unsigned char fp_data[160] = {...
> "

No, that's hideous. If you generated the fp state with code, let's
include the *code*.

> As follow is the pushing source operand 0x1f2f3f4f1f2f3f4f onto the FPU stack
> ST0-7:
> "
> static inline void prepare_fp_buf(uint64_t ui64_fp)
> {
> /* Populate ui64_fp value onto FP registers stack ST0-7. */
> asm volatile("finit");
> asm volatile("fldl %0" : : "m" (ui64_fp));
> asm volatile("fldl %0" : : "m" (ui64_fp));
> asm volatile("fldl %0" : : "m" (ui64_fp));
> asm volatile("fldl %0" : : "m" (ui64_fp));
> asm volatile("fldl %0" : : "m" (ui64_fp));
> asm volatile("fldl %0" : : "m" (ui64_fp));
> asm volatile("fldl %0" : : "m" (ui64_fp));
> asm volatile("fldl %0" : : "m" (ui64_fp));
> }
> ...
> prepare_fp_buf(0x1f2f3f4f);
> __xsave(buf, xstate_info.mask);
> "
>
> The code to set fp data and display xstate is as follow:
> https://github.com/xupengfe/xstate_show/blob/0411_fp/x86/xstate.c
>
> I found there were 2 different:
>>> + unsigned char fp_data[160] = {
>>> + 0x7f, 0x03, 0x00, 0x00, 0xff, 0x00, 0x00, 0x00,
>>> + 0xf0, 0x19, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00,
> ^Above function shows "0xff, 0x12" not "0xf0, 0x19" in 0x8/0x9
> offset bytes:
> Bytes 11:8 are used for bits 31:0 of the x87 FPU Instruction Pointer
> Offset(FIP). It could be impacted if I added "vzeroall" and so on instructions,
> in order to match with above fpu function result, will change to "0xff, 0x12".
>
>>> + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
>>> + 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00,
> ^Above function shows "0x80, 0x1f" not "0x00, 0x00" in offset
> 0x18/0x19 bytes:
> Bytes 27:24 are used for the MXCSR register. XRSTOR and XRSTORS generate
> general-protection faults (#GP) in response to attempts to set any of the
> reserved bits of the MXCSR register.
> It could be set as "0x00, 0x00", in order to match with result of above
> function, will fill as "0x80, 0x1f".

I'm totally lost with what you are trying to say here.

...
>> I have to wonder if we can do this in a bit more structured way:
>>
>> struct xstate_test
>> {
>> bool (*init)(void);
>> bool (*fill_state)(struct xsave_buffer *buf);
>> ...
>> }
>>
>> Yes, that means indirect function calls, but that's OK since we know it
>> will all be compiled with the "no-sse" flag (and friends).
>>
>> Then fill_xstates_buf() just becomes a loop over an xstate_tests[] array.
>>
> Yes, it's much better to fill the buf in a loop! Thanks!
> Because it's special for pkru filling, so I will improve it like as follow:
> "
> uint32_t xfeature_num;
> ...
>
> /* Fill test byte value into each tested xstate buffer(offset/size). */
> for (xfeature_num = XFEATURE_YMM; xfeature_num < XFEATURE_MAX;
> xfeature_num++) {
> if (xstate_tested(xfeature_num)) {
> if (xfeature_num == XFEATURE_PKRU) {
> /* Only 0-3 bytes of pkru xstates are allowed to be written. */
> memset((unsigned char *)buf + xstate_info.offset[XFEATURE_PKRU],
> PKRU_TESTBYTE, sizeof(uint32_t));
> continue;
> }
>
> memset((unsigned char *)buf + xstate_info.offset[xfeature_num],
> XSTATE_TESTBYTE, xstate_info.size[xfeature_num]);
> }
> }
> "

I'm not sure that's an improvement.

>>> +/*
>>> + * Because xstate like XMM, YMM registers are not preserved across function
>>> + * calls, so use inline function with assembly code only to raise signal.
>>> + */
>>> +static inline long long __raise(long long pid_num, long long sig_num)
>>> +{
>>> + long long ret, nr = SYS_kill;
>>> +
>>> + register long long arg1 asm("rdi") = pid_num;
>>> + register long long arg2 asm("rsi") = sig_num;
>>
>> I really don't like register variables. They're very fragile. Imagine
>> if someone put a printf() right here. Why don't you just do this with
>> inline assembly directives?
>>
> It seems that adding printf() to an inline function is not good.
> I'm not sure if I understand correctly: should I use inline assembly to
> assign variables to rdi, rsi and then syscall?
> It it's right, I will think about how to implement it by inline assembly way
> and fix it.

Yes, do it with inline assembly.