Re: x86/fpu: Inaccurate AVX-512 Usage Tracking via arch_status

From: chuang

Date: Thu Oct 30 2025 - 02:56:21 EST


On Mon, Oct 27, 2025 at 10:26 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 10/27/25 00:50, chuang wrote:
> > On AVX-512 capable systems, the implementation appears to record the
> > current timestamp into 'task->thread.fpu.avx512_timestamp' upon any
> > task switch, irrespective of whether the task has actually executed an
> > AVX-512 instruction.
>
> The timestamp update ultimately has _zero_ to do with executing
> AVX-512 instructions. It's all about the state in the ZMM registers, not
> AVX-512 instructions.

Got it, thanks.

> Those registers are inherited at fork and I don't see avx512_timestamp
> being zeroed anywhere. So I suspect what you are seeing is that some
> _parent_ used AVX512, and its children are getting stuck with
> avx512_timestamp.

I have tested with the attached patch. The behavior remains the same
as previously reported: after fpu_clone(), the new process still has a
non-zero avx512_timestamp, and it continues to be updated in
subsequent task switches, irrespective of AVX-512 instruction
execution.

I traced the code path. In fpu_clone() -> save_fpregs_to_fpstate(),
since my Intel CPU supports XSAVE, os_xsave() is called, and afterwards
the XFEATURE_Hi16_ZMM bit is set in xsave.header.xfeatures. That in
turn causes update_avx_timestamp() to update fpu->avx512_timestamp.
The same flow occurs in __switch_to() -> switch_fpu_prepare().
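
To make the flow I am describing concrete, here is a small user-space
model of it. This is only a sketch, not the kernel code: struct
model_fpu and the model_* functions are hypothetical names, the bit
positions are the architectural XSAVE component numbers (5 = opmask,
6 = ZMM_Hi256, 7 = Hi16_ZMM), and I am assuming the tracking mask
covers only the two ZMM components.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XSAVE state-component bits for the AVX-512 components. */
#define XFEATURE_MASK_OPMASK    (1ULL << 5)
#define XFEATURE_MASK_ZMM_Hi256 (1ULL << 6)
#define XFEATURE_MASK_Hi16_ZMM  (1ULL << 7)

/* Assumption: tracking keys off the two ZMM components only. */
#define AVX512_TRACKING_MASK \
	(XFEATURE_MASK_ZMM_Hi256 | XFEATURE_MASK_Hi16_ZMM)

/* Hypothetical stand-in for the relevant fields of struct fpu. */
struct model_fpu {
	uint64_t xfeatures;        /* models xsave.header.xfeatures */
	uint64_t avx512_timestamp; /* models fpu->avx512_timestamp */
};

/* Stamp "now" whenever a tracked bit is set in the saved header. */
static void model_update_avx_timestamp(struct model_fpu *fpu, uint64_t now)
{
	assert(fpu != NULL);
	if (fpu->xfeatures & AVX512_TRACKING_MASK)
		fpu->avx512_timestamp = now;
}

/*
 * Models one register save: the hardware reports which components are
 * in use (xinuse), then the timestamp check runs on the result.
 */
static void model_save_fpregs(struct model_fpu *fpu, uint64_t xinuse,
			      uint64_t now)
{
	fpu->xfeatures = xinuse;
	model_update_avx_timestamp(fpu, now);
}
```

In this model, as long as the saved header reports Hi16_ZMM in use,
every save refreshes the timestamp, whether or not the task ran any
AVX-512 instruction in between. That matches what I observe.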

Given this, is the issue related to my specific Intel Xeon Gold? Is
the CPU continuously indicating that the AVX-512 state is in use?

> You could probably confirm this by dumping ->avx512_timestamp in
> fpu_clone().
>
> Or, try the attached patch and see if it makes things work more like
> you'd expect.

Best regards,