Why does glibc use AVX-512?
From: Andy Lutomirski
Date: Fri Mar 26 2021 - 00:39:33 EST
Hi all-
glibc appears to use AVX512F for memcpy by default. (That is, unless
Prefer_ERMS is on by default; I did some searching and genuinely can't
tell whether it is.) The commit adding it cites a 2016 email reporting
a 30% improvement on KNL (Knights Landing). Unfortunately, AVX-512 is
now available in mainstream hardware, and the overhead of switching
between normal and AVX-512 code appears to vary from bad to genuinely
horrible. And once anything has used the high parts of YMM and/or
ZMM, those states tend to get stuck with XINUSE=1.
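For anyone who wants to watch the "stuck" state directly, here is a minimal sketch (not glibc or kernel code) that reads XCR0 and XINUSE from user space. It assumes an x86 CPU and GCC/Clang's <cpuid.h>; XGETBV with ECX=1 returns XCR0 & XINUSE and is only valid when CPUID.(EAX=0DH,ECX=1):EAX[2] is set. The helper names and XFEATURE_* macro names are hypothetical; the bit positions are the architectural state-component numbers.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* XSAVE state-component bits shared by XCR0 and XINUSE (Intel SDM vol. 1, ch. 13).
 * Macro names are illustrative, not taken from any header. */
#define XFEATURE_X87       (1ULL << 0)
#define XFEATURE_SSE       (1ULL << 1)
#define XFEATURE_YMM_HI128 (1ULL << 2)  /* upper 128 bits of YMM0-15 */
#define XFEATURE_OPMASK    (1ULL << 5)  /* AVX-512 k0-k7 */
#define XFEATURE_ZMM_HI256 (1ULL << 6)  /* upper 256 bits of ZMM0-15 */
#define XFEATURE_HI16_ZMM  (1ULL << 7)  /* ZMM16-31 */

#if defined(__x86_64__) || defined(__i386__)
/* XGETBV only works once the OS has set CR4.OSXSAVE. */
static bool osxsave_enabled(void)
{
    unsigned a, b, c, d;
    return __get_cpuid(1, &a, &b, &c, &d) && (c & bit_OSXSAVE);
}

/* volatile: XINUSE can change between calls, so never cache the result. */
static uint64_t xgetbv(uint32_t index)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(index));
    return ((uint64_t)hi << 32) | lo;
}
#endif

/* XCR0: the state components the OS has enabled. */
static bool read_xcr0(uint64_t *out)
{
#if defined(__x86_64__) || defined(__i386__)
    if (!osxsave_enabled())
        return false;
    *out = xgetbv(0);
    return true;
#else
    (void)out;
    return false;
#endif
}

/* XCR0 & XINUSE: enabled components that are not in their init state. */
static bool read_xinuse(uint64_t *out)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned a, b, c, d;
    if (!osxsave_enabled() || !__get_cpuid_count(0xd, 1, &a, &b, &c, &d) ||
        !(a & (1u << 2)))
        return false;
    *out = xgetbv(1);
    return true;
#else
    (void)out;
    return false;
#endif
}

/* True if any AVX or AVX-512 upper state is marked in use. */
static bool wide_state_in_use(uint64_t xinuse)
{
    return (xinuse & (XFEATURE_YMM_HI128 | XFEATURE_OPMASK |
                      XFEATURE_ZMM_HI256 | XFEATURE_HI16_ZMM)) != 0;
}
```

Calling read_xinuse() before and after a large memcpy on AVX-512 hardware should show whether the ZMM bits stay set afterwards; whether they do depends on the CPU and on which memcpy variant glibc picked.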
I'm wondering whether glibc should stop using AVX-512 by default.
Meanwhile, some of you may have noticed a little ABI break we have.
On AVX-512 hardware, the size of a signal frame is unreasonably large,
and this is causing problems even for existing software that doesn't
use AVX-512. Do any of you have any clever ideas for how to fix it?
We have some kernel patches around to try to fail more cleanly, but we
still fail.
I think we should seriously consider solutions in which, for new
tasks, XCR0 has new giant features (e.g. AMX) and possibly even
AVX-512 cleared, and programs need to explicitly request enablement.
This would allow programs to opt into not saving/restoring across
signals or to save/restore in buffers supplied when the feature is
enabled. This has all kinds of pros and cons, and I'm not sure it's a
great idea. But, in the absence of some change to the ABI, the
default outcome is that, on AMX-enabled kernels on AMX-enabled
hardware, the signal frame will be more than 8kB, and this will affect
*every* signal regardless of whether AMX is in use.
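For a concrete picture of what "explicitly request enablement" could look like, here is a sketch of the per-process opt-in that Linux later adopted for AMX (5.16 and newer) through arch_prctl(). Treat it as an illustration, not a specification: the ARCH_* values are copied by hand from <asm/prctl.h> and should be checked against your kernel headers, and get_xcomp_perm()/request_amx() are hypothetical helper names. On older kernels or non-AMX CPUs both calls simply fail.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hand-copied from Linux <asm/prctl.h>; verify against your headers. */
#ifndef ARCH_GET_XCOMP_PERM
#define ARCH_GET_XCOMP_PERM 0x1022
#endif
#ifndef ARCH_REQ_XCOMP_PERM
#define ARCH_REQ_XCOMP_PERM 0x1023
#endif
#define XFEATURE_XTILEDATA 18  /* AMX tile data: 8192 bytes of XSAVE state */

/* Ask the kernel which XSAVE features this process is permitted to use. */
static bool get_xcomp_perm(uint64_t *mask)
{
#if defined(__x86_64__) && defined(SYS_arch_prctl)
    return syscall(SYS_arch_prctl, ARCH_GET_XCOMP_PERM, mask) == 0;
#else
    (void)mask;
    return false;
#endif
}

/* Opt this process into AMX tile data. Returns false if the kernel or CPU
 * lacks support; the kernel may also refuse if an already-installed
 * sigaltstack is too small for the larger signal frame. */
static bool request_amx(void)
{
#if defined(__x86_64__) && defined(SYS_arch_prctl)
    return syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM,
                   (unsigned long)XFEATURE_XTILEDATA) == 0;
#else
    return false;
#endif
}
```

The design point matches the proposal: a process that never calls request_amx() keeps the smaller default feature set, so its signal frames are not inflated by state it never uses.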
--Andy