I'd prefer the kernel to do such clustering...
I think that is a next step.
Also, while the kernel can do this on a best-effort basis, it cannot
take into account things it doesn't know about, like the peak load of a
high-priority job, etc.; these are things a job scheduler would know.
Then again, a job scheduler would likely already know about the AVX
state anyway.