irq_build_affinity_masks() actually groups CPUs evenly across the managed
irq vectors according to NUMA and CPU locality, so it is reasonable to
abstract one generic API for grouping CPUs evenly. The idea was suggested by
Thomas Gleixner.
group_cpus_evenly() is abstracted and moved into lib/, so that blk-mq can
reuse it to build its default queue mapping.
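
To illustrate what "grouping CPUs evenly" means, here is a minimal userspace
sketch. It is not the kernel implementation: the function name
sketch_group_evenly() and its parameters are hypothetical, and it ignores
NUMA topology entirely, keeping consecutive CPU ids together only as a
stand-in for locality. It shows just the balancing property, where group
sizes differ by at most one.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace sketch, not the kernel code: assign CPUs
 * 0..ncpus-1 to ngroups groups so that group sizes differ by at most one.
 * Consecutive CPU ids stay in the same group as a stand-in for locality;
 * real NUMA topology is ignored. group_of[cpu] receives the group index
 * and must have room for ncpus entries.
 */
static void sketch_group_evenly(unsigned int ncpus, unsigned int ngroups,
				unsigned int *group_of)
{
	unsigned int base, extra, cpu = 0;

	assert(ngroups > 0 && group_of != NULL);

	base = ncpus / ngroups;		/* minimum number of CPUs per group */
	extra = ncpus % ngroups;	/* the first 'extra' groups get one more */

	for (unsigned int g = 0; g < ngroups; g++) {
		unsigned int n = base + (g < extra ? 1 : 0);

		for (unsigned int i = 0; i < n; i++)
			group_of[cpu++] = g;
	}
}
```

For example, with 10 CPUs and 4 groups the sketch yields group sizes of
3, 3, 2 and 2.
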
With this change, blk-mq IO performance data is more stable and also shows a
big improvement; see the detailed data in the last patch.
Please consider it for v6.3!