Re: EEVDF and NUMA balancing

From: Peter Zijlstra
Date: Tue Oct 03 2023 - 17:52:09 EST


On Tue, Oct 03, 2023 at 10:25:08PM +0200, Julia Lawall wrote:
> Is it expected that the commit e8f331bcc270 should have an impact on the
> frequency of NUMA balancing?

Definitely not expected. The only effect of that commit was supposed to
be the runqueue order of tasks. I'll go stare at it in the morning --
definitely too late for critical thinking atm.

Thanks!

> The NAS benchmark ua.C.x (NPB3.4-OMP,
> https://github.com/mbdevpl/nas-parallel-benchmarks.git) on a 4-socket
> Intel Xeon 6130 suffers from some NUMA moves that leave some sockets with
> too few threads and other sockets with too many threads. Prior to the
> commit e8f331bcc270, this was corrected by subsequent load balancing,
> leading to run times of 20-40 seconds (around 20 seconds can be achieved
> if one just turns NUMA balancing off). After commit e8f331bcc270, the
> running time can go up to 150 seconds. In the worst case, I have seen a
> core remain idle for 75 seconds. It seems that the load balancer at the
> NUMA domain level is not able to do anything, because when a core on the
> overloaded socket has multiple threads, they are tasks that were NUMA
> balanced to the socket, and thus should not leave. So the "busiest" core
> chosen by find_busiest_queue doesn't actually contain any stealable
> threads. Maybe it could be worth stealing from a core that has only one
> task in this case, in hopes that the tasks that are tied to a socket will
> spread out better across it if more space is available?
>
> An example run is attached. The cores are renumbered according to the
> sockets, so there is an overload on socket 1 and an underload on socket
> 2.
>
> julia
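
For illustration only, the situation described in the quoted report can be
sketched as a toy model. This is not kernel code: the Task and Core classes,
the numa_placed flag, and the allow_single switch are all hypothetical names
invented for the sketch, and find_busiest() here is only a stand-in for the
idea behind find_busiest_queue (pick the core with the most runnable tasks).
It shows why the busiest core may offer nothing to steal, and what the
suggested fallback of stealing from a single-task core would do.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    # Hypothetical flag: True means NUMA balancing moved the task to this
    # socket, so the toy balancer refuses to pull it to another socket.
    numa_placed: bool = False


@dataclass
class Core:
    cpu: int
    tasks: list = field(default_factory=list)


def find_busiest(cores):
    """Toy stand-in: the core with the most runnable tasks."""
    return max(cores, key=lambda c: len(c.tasks))


def stealable(core):
    """Tasks on this core that the toy balancer may move off-socket."""
    return [t for t in core.tasks if not t.numa_placed]


def pull_task(cores, allow_single=False):
    """Try to pull one task away from an overloaded socket.

    Returns the pulled Task, or None if nothing could be moved.
    """
    busiest = find_busiest(cores)
    candidates = stealable(busiest)
    if candidates:
        busiest.tasks.remove(candidates[0])
        return candidates[0]
    if allow_single:
        # Suggested fallback: take the lone task of another core, which
        # frees that core for the tasks tied to this socket.
        for core in cores:
            if len(core.tasks) == 1 and stealable(core):
                return core.tasks.pop()
    return None


def make_socket():
    """One overloaded socket: core 0 doubled up, core 1 with one free task."""
    return [
        Core(0, [Task("a", numa_placed=True), Task("b", numa_placed=True)]),
        Core(1, [Task("c")]),
    ]


if __name__ == "__main__":
    print(pull_task(make_socket()))
    print(pull_task(make_socket(), allow_single=True))
```

In this model the default call finds core 0 as busiest but moves nothing,
because both of its tasks are tied to the socket. With allow_single=True the
lone task on core 1 is pulled instead, leaving core 1 empty so that one of the
doubled-up tasks could spread onto it. Whether that trade is a win in the real
scheduler is exactly the open question in the report.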