On 2/24/21 8:52 AM, chris hyser wrote:
On 2/24/21 8:02 AM, Chris Hyser wrote:
However, it means that overall throughput of your binary is cut in
~half, since none of the threads can share a core. Note that I never
saw an indefinite deadlock, just ~2x runtime for your binary vs the
control. I've verified that both a) manually hardcoding all threads to
be able to share regardless of cookie, and b) using a machine with 6
cores instead of 2, both allow your binary to complete in the same
amount of time as without the new API.
This was on a 24 core box. When I run the test, it definitely hangs. I'll answer your other email as well.
I just want to clarify. The test normally completes in seconds. When I run it on the 24 core box from the console, other ssh connections immediately freeze. The console is frozen. You can't ping the box, and it has stayed that way for up to half an hour before I reset it. I'm trying to get some kind of stack trace to see what it is doing. To the extent that I've been able to trace it or print it, the "next code" always seems to be __sched_core_update_cookie(p);
I cannot duplicate this on a 4 core box, even with thousands of processes and threads. The 24 core box does not even create the full 400 processes/threads in that test before it hangs, and the test reliably fails there almost instantly. The working theory is that the 24 core box is doing way more of the clone syscalls in parallel.
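
For anyone who wants to generate a similar burst of parallel process creation, here is a rough sketch of the kind of load the working theory points at. It is not the test discussed above and it does not touch the core scheduling interface at all. The function name spawn_parallel() and its parameters are made up for illustration, and it uses only fork() (which glibc implements with the clone syscall on Linux) instead of a mix of processes and threads.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the original test.
 * Body of each first-level child: fork nkids grandchildren that exit
 * immediately, then reap them. Returns 0 if every grandchild was
 * created and exited cleanly, 1 otherwise.
 */
static int run_child(int nkids)
{
    int forked = 0, ok = 0;

    for (int i = 0; i < nkids; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;              /* creation failed, stop forking */
        if (pid == 0)
            _exit(0);           /* grandchild: nothing to do */
        forked++;
    }
    for (int i = 0; i < forked; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok == nkids ? 0 : 1;
}

/*
 * Fork nprocs children back to back without waiting in between, so the
 * children run their own fork loops concurrently on whatever CPUs are
 * available. Returns the number of children that reported success.
 */
int spawn_parallel(int nprocs, int nkids)
{
    int forked = 0, ok = 0;

    assert(nprocs >= 0 && nkids >= 0);

    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            _exit(run_child(nkids));
        forked++;
    }
    for (int i = 0; i < forked; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

Calling spawn_parallel(20, 20) from a small main() would create about 400 tasks in total, with up to 20 fork loops running at once. On a box with more cores, more of those loops actually overlap, which is the difference the theory relies on.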