On 15. 09. 23, 17:19, Peter Zijlstra wrote:
> On Fri, Sep 15, 2023 at 02:58:35PM +0200, Thomas Gleixner wrote:
>> I spent quite some time to convince myself that this is correct. I was
>> not able to poke a hole into it. So that really should be safe to
>> do. Famous last words ...
> IKR :-/
> Something like so then...
> ---
> Subject: futex/pi: Fix recursive rt_mutex waiter state
So this breaks some random test in APR:
From https://build.opensuse.org/package/live_build_log/openSUSE:Factory:Staging:G/apr/standard/x86_64:
testprocmutex : Line 122: child did not terminate with success
The child in fact terminates on https://github.com/apache/apr/blob/trunk/test/testprocmutex.c#L93:
    while ((rv = apr_proc_mutex_timedlock(proc_lock, 1))) {
        if (!APR_STATUS_IS_TIMEUP(rv))
            exit(1); <----- here
The test creates 6 children, each of which calls pthread_mutex_timedlock()/unlock() repeatedly (200 times) in parallel, sleeping 1 us while holding the lock. The timeout passed above is also 1 us. The test expects every failed lock attempt to be a timeout, but in 6.7 that is not always the case (it's racy, so the failure is semi-random: roughly 1 in 1000 attempts is bad).
If I revert this patch (commit fbeb558b0dd0d), the test works.
I know, the test could be broken too, but I have no idea, really. The testsuite is sort of hairy and I could not come up with a simple repro.
Note APR sets up PTHREAD_PROCESS_SHARED, _ROBUST, and _PRIO_INHERIT attrs for the mutex.
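A minimal sketch of that kind of mutex setup, for reference only. This is not APR's actual code: the helper names make_shared_pi_mutex() and timedlock_us() are made up, and placing the mutex in an anonymous shared mapping is an assumption made here for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <time.h>

/* Hypothetical helper: create a mutex in shared memory with the
 * PROCESS_SHARED, ROBUST and PRIO_INHERIT attributes set.
 * Returns NULL on any failure. */
static pthread_mutex_t *make_shared_pi_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t *m;
    int err;

    /* Assumption: an anonymous shared mapping, inherited across fork(). */
    m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;

    if (pthread_mutexattr_init(&attr)) {
        munmap(m, sizeof(*m));
        return NULL;
    }

    err = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (!err)
        err = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (!err)
        err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (!err)
        err = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    if (err) {
        munmap(m, sizeof(*m));
        return NULL;
    }
    return m;
}

/* Hypothetical helper: try to take the lock, giving up after timeout_us
 * microseconds. Returns whatever pthread_mutex_timedlock() returns,
 * e.g. 0 on success or ETIMEDOUT on timeout. */
static int timedlock_us(pthread_mutex_t *m, long timeout_us)
{
    struct timespec abs;

    assert(timeout_us >= 0);

    /* pthread_mutex_timedlock() takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &abs);
    abs.tv_nsec += timeout_us * 1000L;
    while (abs.tv_nsec >= 1000000000L) {
        abs.tv_sec++;
        abs.tv_nsec -= 1000000000L;
    }
    return pthread_mutex_timedlock(m, &abs);
}
```

Each child would then loop on timedlock_us(m, 1), sleep 1 us while holding the lock, and unlock. Because the mutex has PTHREAD_PRIO_INHERIT set, contended lock attempts are handled by the kernel's PI futex code, which is the area the patch touches.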
Anyway:
downstream report: https://bugzilla.suse.com/show_bug.cgi?id=1218801
APR report: https://bz.apache.org/bugzilla/show_bug.cgi?id=68481
Any idea whether this patch could cause the above (or whether this is even the desired behavior)?
Thanks.