New PriorityInheritanceTest - bug in 2.6.17-rt7 confirmed
From: Esben Nielsen
Date: Thu Jul 06 2006 - 08:06:09 EST
Hi,
I finally got a glibc with PI mutexes compiled and started to work on my
old PriorityInheritanceTest tool. This tool previously used the blocker
device in the kernel to test the in-kernel PI mechanism, now it uses
pthread_mutex (the blocker calling code is still there in a #ifdef).
This is the idea:
Some SCHED_OTHER tasks goes into a critical section and busy loops
from 1ms. The critical section can be protected by one or more mutexes.
One SCHED_OTHER task takes mutex 0, another takes mutex 1 and then mutex 0,
a third mutex 2, mutex 1 and then mutex 0 etc. Once they have mutex 0
they spin for 1ms and releases the mutexes again. This is repeated with
no interruptions.
A RT task now tries to take all the mutexes. How long does it have to
wait? A long time ago we calculated that on a SMP machine the worst case
should be 2^{number of mutexes}-1 ms and UP it should just me {number of
mutexes} ms. On my UP machine it is as predicted.
Now I have added a timeout as well, i.e. the RT task calls
pthread_mutex_timedlock(). If I set the timeout to 0.5 ms, the worst case
should be 0.5 ms, right? It isn't on 2.6.17-rt7 on my UP machine. It is as
the timeout doesn't have any effect.
I have predicted this bug in an earlier mail
(http://marc.theaimsgroup.com/?l=linux-kernel&m=115192381727078&w=2).
It is basicly because in the event of a timeout the RT task doesn't get
the CPU after the boosted non-RT tasks are finished. And when the RT task
doesn't get the CPU it can't deboost the non-RT tasks...
Man page for pthread_mutex_timedlock:
"As a consequence of the priority inheritance rules (for mutexes initialized
with the PRIO_INHERIT protocol), if a timed mutex wait is terminated because
its timeout expires, the priority of the owner of the mutex shall be
adjusted as necessary to reflect the fact that this thread is no longer among
the threads waiting for the mutex."
So this is a real bug.
In the previous mail I posted a fix for that problem (and other problems).
I have attached the PriorityInheritanceTest program. To see the bug try
./test --timeout 500000 --samples 500 --tasks 2
on a UP machine.'
EsbenAttachment:
PriorityInheritanceTest.tgz
Description: GNU Unix tar archive