test-by-test

From: Yin Li
Date: Tue Dec 02 2025 - 02:13:30 EST


Hi Georgi,

on 16 May 2025 18:50:15 +0300, Georgi Djakov wrote:
>Hi Mike,
>...
>
>> To prevent this priority inversion, switch to using rt_mutex for
>> icc_bw_lock. This isn't needed for icc_lock since that's not used in the
>> critical, latency-sensitive voting paths.
>
>If the issue does not occur anymore with this patch, then this is a good
>sign, but we still need to get some numbers and put them in the commit
>message.


We constructed a priority-inversion test scenario in which multiple real-time threads with different priorities and CFS threads with different nice values compete for a mutex, to measure the overhead an RT thread incurs when acquiring the lock.

The maximum, minimum, and average overhead were determined over 100 iterations of the test.

We then replaced the mutex with an rt_mutex, repeated the same test, again collected the maximum, minimum, and average overhead over 100 iterations, and computed the change in the average.

Finally we can draw the conclusion:
1) With each competing thread holding the mutex for 5ms, using a plain mutex results in an average overhead of 4127687ns (~4.13ms) for the tested RT threads to acquire it.

2) After replacing the mutex with an rt_mutex, the average latency drops to 2010555ns (~2.01ms), which greatly mitigates the cost of priority inversion and reduces latency by about 50%.

3) Furthermore, to match the 40ms hold time reported by the user, the test case was modified so that the competing thread holds the mutex for 40ms; repeating the experiment yielded similar results.



>The RT mutexes add some overhead and complexity that could
>increase latency for both uncontended and contended paths. I am curious
>if there is any regression for the non-priority scenarios. Also if there
>are many threads, the mutex cost itself could become a bottleneck.

After testing, a single rt_mutex lock/unlock costs approximately 937ns, versus approximately 520ns for a plain mutex, so the rt_mutex does add some per-acquisition latency.

However, in scenarios where multiple clients frequently call the interconnect API, the latency caused by priority inversion on a plain mutex far outweighs the extra overhead of the rt_mutex itself.

Given the improvement rt_mutex brings under thread contention, the added per-acquisition cost is acceptable.


>This pulls in unconditionally all the RT-mutex stuff, which some people
>might not want (although today it's also selected by the I2C subsystem
>for example). I am wondering if we should make it configurable with the
>normal mutex being the default or just follow the i2c example... but
>maybe we can decide this when we have some numbers.

Making the lock type configurable is not a common practice, so we do not intend to do that in this patch.



on 7 Sep 2022 08:15:21 +0000, David Laight wrote:
>From: Georgi Djakov
> ...
>I can't see why the RT kernel doesn't have exactly the same issues.
>Given how long a process switch takes I really suspect that most
>spinlocks should remain spinlocks.

It was proposed that serializing with a spinlock might be a simpler solution, but we cannot do that: while holding the lock we call wait_for_completion_timeout() in the RPM/RPMh path, which takes a mutex and can sleep, and sleeping in atomic context is not allowed.
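The problematic pattern looks roughly like the kernel-style pseudocode below (wait_for_completion_timeout(), DEFINE_SPINLOCK() and spin_lock() are real kernel APIs, but the function, lock, and completion names are purely illustrative; this is not buildable code):

```c
/* Hypothetical sketch -- not the actual driver code. */
static DEFINE_SPINLOCK(icc_bw_spinlock);	/* proposed replacement lock */

static void icc_set_bw_vote(void)
{
	spin_lock(&icc_bw_spinlock);	/* enters atomic context */

	/*
	 * The aggregation path ends up in the RPM/RPMh driver,
	 * which waits for the firmware to ack the vote:
	 */
	wait_for_completion_timeout(&rpmh_ack, HZ);	/* may sleep! */

	/*
	 * Sleeping here is invalid: we hold a spinlock, so this
	 * triggers "BUG: sleeping function called from invalid
	 * context" (and deadlocks on PREEMPT-disabled configs).
	 */
	spin_unlock(&icc_bw_spinlock);
}
```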



--
Thx and BRs,
Yin