Re: Futex hash_bucket lock can break isolation and cause priority inversion on RT

From: André Almeida
Date: Tue Oct 08 2024 - 11:38:53 EST


Hi Juri,

On 08/10/2024 12:22, Juri Lelli wrote:

[...]

> Now, of course by making the latency sensitive application tasks use a
> higher priority than anything on housekeeping CPUs we could avoid the
> issue, but the fact that an implicit in-kernel link between otherwise
> unrelated tasks might cause priority inversion is probably not ideal?
> Thus this email.
>
> Does this report make any sense? If it does, has this issue ever been
> reported and possibly discussed? I guess it's kind of a corner case, but
> I wonder if anybody has suggestions already on how to possibly try to
> tackle it from a kernel perspective.


That's right: unrelated apps can share the same futex hash bucket, which causes those side effects. The bucket is selected by futex_hash(), tasks then take the hash bucket lock in futex_q_lock(), and neither of those functions is aware of task priorities.
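
To illustrate why sharing a bucket is unavoidable, here is a minimal userspace sketch. It is a toy model and not the kernel code: TOY_HASH_SIZE, toy_mix(), toy_bucket() and toy_find_collision() are hypothetical names, and the mixer is a generic 64-bit finalizer standing in for whatever futex_hash() really computes. The only point is that once there are more distinct keys than buckets, at least two of them must land in the same bucket and would therefore contend on the same bucket lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bucket count for the toy model; must be a power of two. */
#define TOY_HASH_SIZE 256u

/* Stand-in 64-bit mixer (splitmix64 finalizer), NOT the kernel's hash. */
static uint64_t toy_mix(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

/* Map a key (e.g. a futex address) to a bucket index by masking the hash. */
static unsigned int toy_bucket(uint64_t key)
{
	return (unsigned int)(toy_mix(key) & (TOY_HASH_SIZE - 1));
}

/*
 * Scan keys[0..n) and report the first pair that lands in the same bucket.
 * With n > TOY_HASH_SIZE distinct keys such a pair must exist (pigeonhole),
 * whatever hash function is used.
 */
static bool toy_find_collision(const uint64_t *keys, size_t n,
			       size_t *first, size_t *second)
{
	size_t owner[TOY_HASH_SIZE];
	bool used[TOY_HASH_SIZE] = { false };

	assert(keys && first && second);

	for (size_t i = 0; i < n; i++) {
		unsigned int b = toy_bucket(keys[i]);

		if (used[b]) {
			/* keys[owner[b]] and keys[i] share bucket b. */
			*first = owner[b];
			*second = i;
			return true;
		}
		used[b] = true;
		owner[b] = i;
	}
	return false;
}
```

Feeding toy_find_collision() an array of TOY_HASH_SIZE + 1 distinct addresses always yields a colliding pair. In the real table the two keys could belong to completely unrelated processes, which is the implicit link Juri describes.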

There's this work from Thomas that aims to solve corner cases like this by giving apps the option to have their own allocated wait queue instead of using the global hash table: https://lore.kernel.org/lkml/20160402095108.894519835@xxxxxxxxxxxxx/

"Collisions on that hash can lead to performance degradation
and on real-time enabled kernels to unbound priority inversions."


> Thanks!
> Juri