Re: ext3 allocate-with-reservation latencies

From: Ingo Molnar
Date: Mon Apr 04 2005 - 23:16:11 EST



* Lee Revell <rlrevell@xxxxxxxxxxx> wrote:

> I can trigger latencies up to ~1.1 ms with a CVS checkout. It looks
> like inside ext3_try_to_allocate_with_rsv, we spend a long time in this
> loop:
>
> ext3_test_allocatable (bitmap_search_next_usable_block)
> find_next_zero_bit (bitmap_search_next_usable_block)
> find_next_zero_bit (bitmap_search_next_usable_block)
>
> ext3_test_allocatable (bitmap_search_next_usable_block)
> find_next_zero_bit (bitmap_search_next_usable_block)
> find_next_zero_bit (bitmap_search_next_usable_block)

Breaking the lock is not really possible at that point, and it doesnt
look too easy to make that path preemptable either. (To make it
preemptable rsv_lock would need to become a semaphore (this could be
fine, as it's only used when a new reservation window is created).)

The hard part is the seqlock - the read side is performance-critical,
maybe it could be solved via a preemptable but still scalable seqlock
variant that uses a semaphore for the write side? It all depends on what
the scalability impact of using a semaphore for the new-window code
would be.

the best longterm solution for these types of tradeoffs seems to be to
add a locking primitive that is a spinlock on !PREEMPT kernels and a
semaphore on PREEMPT kernels. I.e. not as drastic as a full PREEMPT_RT
kernel, but good enough to make latency-critical codepaths of ext3
preemptable, without having to hurt scalability on !PREEMPT. The
PREEMPT_RT kernel has all the 'compile-time type-switching'
infrastructure for such tricks, all that needs to be changed to switch a
lock's type is to change the spinlock definition - all the
spin_lock(&lock) uses can remain unchanged. (The same method is used on
PREEMPT_RT to have 'dual-type' spinlocks.)

the same thing could then also be used for things like the mm lock, and
other longer-held locks that PREEMPT would like to see preemptable. It
would also be a good first step towards merging the PREEMPT_RT
infrastructure ;-) I'll cook up something.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/