Re: [PATCH] Priorities in Anticipatory I/O scheduler

From: Naveen Gupta
Date: Tue Oct 28 2008 - 18:49:03 EST


2008/10/28 Dave Chinner <david@xxxxxxxxxxxxx>:
> On Tue, Oct 28, 2008 at 10:14:20AM -0700, Naveen Gupta wrote:
>> 2008/10/27 Dave Chinner <david@xxxxxxxxxxxxx>:
>> > On Mon, Oct 27, 2008 at 12:01:32PM -0700, ngupta@xxxxxxxxxx wrote:
>> >>
>> >> Modifications to the Anticipatory I/O scheduler to add multiple priority
>> >> levels. It makes use of anticipation and batching in the current
>> >> anticipatory scheduler to implement priorities.
> .....
>> >> In this patch I have added a new class IOPRIO_CLASS_LATENCY to differentiate
>> >> the notion of absolute priority from the existing time-slice based
>> >> priority classes in cfq. Internally within the anticipatory scheduler, though,
>> >> all of them map to best-effort levels. Hence, one can also use the various
>> >> best-effort priority levels.
>> >
>> > Please don't introduce yet another incompatible behaviour between
>> > I/O schedulers. It's bad enough from an optimisation point of view
>> > that BIO_RW_SYNC and BIO_RW_META mean different things to different
>> > schedulers, let alone that only CFQ currently understands
>> > priorities. If you are going to introduce priorities into AS, then
>> > please, please, please make it use the same interface as CFQ.
>> >
>> > Why? Both the extN and XFS devs have been considering bumping the
>> > priority of journal writes using the existing CFQ-based I/O priority
>> > mechanism - the last thing I want to see is a different scheduler
>> > requiring a different priority configuration to achieve the same
>> > optimisation. There is no way we can support this sort of
>> > optimisation in the filesystem code if the interface changes when
>> > the I/O scheduler changes. So please use the existing IOPRIO classes
>> > to map the priorities for the AS scheduler.
>> >
>>
>> The anticipatory scheduler chooses its next I/O from the highest
>> available priority level.
>
> That sounds exactly like what the current RT class is supposed to
> be used for - defining the absolute priority of dispatch. How
> is this latency class different to the current RT class semantics
> that are defined for CFQ?
>

I/O in CFQ's RT class can still see a bubble, unlike I/O in this new
latency class. An easy way to check this is to submit I/Os at multiple
levels under both CFQ and AS and compare the maximum latency of the
highest levels. I will let Jens or Satoshi comment on the exact algorithm
for the RT class.
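A minimal Python model of the strict ("absolute") priority dispatch described in the quoted text above, where AS picks its next I/O from the highest available priority level. It is only a sketch under stated assumptions: the class name StrictPriorityQueue and the 8-level default are hypothetical, level 0 is taken as the highest, and it ignores the anticipation, batching and deadline handling that the actual patch relies on.

```python
from collections import deque
from typing import Deque, List, Optional


class StrictPriorityQueue:
    """Toy model (hypothetical, not the patch's code) of absolute-priority dispatch.

    Level 0 is the highest priority. A request at a lower level is
    dispatched only when every higher level is empty; within a level,
    requests leave in FIFO order.
    """

    def __init__(self, nr_levels: int = 8) -> None:
        self.levels: List[Deque[str]] = [deque() for _ in range(nr_levels)]

    def add(self, level: int, request: str) -> None:
        self.levels[level].append(request)

    def dispatch(self) -> Optional[str]:
        # Scan from the highest level down and take the oldest request found.
        for queue in self.levels:
            if queue:
                return queue.popleft()
        return None  # nothing queued at any level


q = StrictPriorityQueue()
q.add(3, "background read")
q.add(0, "latency read A")
q.add(0, "latency read B")
print([q.dispatch() for _ in range(4)])
# ['latency read A', 'latency read B', 'background read', None]
```

Under this policy a lower level can wait indefinitely while higher levels stay busy, so a real implementation needs some bound on that, which this toy omits.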


>> So, in a sense, it implements absolute priority and is best suited
>> to latency-sensitive jobs. Since priorities can be, and are, mapped
>> internally in the anticipatory scheduler, the BEST_EFFORT class is
>> mapped one-to-one onto the LATENCY class.
>
> So you map the BE class to something with the same semantics as the
> RT class? What mapping do you do when an application uses the RT
> class?
>

Yes, I could have used the RT class, but CFQ uses it to implement its
time-slice based highest-priority class. If an application uses the RT
class, AS maps all levels of the RT class to BE class level 0 (i.e., to
the highest priority available).

>> A filesystem can use the best-effort class through an interface
>> similar to cfq's.
>
> The folk using the RT priority classes greatly objected to using
> the RT class for journal I/O precisely because it would then
> preempt their application's RT I/O and introduce unpredictable
> latencies.
>
> Journal I/O will typically use the highest priority BE class so that
> it is promoted above BE I/O but does not preempt RT I/O. With your
> mapping of BE classes to this new "absolute priority latency" class,
> this configuration will give journal I/O the highest priority in the
> scheduler. This will cause preemption of your latency sensitive I/O
> and so those latencies you are trying to avoid won't go away....
>

I see your problem. We could make the LATENCY class separate from, and
above, the BE class (instead of a one-to-one mapping). Then you could
use BE class level 0 all you want. Or you could select BE level 1 and we
can keep the classes as they are.
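A hypothetical Python model of the first alternative, with LATENCY placed above BE instead of mapped one-to-one onto it. The encoding constants mirror the kernel's ioprio layout; IOPRIO_CLASS_LATENCY = 4 is a made-up value, as_level_separate() is an illustrative helper rather than proposed code, RT keeps folding onto BE level 0 as described earlier in the thread, and the IDLE/unset handling is guessed. AS levels 0-7 hold LATENCY, 8-15 hold BE, and 16 holds IDLE.

```python
IOPRIO_CLASS_SHIFT = 13
IOPRIO_PRIO_MASK = (1 << IOPRIO_CLASS_SHIFT) - 1

IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE = 0, 1, 2, 3
IOPRIO_CLASS_LATENCY = 4  # hypothetical value; not taken from the patch

NR_LEVELS = 8  # levels per class, as in the kernel's BE class


def ioprio_value(cls: int, level: int) -> int:
    """Pack a class and level the same way the kernel's IOPRIO_PRIO_VALUE does."""
    return (cls << IOPRIO_CLASS_SHIFT) | level


def as_level_separate(ioprio: int) -> int:
    """Hypothetical: LATENCY -> AS 0-7, BE -> AS 8-15, IDLE -> AS 16."""
    cls = ioprio >> IOPRIO_CLASS_SHIFT
    level = min(ioprio & IOPRIO_PRIO_MASK, NR_LEVELS - 1)
    if cls == IOPRIO_CLASS_LATENCY:
        return level              # latency levels sit above every BE level
    if cls == IOPRIO_CLASS_RT:
        return NR_LEVELS          # RT still folds onto BE level 0
    if cls == IOPRIO_CLASS_BE:
        return NR_LEVELS + level
    if cls == IOPRIO_CLASS_IDLE:
        return 2 * NR_LEVELS      # assumption: idle below everything
    return NR_LEVELS + 4          # assumption: no class set -> BE level 4


journal = as_level_separate(ioprio_value(IOPRIO_CLASS_BE, 0))
print(journal)  # 8
```

Journal I/O tagged BE level 0 then lands on AS level 8, below every LATENCY level, so it no longer competes with latency-sensitive I/O. RT I/O, however, still shares that level with it in this sketch.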


> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>