Re: performance drop after using blkcg

From: Tejun Heo
Date: Tue Dec 11 2012 - 10:14:05 EST


Hello, Vivek.

On Tue, Dec 11, 2012 at 10:02:34AM -0500, Vivek Goyal wrote:
> cfq_group_served() {
>         if (iops_mode(cfqd))
>                 charge = cfqq->slice_dispatch;
>         cfqg->vdisktime += cfq_scale_slice(charge, cfqg);
> }
>
> Isn't that effectively IOPS scheduling? One should get an IOPS rate in
> proportion to one's weight (as long as one can throw enough traffic at the
> device to keep it busy). If not, can you please give more details about
> your proposal?
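
Yes, as far as charging goes it is: in iops_mode() each queue is charged
by the number of requests it dispatched, and cfq_scale_slice() converts
that charge into vdisktime inversely proportional to the group's weight.
A standalone sketch of the arithmetic (the default weight constant and
the exact scaling here are assumptions, not the kernel's actual helper):

/*
 * Standalone sketch of weighted charging (assumed constant and
 * arithmetic, not the kernel's actual helper).  A group with twice
 * the weight accrues vdisktime half as fast for the same charge, so
 * over time it receives twice the service.
 */
static inline unsigned long long toy_scale_slice(unsigned long charge,
                                                 unsigned int weight)
{
        const unsigned int toy_default_weight = 500;    /* assumed */

        return (unsigned long long)charge * toy_default_weight / weight;
}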

The problem is that we lose a lot of isolation w/o idling between
queues or groups. This is because we switch between slices, and while
a slice is in progress only IOs belonging to that slice can be issued.
i.e. higher priority cfqgs / cfqqs, after dispatching the IOs they have
ready, lose their slice immediately. A lower priority slice takes over,
and when the higher priority ones have IOs ready again, they have to
wait for the lower priority one to finish before submitting the new
IOs. In many cases, they end up not being able to generate IOs any
faster than the ones in lower priority cfqqs/cfqgs.
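
To make that concrete, here is a toy model of the switching behavior
(a standalone sketch, not actual cfq code):

struct toy_queue {
        int prio;       /* smaller value = higher priority */
        int nr_ready;   /* requests currently ready to dispatch */
};

/*
 * Toy model of slice switching without idling (not actual cfq code):
 * whoever owns the slice keeps the device while it has requests
 * ready.  The moment the owner runs dry, the slice moves on, and any
 * requests the old owner generates afterwards must wait out the new
 * slice no matter how high its priority is.
 */
static struct toy_queue *toy_select(struct toy_queue *owner,
                                    struct toy_queue *next)
{
        if (owner && owner->nr_ready)
                return owner;           /* slice in progress wins */

        return (next && next->nr_ready) ? next : NULL;
}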

This is because we switch slices rather than iops. We could make cfq
essentially switch iops by implementing very aggressive preemption, but
I really don't see much point in that. cfq is way too heavy and
ill-suited for high speed non-rot devices, which are becoming more and
more consistent in terms of the iops they can handle.
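
By "aggressive preemption" I mean a rule roughly like the following
(a hypothetical sketch, not existing cfq code), which taken to its
limit degenerates into per-request, i.e. iops, switching:

struct toy_entity {
        unsigned long long vdisktime;   /* weighted service received */
        int nr_ready;                   /* requests ready to dispatch */
};

/*
 * Hypothetical aggressive-preemption check (not existing cfq code):
 * the active entity is kicked off whenever any entity that has
 * received less weighted service has a request ready.  Applied on
 * every request arrival, this switches at iops granularity instead
 * of at slice boundaries.
 */
static int toy_should_preempt(struct toy_entity *active,
                              struct toy_entity *incoming)
{
        return incoming->nr_ready &&
               incoming->vdisktime < active->vdisktime;
}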

I think we need something better suited for the maturing non-rot
devices. They're becoming very different from what cfq was built for,
and we really shouldn't be maintaining several rb trees that need
full synchronization for each IO. We're doing way too much and it
just isn't scalable.

Thanks.

--
tejun