Re: [RFC 0/3] block: proportional based blk-throttling

From: Shaohua Li
Date: Fri Jan 22 2016 - 13:04:01 EST

Next message: Joe Perches: "Re: wireless-drivers: random cleanup patches piling up"
Previous message: Dave Hansen: "[PATCH] mm, gup: introduce concept of "foreign" get_user_pages()"
In reply to: Vivek Goyal: "Re: [RFC 0/3] block: proportional based blk-throttling"
Next in thread: Vivek Goyal: "Re: [RFC 0/3] block: proportional based blk-throttling"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Jan 22, 2016 at 10:52:36AM -0500, Vivek Goyal wrote:
> On Fri, Jan 22, 2016 at 09:48:22AM -0500, Tejun Heo wrote:
> > Hello, Shaohua.
> >
> > On Thu, Jan 21, 2016 at 04:00:16PM -0800, Shaohua Li wrote:
> > > > The thing is that most of the possible contentions can be removed by
> > > > implementing per-cpu cache which shouldn't be too difficult. 10%
> > > > extra cost on current gen hardware is already pretty high.
> > >
> > > I did think about this. per-cpu cache does sound straightforward, but it
> > > could severely impact fairness. For example, we give each cpu a budget,
> > > see 1MB. If a cgroup doesn't use the 1M budget, we don't hold the lock.
> > > But if we have 128 CPUs, the cgroup can use 128 * 1M more budget, which
> > > breaks fairness very much. I have no idea how this can be fixed.
> >
> > Let's say per-cgroup buffer budget B is calculated as, say, 100ms
> > worth of IO cost (or bandwidth or iops) available to the cgroup. In
> > practice, this may have to be adjusted down depending on the number of
> > cgroups performing active IOs. For a given cgroup, B can be
> > distributed among the CPUs that are actively issuing IOs in that
> > cgroup. It will degenerate to round robin of small budget if there
> > are too many active for the budget available but for most cases this
> > will cut down most of cross-CPU traffic.
> >
> > > > They're way more predictable than rotational devices when measured
> > > > over a period. I don't think we'll be able to measure anything
> > > > meaningful at individual command level but aggregate numbers should be
> > > > fairly stable. A simple approximation of IO cost such as fixed cost
> > > > per IO + cost proportional to IO size would do a far better job than
> > > > just depending on bandwidth or iops and that requires approximating
> > > > two variables over time. I'm not sure how easy / feasible that
> > > > actually would be tho.
> > >
> > > It still sounds like IO time, otherwise I can't imagine we can measure
> > > the cost. If we use some sort of aggregate number, it likes a variation
> > > of bandwidth. eg cost = bandwidth/ios.
> >
> > I think cost of an IO can be approxmiated by a fixed per-IO cost +
> > cost proportional to the size, so
> >
> > cost = F + R * size
> >
>
> Hi Tejun,
>
> May be we can throw in a cost differentiation for IO direction also here.
> This still will not take care of cost based on IO pattern, but that's
> another level of complexity which can be added to keep track of IO pattern
> of cgroup and bump up cost accordingly.
>
> Here are some random thoughts basically adding some more details to your idea.
> I am not sure whether it makes sense or not or how difficult it is to
> implement it.
>
> Assume we ensure fairness in a time interval of T and have total of N
> tokens for IO in that time interval T. When a new inteval starts, we
> distribute these N tokens to the pending cgroups based on their weight and
> proportional share. And keep on distributing N tokens after each time
> interval.
>
> We will have to come up with some sort of cost matrix to determine how many
> tokens should be charged per IO (cost per IO). And how to adjust that cost
> dynamically.
>
> Both N and T will be variable and will have to be adjusted continuously.
> For N we could start with some initial number. If we distributed too many
> tokens then device can handle in time T, then in next cycle we will have
> to reduce the value of N and distribute less tokens. If we distributed
> too less tokens and device is fast and finished in less time than T,
> then we can start next cycle sooner and distribute more tokens for next
> cycle. So based on device throughput in a certain time interval, number
> of tokens issued for next cycle will vary.

Note, we don't know if we dispatch too many/too less tokens. A device
with large queue depth can accept all requests. If queue depth is 1,
things would be easy.

> Initially I guess cost could be fixed also. That is say, 5 tokens for each
> IO plus 1 token for each 4KB of IO size. If we underestimate the cost of
> IO, then N tokens will not be consumed in time T and next time we will
> distribute less tokens. If we overestimate the cost of IO, then N tokens
> will finish fast and next time we will give more. So exact cost of IO
> might not be a huge factor.

we still need know the R. any idea for this?

Thanks,
SHaohua

Next message: Joe Perches: "Re: wireless-drivers: random cleanup patches piling up"
Previous message: Dave Hansen: "[PATCH] mm, gup: introduce concept of "foreign" get_user_pages()"
In reply to: Vivek Goyal: "Re: [RFC 0/3] block: proportional based blk-throttling"
Next in thread: Vivek Goyal: "Re: [RFC 0/3] block: proportional based blk-throttling"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]