Re: [PATCHSET] block, mempool, percpu: implement percpu mempool and fix blkcg percpu alloc deadlock

From: Vivek Goyal
Date: Mon Jan 16 2012 - 10:26:15 EST


On Tue, Dec 27, 2011 at 02:30:12PM -0800, Tejun Heo wrote:
> Hello, Andrew.
>
> On Tue, Dec 27, 2011 at 02:21:56PM -0800, Andrew Morton wrote:
> > <autorepeat>For those users who don't want the stats, stats shouldn't
> > consume any resources at all.
>
> Hmmm.... For common use cases - a few cgroups doing IOs to most likely
> single physical device and maybe a couple virtual ones, I don't think
> this would show up anywhere both in terms of memory and process
> overhead. While avoiding it would be nice, I don't think that should
> be the focus of optimization or design decisions.
>
> > And I bet that the majority of the minority who want stats simply want
> > to know "how much IO is this cgroup doing", and don't need per-cgroup,
> > per-device accounting.
> >
> > And it could be that the minority of the minority who want per-device,
> > per-cgroup stats only want those for a minority of the time.
> >
> > IOW, what happens if we give 'em atomic_add() and be done with it?
>
> I really don't know. That surely is an enticing idea tho. Jens,
> Vivek, can you guys chime in? Is gutting out (or drastically
> simplifying) cgroup-dev stats an option? Are there users who are
> actually interested in this stuff?

Ok, I am back after a break of 3 weeks. So time to restart the discussion.

So we seem to be talking of two things.

- Use atomic_add() for stats.
- Do not keep stats per cgroup per device; instead just keep a global
  per-cgroup stat.

For the first point, is an atomic operation really that much cheaper than
taking a spin lock? The whole point of introducing the per-cpu data
structure was to make the fast path lockless. My understanding is that an
atomic operation on the IO submission path is expensive, so to me it does
not really solve the overhead problem.
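
To make the comparison concrete, here is a minimal userspace sketch of
the two update paths. It is not the blkcg code: NR_SLOTS, the struct
names and the helper names are all made up for illustration, and C11
atomics stand in for the kernel's atomic_add(). The shared counter does
an atomic read-modify-write on every update, while the per-cpu variant
does a plain add to a slot owned by one CPU and only sums the slots when
somebody reads the stat.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NR_SLOTS 4	/* hypothetical stand-in for the number of CPUs */

/* Option A: one shared counter; every update is an atomic RMW. */
struct shared_stat {
	atomic_uint_fast64_t bytes;
};

static void shared_stat_add(struct shared_stat *s, uint64_t n)
{
	atomic_fetch_add_explicit(&s->bytes, n, memory_order_relaxed);
}

static uint64_t shared_stat_read(struct shared_stat *s)
{
	return atomic_load_explicit(&s->bytes, memory_order_relaxed);
}

/*
 * Option B: one slot per CPU, aligned to 64 bytes (an assumed cache
 * line size) so that two CPUs never write to the same line.
 */
struct percpu_slot {
	_Alignas(64) uint64_t bytes;
};

struct percpu_stat {
	struct percpu_slot slot[NR_SLOTS];
};

/* Fast path: plain add, only ever called by the owner of slot 'cpu'. */
static void percpu_stat_add(struct percpu_stat *s, unsigned int cpu,
			    uint64_t n)
{
	s->slot[cpu % NR_SLOTS].bytes += n;
}

/*
 * Slow path: sum all slots at read time. This sketch does not
 * synchronize the reader against concurrent writers; real code has to
 * deal with that separately.
 */
static uint64_t percpu_stat_read(const struct percpu_stat *s)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < NR_SLOTS; i++)
		sum += s->slot[i].bytes;
	return sum;
}
```

Both variants report the same total; the difference is only in what the
update on the submission path costs and in where the summing happens.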

Initially google folks (Divyesh Shah) introduced additional files to
display additional stats which are per cgroup per device. I am assuming
they are making use of them. To me, knowing how IO from a cgroup is
distributed across different devices is a good thing.

Keeping the stats per device also means that aggregation of stats happens
from process context, and it reduces contention on stat updates coming
from various devices. So to me it is a good thing to keep stats per device
and then display them in whichever form users find useful (either per
cgroup or per cgroup per device).
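
As a rough illustration of that layout, the sketch below keeps one
counter per (cgroup, device) pair and only computes the per-cgroup total
when the stat is read. Everything in it (MAX_DEVS, struct dev_stat,
struct cgroup_stats, the cgroup_stat_*() helpers, the device ids) is
hypothetical and is not how blkcg actually stores its stats; it only
shows that both views can be served from per-device data.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEVS 4	/* hypothetical upper bound on devices per cgroup */

struct dev_stat {
	unsigned int dev_id;	/* hypothetical device identifier */
	uint64_t bytes;
};

struct cgroup_stats {
	struct dev_stat dev[MAX_DEVS];
	size_t nr_devs;
};

/*
 * Update path: touch only the slot of the device the IO went to.
 * Returns 0 on success, -1 if the device is not known to this cgroup.
 */
static int cgroup_stat_add(struct cgroup_stats *cg, unsigned int dev_id,
			   uint64_t n)
{
	for (size_t i = 0; i < cg->nr_devs; i++) {
		if (cg->dev[i].dev_id == dev_id) {
			cg->dev[i].bytes += n;
			return 0;
		}
	}
	return -1;
}

/* Read path, per cgroup per device view (0 for an unknown device). */
static uint64_t cgroup_stat_dev(const struct cgroup_stats *cg,
				unsigned int dev_id)
{
	for (size_t i = 0; i < cg->nr_devs; i++)
		if (cg->dev[i].dev_id == dev_id)
			return cg->dev[i].bytes;
	return 0;
}

/* Read path, per cgroup view: aggregate over devices at read time. */
static uint64_t cgroup_stat_total(const struct cgroup_stats *cg)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < cg->nr_devs; i++)
		sum += cg->dev[i].bytes;
	return sum;
}
```

With this shape, updates for different devices never touch the same
counter, and the summing cost is paid by whoever reads the file.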

So to me neither of the above options really solves the issue of
reducing the cost/overhead of atomic operations in the IO submission
path. Please correct me if I missed something here.

Thanks
Vivek