Re: [PATCHSET] block, mempool, percpu: implement percpu mempool andfix blkcg percpu alloc deadlock

From: Andrew Morton
Date: Thu Dec 22 2011 - 20:35:39 EST


On Thu, 22 Dec 2011 20:21:12 -0500 Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:

> On Thu, Dec 22, 2011 at 03:41:38PM -0800, Andrew Morton wrote:
> >
> > If the code *does* correctly handle ->stats_cpu == NULL then we have
> > options.
> >
> > a) Give userspace a new procfs/debugfs file to start stats gathering
> > on a particular cgroup/request_queue pair. Allocate the stats
> > memory in that.
> >
> > b) Or allocate stats_cpu on the first call to blkio_read_stat_cpu()
> > and return zeroes for this first call.
>
> But the purpose of stats is that they are gathered even if somebody
> has not read them even once?

That's not a useful way of using stats. The normal usage would be to
record the stats then start the workload then monitor how the stats
have changed as work proceeds.

> So if I create a cgroup and put some
> task into it which does some IO, I think stat collection should start
> immediately without user taking any action.

If you really want to know the stats since cgroup creation then trigger
the stats initialisation from userspace when creating the blkio_cgroup.

> Forcing the user to first
> read a stat before the collection starts is kind of odd to me.

Well one could add a separate stats_enable knob. Doing it
automatically from read() would be for approximate-back-compatibility
with existing behaviour.

Plus (again) this way we also avoid burdening non-stats-users with the
overhead of stats.

> >
> > c) Or change the low-level code to do
> > blkio_group.want_stats_cpu=true, then test that at the top level
> > after we've determined that blkio_group.stats_cpu is NULL.
> >
> > d) Or, worse, punt the allocation into a workqueue thread.
>
> I implemented a patch to punt the allocation using a worker thread. Tejun
> did not like it. I personally think that it is less intrusive to fix this
> specific problem.
>
> https://lkml.org/lkml/2011/12/19/291

hm.

Look, the basic problem here is a highish-level design one. We're
attempting to do a high-level heavyweight intialization operation in a
low-level place. How do we fix that *properly*? It has to be by
performing the heavyweight operation in an appropriate place.

> >
> > Note that all these option will permit us to use GFP_KERNEL, which is
> > better.
> >
> > Note that a) and b) means that users get control over whether these
> > stats are accumulated at all, so many won't incur needless memory and
> > CPU consumption.
> >
> > I think I like b). Fix the code so it doesn't oops when ->stats_cpu is
> > NULL, then turn on stats gathering the first time someone tries to read
> > the stats.
> >
> > (Someone appears to have misspelled "throttle" as "throtl" for no
> > apparent reason about 1000 times. Sigh.)
>
> That someone would be me. I thought that throtl communicates the meaning
> and keeps the length of all the strings relatively short. But if it does
> not look good, I can change it.

It's unconventional. We do usually avoid the odd abbreviations and
just spell the whole thing out.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/