Re: [PATCH percpu/for-3.17 1/2] percpu: implement percpu_pool
From: Andrew Morton
Date: Thu Jul 31 2014 - 21:14:38 EST
On Thu, 31 Jul 2014 20:44:38 -0400 Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello, Andrew.
> On Thu, Jul 31, 2014 at 6:03 PM, Andrew Morton
> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > I don't think we should add facilities such as this. Because if we do,
> > people will use them and thereby make the kernel less reliable, for
> > obvious reasons.
> > It would be better to leave the nasty hack localized within
> > blk-throttle.c and hope that someone finds a way of fixing it.
> The thing is we need similar facilities in the IO path in other places
> too. They share exactly the same characteristics - opportunistic
> percpu allocations during IO which are expected to fail from time to
> time and they will all implement fallback behavior on allocation
> failures. I'm not sure how this makes the kernel less reliable. This
> conceptually isn't different from atomic allocations which we also use
> in a similar way.
Atomic allocations are more robust than this thing. But yes, they also
are unreliable and their use should be discouraged for the same
> If you're worried that people might use this
> assuming that it won't fail,
That's precisely my concern.
Yet nowhere in either the changelog or the code comments is it even
mentioned that this allocator is unreliable and that callers *must*
implement (and test!) fallback paths.
> an obvious solution is adding a failure
> injection for debugging, but really except for being a bit ghetto,
> this is just the atomic allocation for percpu areas.
If it was a try-GFP_ATOMIC-then-fall-back-to-pool thing then it would
work fairly well. But it's not even that - a caller could trivially
chew through that pool in a single timeslice. Especially on !SMP.
Especially squared with !PREEMPT or SCHED_FIFO.
And that's all OK, as long as this is positioned as "opportunistic
performance optimisation which is expected to be available most of the
time in non-stressful use cases".
But please make very sure that this is how we position it. I don't
know how to do this. Maybe prefix the names with "blk_" to signify
that it is block-private (and won't even be there if !CONFIG_BLOCK).
Or rename percpu_pool_alloc() to percpu_pool_try_alloc() - that should
wake people up.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/