Re: aio: questions with ioctx_alloc() and large num_possible_cpus()
From: Mauricio Faria de Oliveira
Date: Wed Oct 05 2016 - 13:22:01 EST
Hi Kent,
Thanks for commenting. Trying to make sense of your point helped me
understand more of the code, but a few things are still unclear to me;
could you help a bit more, please?
Can you describe how a single thread might not be able to use all the
slots because 'up to about half of the reqs_available slots might
be on other percpu reqs_available' ?
I see that the thread might be scheduled on different CPUs (say, only
2 possible CPUs) and perform get_reqs_available() on both -- but that
only gives one req_batch to each CPU, and for req_batch to be half of
reqs_available its denominator needs to be 2, which doesn't happen w/
num_possible_cpus() * 4 -- which is 8. So I'm a bit confused here.
	atomic_set(&ctx->reqs_available, ctx->nr_events - 1);
	ctx->req_batch = (ctx->nr_events - 1) / (num_possible_cpus() * 4);
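To make the numbers concrete, here is a minimal userspace sketch of just
those two lines. It is not kernel code: model_reqs_available() and
model_req_batch() are hypothetical helper names, and the ncpus parameter
stands in for num_possible_cpus().

```c
#include <assert.h>

/*
 * Userspace sketch, not kernel code: mirrors the two quoted lines.
 * model_reqs_available() and model_req_batch() are hypothetical names;
 * ncpus stands in for num_possible_cpus().
 */
static unsigned model_reqs_available(unsigned nr_events)
{
	assert(nr_events > 0);
	return nr_events - 1;
}

static unsigned model_req_batch(unsigned nr_events, unsigned ncpus)
{
	assert(ncpus > 0);
	/* Integer division, as in the quoted assignment to req_batch. */
	return model_reqs_available(nr_events) / (ncpus * 4);
}
```

Assuming nr_events = 129 and 2 possible CPUs, this gives reqs_available =
128 and req_batch = 128 / 8 = 16, i.e. one batch is 1/8th of
reqs_available, and one batch on each of the 2 CPUs adds up to 1/4th.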
On 10/05/2016 03:34 AM, Kent Overstreet wrote:
>> - why "num_possible_cpus() * 4", and why "max(nr_events, <it>)" ?
> For the scheme to work - percpu allocation of slots - we have to ensure that
> there aren't too many unused slots stranded on other CPUs. The stranding is
> limited to 1/4th of the slots [snip]
By 'unused slots', do you mean the slots included in the batch allocated
to a particular CPU but not actually used by a thread on that CPU?
(e.g., get_reqs_available() called once, unused_slots == req_batch - 1)
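The case in that parenthesis can be sketched in userspace as below. This is
only my simplified reading of get_reqs_available(), under loud assumptions:
a single thread, plain integers instead of atomics and percpu data, and no
interrupt handling. struct model_ctx, model_init(), model_get_req(),
model_stranded() and MODEL_MAX_CPUS are all hypothetical names, not
anything from fs/aio.c.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_CPUS 64	/* arbitrary cap for this sketch */

/* Hypothetical single-threaded stand-in for the kioctx counters. */
struct model_ctx {
	unsigned reqs_available;		/* global pool */
	unsigned req_batch;			/* slots moved per refill */
	unsigned ncpus;				/* stands in for num_possible_cpus() */
	unsigned percpu[MODEL_MAX_CPUS];	/* per-CPU cached slots */
};

static void model_init(struct model_ctx *ctx, unsigned nr_events,
		       unsigned ncpus)
{
	unsigned i;

	assert(nr_events > 0 && ncpus > 0 && ncpus <= MODEL_MAX_CPUS);
	ctx->reqs_available = nr_events - 1;
	ctx->req_batch = (nr_events - 1) / (ncpus * 4);
	assert(ctx->req_batch > 0);	/* the sketch assumes a non-zero batch */
	ctx->ncpus = ncpus;
	for (i = 0; i < ncpus; i++)
		ctx->percpu[i] = 0;
}

/* Take one slot on 'cpu'; refill from the global pool by one batch. */
static bool model_get_req(struct model_ctx *ctx, unsigned cpu)
{
	assert(cpu < ctx->ncpus);
	if (ctx->percpu[cpu] == 0) {
		if (ctx->reqs_available < ctx->req_batch)
			return false;
		ctx->reqs_available -= ctx->req_batch;
		ctx->percpu[cpu] += ctx->req_batch;
	}
	ctx->percpu[cpu]--;
	return true;
}

/* Slots taken from the global pool but still sitting in per-CPU caches. */
static unsigned model_stranded(const struct model_ctx *ctx)
{
	unsigned i, sum = 0;

	for (i = 0; i < ctx->ncpus; i++)
		sum += ctx->percpu[i];
	return sum;
}
```

Assuming nr_events = 129 and 2 CPUs (req_batch = 16), one
model_get_req() call on CPU 0 leaves 15 slots cached there and 112 in the
global pool; a second call on CPU 1 leaves 30 cached in total and 96 in the
pool.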
Can you please explain in a bit more detail how "num_possible_cpus() * 4"
ensures the limit of 1/4th of the slots, and what scenario the math is
based on? I've been thinking it over and trying out values for a while
now, and haven't figured out where / how it occurs.
Thanks for your support,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center