Re: aio: questions with ioctx_alloc() and large num_possible_cpus()

From: Mauricio Faria de Oliveira
Date: Wed Oct 05 2016 - 13:22:01 EST


Hi Kent,

Thanks for commenting. I understood more of the code while trying to
make sense of your point, but some things are still unclear to me;
could you help a bit more, please?

Can you describe how a single thread might not be able to use all the
slots because 'up to about half of the reqs_available slots might
be on other percpu reqs_available' ?

I see that the thread might be scheduled on different CPUs (say, only
2 possible CPUs) and call get_reqs_available() on both, but that only
gives one req_batch to each CPU. For req_batch to be half of
reqs_available, its denominator would need to be 2, which doesn't
happen with num_possible_cpus() * 4 (which is 8 in this case). So I'm
a bit confused here.

	atomic_set(&ctx->reqs_available, ctx->nr_events - 1);
	ctx->req_batch = (ctx->nr_events - 1) / (num_possible_cpus() * 4);
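
To make my arithmetic concrete, here is a minimal userspace sketch of
those two assignments. It is not kernel code: struct ctx_model,
model_ioctx() and one_batch_per_cpu() are names I made up for
illustration, ncpus stands in for num_possible_cpus(), and the
nr_events value used below is just an assumed example.

```c
#include <assert.h>

/*
 * Userspace model of the two assignments quoted above (hypothetical
 * names, not the kernel's structures).
 */
struct ctx_model {
	unsigned int reqs_available;	/* models ctx->reqs_available */
	unsigned int req_batch;		/* models ctx->req_batch */
};

static struct ctx_model model_ioctx(unsigned int nr_events, unsigned int ncpus)
{
	struct ctx_model m;

	assert(nr_events > 0 && ncpus > 0);

	m.reqs_available = nr_events - 1;
	m.req_batch = (nr_events - 1) / (ncpus * 4);	/* integer division */
	return m;
}

/* Slots removed from the global pool if every CPU takes exactly one batch. */
static unsigned int one_batch_per_cpu(const struct ctx_model *m,
				      unsigned int ncpus)
{
	return m->req_batch * ncpus;
}
```

With an assumed nr_events of 129 and 2 CPUs, this gives reqs_available
of 128 and req_batch of 16, so one batch on each CPU accounts for 32
slots, i.e. a quarter of the total rather than half.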

On 10/05/2016 03:34 AM, Kent Overstreet wrote:
>> - why "num_possible_cpus() * 4", and why "max(nr_events, <it>)" ?
>
> For the scheme to work - percpu allocation of slots - we have to ensure that
> there aren't too many unused slots stranded on other CPUs. The stranding is
> limited to 1/4th of the slots [snip]

By 'unused slots' you mean the slots included in the batch allocated
to a particular cpu but not actually used by a thread in that cpu?
(e.g., get_reqs_available() called once, unused_slots == req_batch - 1)
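
As a sketch of that reading only (it may well not be the scenario you
have in mind), the total of such unused slots can be modelled as
below. The function name unused_slots_total() is hypothetical, ncpus
stands in for num_possible_cpus(), and the model assumes every CPU
has taken exactly one batch and is using a single slot from it.

```c
#include <assert.h>

/*
 * Hypothetical model of one reading of 'unused slots': each CPU holds
 * one batch taken from the global pool, with only one slot in use, so
 * req_batch - 1 slots per CPU sit unused.
 */
static unsigned int unused_slots_total(unsigned int nr_events,
				       unsigned int ncpus)
{
	unsigned int req_batch;

	assert(nr_events > 0 && ncpus > 0);

	req_batch = (nr_events - 1) / (ncpus * 4);	/* as in ioctx_alloc() */
	if (req_batch == 0)
		return 0;	/* nothing can be batched per CPU */

	return ncpus * (req_batch - 1);
}
```

For an assumed nr_events of 129 and 2 CPUs, this yields 30 unused
slots out of 128, which stays under 1/4th, so under this reading I
can see a quarter but not a half.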

Can you please explain in a bit more detail how "num_possible_cpus() * 4"
ensures the limit of 1/4th of the slots, and what scenario the math is
based on? I've been thinking about it and trying out values for a
while now, and couldn't figure out where / how it occurs.

Thanks for your support,

--
Mauricio Faria de Oliveira
IBM Linux Technology Center