Re: [PATCH] Percpu tag allocator

From: Andrew Morton
Date: Thu Jun 13 2013 - 15:23:46 EST


On Thu, 13 Jun 2013 12:15:07 -0700 Tejun Heo <tj@xxxxxxxxxx> wrote:

> Hello, Andrew.
>
> On Thu, Jun 13, 2013 at 12:04:39PM -0700, Andrew Morton wrote:
> > > The thing is that id[r|a] guarantee that the lowest available slot is
> > > allocated
> >
> > That isn't the case for ida_get_new_above() - the caller gets to
> > control the starting index.
>
> Hmmm? get_new_above() is the same, it must allocate the first
> available ID above the given low bound - used to exclude unused or
> reserved IDs.

Right.  So giving different CPUs different starting IDs can improve
scalability.
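
As a rough illustration of the idea, here is a userspace sketch, not
kernel code.  get_new_above() below is a hypothetical stand-in that only
mimics the "lowest free ID at or above a starting index" behaviour
discussed in this thread; it is not the real ida_get_new_above()
signature.  The NR_IDS/NR_CPUS constants, cpu_start() and the
fall-back-to-zero policy are likewise assumptions made for the example,
and locking is omitted entirely.

```c
#include <stdint.h>

#define NR_IDS  256   /* assumed size of the ID space */
#define NR_CPUS 4     /* assumed number of CPUs */

static uint8_t id_used[NR_IDS];   /* 1 = ID currently allocated */

/*
 * Hypothetical stand-in: return the lowest free ID >= start and mark it
 * used, or -1 if there is none.
 */
static int get_new_above(int start)
{
	for (int id = start; id < NR_IDS; id++) {
		if (!id_used[id]) {
			id_used[id] = 1;
			return id;
		}
	}
	return -1;
}

/* Each CPU begins its search in its own slice of the ID space. */
static int cpu_start(int cpu)
{
	return cpu * (NR_IDS / NR_CPUS);
}

static int alloc_id_on(int cpu)
{
	int id = get_new_above(cpu_start(cpu));

	/* Nothing free at or above this CPU's start: retry from the bottom. */
	if (id < 0)
		id = get_new_above(0);
	return id;
}

static void free_id(int id)
{
	id_used[id] = 0;
}
```

With these assumed constants, CPU 0 starts searching at 0 and CPU 1 at
64, so as long as the slices are not exhausted the two CPUs tend to
work on different parts of the ID space.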

> > The worst outcome here is that idr.c remains unimproved and we merge a
> > new allocator which does basically the same thing.
>
> The lowest number guarantee makes them different. Maybe tag
> allocation can be layered on top as a caching layer, I don't know, but
> at any rate we need at least two different operation modes.

Why? Tag allocation doesn't care about the values - just that they be
unique.

> > The best outcome is that idr.c gets improved and we don't have to merge
> > duplicative code.
> >
> > So please, let's put aside the shiny new thing for now and work out how
> > we can use the existing tag allocator for these applications. If we
> > make a genuine effort to do this and decide that it's fundamentally
> > hopeless then this is the time to start looking at new implementations.
> >
> > (I can think of at least two ways of making ida_get_new_above() an
> > order of magnitude faster for this application and I'm sure you guys
> > can as well.)
>
> Oh, I'm sure the current id[r|a] can be improved upon a lot but I'm
> very skeptical one can reach the level of scalability necessary for,
> say, pci-e attached extremely high-iops devices while still keeping
> the lowest number allocation, which can't be achieved without strong
> synchronization on each alloc/free.
>
> Maybe we can layer things so that we have percpu layer on top of
> id[r|a] and, say, mapping id to pointer is still done by idr, or the
> percpu tag allocator uses ida for tag chunk allocations, but it's
> still gonna be something extra on top.

It's not obvious that explicit per-cpu is needed. Get an ID from
ida_get_new_above(), multiply it by 16 and store that in device-local
storage, along with a 16-bit bitmap. Blam, 30 lines of code and the
ida_get_new_above() cost is reduced 16x and it's off the map.
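
Something along these lines, as a userspace sketch only.
slow_alloc_chunk_id() is a hypothetical stand-in for the
ida_get_new_above() call (it just hands out increasing numbers), and
struct tag_cache, tag_alloc() and tag_free() are names invented for the
example.  It holds a single chunk and does no locking; what to do when
all 16 tags are in use is left open.

```c
#include <assert.h>
#include <stdint.h>

#define TAGS_PER_CHUNK 16

/* Hypothetical stand-in for the slow, shared allocator. */
static int next_chunk_id;
static int slow_alloc_calls;   /* counts trips to the slow path */

static int slow_alloc_chunk_id(void)
{
	slow_alloc_calls++;
	return next_chunk_id++;
}

/* Device-local state: one chunk of 16 tags plus a 16-bit bitmap. */
struct tag_cache {
	int base;        /* chunk ID * 16, or -1 if no chunk is held */
	uint16_t used;   /* bit i set => tag (base + i) is in use */
};

static void tag_cache_init(struct tag_cache *c)
{
	c->base = -1;
	c->used = 0;
}

/* Returns a unique tag, or -1 if all 16 tags in the chunk are in use. */
static int tag_alloc(struct tag_cache *c)
{
	if (c->base < 0) {
		/* The only call into the slow allocator. */
		c->base = slow_alloc_chunk_id() * TAGS_PER_CHUNK;
		c->used = 0;
	}
	for (int i = 0; i < TAGS_PER_CHUNK; i++) {
		if (!(c->used & (1u << i))) {
			c->used |= (uint16_t)(1u << i);
			return c->base + i;
		}
	}
	return -1;
}

static void tag_free(struct tag_cache *c, int tag)
{
	int i = tag - c->base;

	/* The tag must belong to this chunk and must be allocated. */
	assert(c->base >= 0 && i >= 0 && i < TAGS_PER_CHUNK);
	assert(c->used & (1u << i));
	c->used &= (uint16_t)~(1u << i);
}
```

In this sketch the slow allocator is called once per chunk, so 16 tags
cost one such call, and frees and re-allocations within the chunk touch
only the local bitmap.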

Or perhaps you can think of something smarter, but first you have to
start thinking of solutions rather than trying to find problems :(