Re: [RFC patch 0/5] futex: Allow lockless empty check of hashbucket plist in futex_wake()

From: Thomas Gleixner
Date: Mon Dec 02 2013 - 06:01:39 EST


On Sat, 30 Nov 2013, Davidlohr Bueso wrote:
> On Thu, 2013-11-28 at 12:59 +0100, Peter Zijlstra wrote:
> > On Wed, Nov 27, 2013 at 11:44:38PM -0800, Davidlohr Bueso wrote:
> > > How about both enlarging the table _and_ aligning the buckets? As you
> > > know, increasing the size of the table also benefits (particularly in
> > > larger systems) in having more spinlocks. So we reduce the amount of
> > > collisions and alleviate contention on the hb->lock. Btw, do you have
> > > any particular concerns about the larger hash table patch?
> >
> > My only concern was the amount of #ifdef.
> >
> > Wouldn't something like the below also work?
>
> Below are the results for a workload that stresses the uaddr hashing for
> large amounts of futexes (just make waits fail the uval check, so no
> list handling overhead) on an 80 core, 1TB NUMA system.
>
> +---------+--------------------+------------------------+-----------------------+-------------------------------+
> | threads | baseline (ops/sec) | aligned-only (ops/sec) | large table (ops/sec) | large table+aligned (ops/sec) |
> +---------+--------------------+------------------------+-----------------------+-------------------------------+
> | 512     | 32426              | 50531 (+55.8%)         | 255274 (+687.2%)      | 292553 (+802.2%)              |
> | 256     | 65360              | 99588 (+52.3%)         | 443563 (+578.6%)      | 508088 (+677.3%)              |
> | 128     | 125635             | 200075 (+59.2%)        | 742613 (+491.1%)      | 835452 (+564.9%)              |
> | 80      | 193559             | 323425 (+67.1%)        | 1028147 (+431.1%)     | 1130304 (+483.9%)             |
> | 64      | 247667             | 443740 (+79.1%)        | 997300 (+302.6%)      | 1145494 (+362.5%)             |
> | 32      | 628412             | 721401 (+14.7%)        | 965996 (+53.7%)       | 1122115 (+78.5%)              |
> +---------+--------------------+------------------------+-----------------------+-------------------------------+
>
> Baseline of course sucks compared to any other performance boost, and we
> get the best throughput when applying both optimizations, no surprise.
> We do particularly well for more than 32 threads, and the 'aligned-only'
> column nicely exemplifies the benefits of SMP aligning the buckets
> without considering the reduction in collisions.

Right. So yes, we want both. And I fully agree with Peter's dynamic
allocation approach.
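
For illustration only, here is a rough userspace sketch of the two ideas
combined: cache-line-aligned buckets plus a table sized at allocation time
from the CPU count. This is not the patch under discussion. The names
(struct bucket, table_size, alloc_table, hash_bucket), the 64-byte cache
line and the 256-buckets-per-CPU factor are all assumptions made for the
example.

```c
#include <assert.h>   /* static_assert */
#include <stdalign.h> /* alignas */
#include <stddef.h>   /* size_t */
#include <stdlib.h>   /* aligned_alloc */

/* Assumption: 64-byte cache lines. */
#define CACHELINE 64

/* Hypothetical stand-in for a hash bucket: a lock plus a waiter list. */
struct bucket {
    alignas(CACHELINE) int lock; /* placeholder for a spinlock */
    void *waiters;               /* placeholder for the waiter list head */
};

/* Alignment pads each bucket to a full cache line, so two buckets
 * (and their locks) never share one. */
static_assert(sizeof(struct bucket) % CACHELINE == 0,
              "bucket must fill whole cache lines");

/* Round n up to a power of two so a hash can be masked instead of divided. */
static size_t roundup_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Assumption: 256 buckets per CPU is an illustrative scaling factor. */
static size_t table_size(size_t ncpus)
{
    return roundup_pow2(256 * ncpus);
}

/* Allocate the table dynamically instead of using a fixed-size array. */
static struct bucket *alloc_table(size_t ncpus)
{
    size_t n = table_size(ncpus);
    struct bucket *t = aligned_alloc(CACHELINE, n * sizeof(*t));

    if (!t)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        t[i].lock = 0;
        t[i].waiters = NULL;
    }
    return t;
}

/* Pick a bucket; n must be a power of two. */
static struct bucket *hash_bucket(struct bucket *t, size_t n, size_t hash)
{
    return &t[hash & (n - 1)];
}
```

With these assumed constants an 80 CPU machine would get 32768 buckets,
each on its own cache line, so both the collision rate and the false
sharing between neighbouring locks go down.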

Thanks,

tglx