On Thu, Apr 09, 2015 at 08:13:27PM +0200, Peter Zijlstra wrote:
On Mon, Apr 06, 2015 at 10:55:44PM -0400, Waiman Long wrote:
+#define PV_HB_PER_LINE	(SMP_CACHE_BYTES / sizeof(struct pv_hash_bucket))
+static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node)
+{
+	unsigned long init_hash, hash = hash_ptr(lock, pv_lock_hash_bits);
+	struct pv_hash_bucket *hb, *end;
+
+	if (!hash)
+		hash = 1;
+
+	init_hash = hash;
+	hb = &pv_lock_hash[hash_align(hash)];
+	for (;;) {
+		for (end = hb + PV_HB_PER_LINE; hb < end; hb++) {
+			if (!cmpxchg(&hb->lock, NULL, lock)) {
+				WRITE_ONCE(hb->node, node);
+				/*
+				 * We haven't set the _Q_SLOW_VAL yet. So
+				 * the order of writing doesn't matter.
+				 */
+				smp_wmb(); /* matches rmb from pv_hash_find */
+				goto done;
+			}
+		}
+
+		hash = lfsr(hash, pv_lock_hash_bits, 0);
Since pv_lock_hash_bits is a variable, you end up running through that
massive if() forest to find the corresponding tap every single time. It
cannot compile-time optimize it.
Hence:
hash = lfsr(hash, pv_taps);
(I don't get the bits argument to the lfsr).
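To make that concrete: the taps mask for a given width can be looked up
once, when the table is sized, so the per-probe step has no if() chain
left in it. Something like the below, taking the Galois form (a sketch
only, not the helper posted earlier; pv_taps is assumed to be filled in
by the hash init code):

	/* sketch: pv_taps computed once at hash init from the chosen width */
	static unsigned long pv_taps;

	static inline unsigned long lfsr(unsigned long hash, unsigned long taps)
	{
		/* Galois step: shift right, xor in the taps when bit 0 was set */
		return (hash >> 1) ^ (-(hash & 1UL) & taps);
	}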
In any case, like I said before, I think we should try a linear probe
sequence first, the lfsr was over engineering from my side.
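To make the linear probe concrete, the minimal swap would be something
like this (sketch only; it assumes the table has (1UL << pv_lock_hash_bits)
buckets, and the cacheline granularity issue below still applies):

	/* linear probe: next bucket index, wrapping at the table size */
	hash = (hash + 1) & ((1UL << pv_lock_hash_bits) - 1);
	hb = &pv_lock_hash[hash_align(hash)];

With a linear step the 'if (!hash) hash = 1' up top also becomes
unnecessary; that special case is presumably only there because an LFSR
cannot have a zero state.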
+		hb = &pv_lock_hash[hash_align(hash)];
So one thing this does -- and one of the reasons I figured I should
ditch the LFSR instead of fixing it -- is that you end up scanning each
bucket HB_PER_LINE times: the LFSR walks individual bucket indices, and
hash_align() maps all HB_PER_LINE indices of a line back to the start of
that same line, which the inner loop then re-scans in full.
The 'fix' would be to LFSR on cachelines instead of HBs, but then you're
stuck with the 0-th cacheline, since an LFSR never generates 0.
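FWIW, probing at cacheline granularity with a plain linear step avoids
both problems: every line gets visited exactly once, line 0 included,
and no bucket is looked at twice. A sketch of what the probe loop could
become (same assumption that the table has (1UL << pv_lock_hash_bits)
buckets, so the wrap mask is valid):

	unsigned long size = 1UL << pv_lock_hash_bits;
	unsigned long line = hash_align(hash);	/* first bucket of the home line */
	unsigned long scanned;

	for (scanned = 0; scanned < size; scanned += PV_HB_PER_LINE) {
		hb = &pv_lock_hash[line];
		for (end = hb + PV_HB_PER_LINE; hb < end; hb++) {
			if (!cmpxchg(&hb->lock, NULL, lock)) {
				WRITE_ONCE(hb->node, node);
				smp_wmb(); /* matches rmb from pv_hash_find */
				goto done;
			}
		}
		/* step one whole cacheline, wrapping at the end of the table */
		line = (line + PV_HB_PER_LINE) & (size - 1);
	}
	/* falling out means the table is full; sizing should prevent that */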