On 08/01/2013 08:07 AM, Waiman Long wrote:
+}
+/**
+ * queue_spin_trylock - try to acquire the queue spinlock
+ * @lock: Pointer to queue spinlock structure
+ * Return: 1 if lock acquired, 0 if failed
+ */
+static __always_inline int queue_spin_trylock(struct qspinlock *lock)
+{
+ if (!queue_spin_is_contended(lock) && (xchg(&lock->locked, 1) == 0))
+ return 1;
+ return 0;
+}
+
+/**
+ * queue_spin_lock - acquire a queue spinlock
+ * @lock: Pointer to queue spinlock structure
+ */
+static __always_inline void queue_spin_lock(struct qspinlock *lock)
+{
+ if (likely(queue_spin_trylock(lock)))
+ return;
+ queue_spin_lock_slowpath(lock);
+}
Quickly falling into the slowpath may hurt performance in some cases, no?
Instead, I tried something like this:
#define SPIN_THRESHOLD 64

static __always_inline void queue_spin_lock(struct qspinlock *lock)
{
	unsigned count = SPIN_THRESHOLD;

	do {
		if (likely(queue_spin_trylock(lock)))
			return;
		cpu_relax();
	} while (count--);
	queue_spin_lock_slowpath(lock);
}
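(With the post-decrement test, the loop actually tries SPIN_THRESHOLD + 1
times before falling into the slowpath.)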
Though I could see some gains in overcommit, it hurt undercommit in some
workloads :(.
+/**
+ * queue_trylock - try to acquire the lock bit ignoring the qcode in lock
+ * @lock: Pointer to queue spinlock structure
+ * Return: 1 if lock acquired, 0 if failed
+ */
+static __always_inline int queue_trylock(struct qspinlock *lock)
+{
+ if (!ACCESS_ONCE(lock->locked) && (xchg(&lock->locked, 1) == 0))
+ return 1;
+ return 0;
+}
It took me a long time to confirm to myself that this is used when we
exhaust all the queue nodes. I am not sure of a better name that would
not be confused with queue_spin_trylock; anyway, they live in different
files :).
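To make sure I read it right, here is roughly how I understand the
fallback. This is only a sketch against the patch's definitions
(struct qnode, queue_trylock(), cpu_relax()); get_qnode() is a made-up
placeholder for whatever hands out a per-CPU queue node, not the
patch's actual helper:

static void queue_spin_lock_slowpath(struct qspinlock *lock)
{
	struct qnode *node = get_qnode();	/* placeholder, not the real API */

	if (unlikely(!node)) {
		/*
		 * All per-CPU queue nodes are busy (deeply nested
		 * contexts): skip queueing and spin directly on the
		 * lock bit, ignoring the qcode.
		 */
		while (!queue_trylock(lock))
			cpu_relax();
		return;
	}
	/* ... normal queueing through the qcode in the lock word ... */
}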
Result:
Sandy Bridge, 32 CPUs / 16 cores (HT on), 2-node machine, with 16-vCPU
KVM guests.
In general, I am seeing that undercommit loads benefit from the patches.
base = 3.11-rc1
patched = base + qlock
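(In the tables below, oc is the overcommit factor, i.e. total guest vCPUs
relative to host CPUs; %improvement is relative to base, computed as
(patched - base) / base * 100 for higher-is-better metrics and with the
sign flipped for lower-is-better ones like hackbench time.)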
+------+------------+-----------+------------+-----------+--------------+
               hackbench (time in sec, lower is better)
+------+------------+-----------+------------+-----------+--------------+
   oc      base        stdev       patched      stdev     %improvement
+------+------------+-----------+------------+-----------+--------------+
  0.5x    18.9326      1.6072     20.0686      2.9968      -6.00023
  1.0x    34.0585      5.5120     33.2230      1.6119       2.45313
+------+------------+-----------+------------+-----------+--------------+
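(E.g. hackbench 0.5x: (18.9326 - 20.0686) / 18.9326 * 100 ~= -6.00,
matching the last column.)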
+------+------------+-----------+------------+-----------+--------------+
                ebizzy (records/sec, higher is better)
+------+------------+-----------+------------+-----------+--------------+
   oc      base        stdev       patched      stdev     %improvement
+------+------------+-----------+------------+-----------+--------------+
  0.5x  20499.3750    466.7756  22257.8750    884.8308      8.57831
  1.0x  15903.5000    271.7126  17993.5000    682.5095     13.14176
  1.5x   1883.2222    166.3714   1742.8889    135.2271     -7.45177
  2.5x    829.1250     44.3957    803.6250     78.8034     -3.07553
+------+------------+-----------+------------+-----------+--------------+
+------+------------+-----------+------------+-----------+--------------+
             dbench (throughput in MB/sec, higher is better)
+------+------------+-----------+------------+-----------+--------------+
   oc      base        stdev       patched      stdev     %improvement
+------+------------+-----------+------------+-----------+--------------+
  0.5x  11623.5000     34.2764  11667.0250     47.1122      0.37446
  1.0x   6945.3675     79.0642   6798.4950    161.9431     -2.11468
  1.5x   3950.4367     27.3828   3910.3122     45.4275     -1.01570
  2.0x   2588.2063     35.2058   2520.3412     51.7138     -2.62209
+------+------------+-----------+------------+-----------+--------------+
I saw the dbench %improvement figures change to 0.3529, -2.9459, 3.2423
and 4.8027 respectively after delaying entry into the slowpath as above.
[...]
I have not yet tested on a bigger machine. I hope a bigger machine will
show significant undercommit improvements.