Re: [RFC][PATCH 5/5] percpu-rwsem: Optimize readers and reduce global impact
From: Oleg Nesterov
Date: Fri May 29 2015 - 15:46:33 EST
Sorry for the delay, I finally found the time to read this series...
The code matches our previous discussion and I believe it is correct.
Reviewed-by: Oleg Nesterov <oleg@xxxxxxxxxx>
Just one nit below,
On 05/26, Peter Zijlstra wrote:
>
> struct percpu_rw_semaphore {
> - unsigned int __percpu *fast_read_ctr;
> - atomic_t write_ctr;
> + unsigned int __percpu *refcount;
> + int state;
....
> +enum { readers_slow, readers_block };
Now that we rely on rcu_sync and thus no longer have "readers_fast",
I think that a "bool reader_should_block" would look better than the enum.
> +void percpu_down_write(struct percpu_rw_semaphore *sem)
> {
...
so it does

	rcu_sync_enter(&sem->rss);
	state = BLOCK;
	mb();
	wait_event(sem->writer, readers_active_check(sem));

and this looks correct.
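Spelled out a bit more, with the barrier pairing I am relying on
(simplified and partly from memory, so treat the exact fields and
helpers as approximate, not the patch text):

	void percpu_down_write(struct percpu_rw_semaphore *sem)
	{
		/* exclude other writers */
		down_write(&sem->rw_sem);

		/* force new readers onto the slow path */
		rcu_sync_enter(&sem->rss);

		/* tell the slow-path readers to block */
		WRITE_ONCE(sem->state, readers_block);

		/* pairs with the reader's mb() after its refcount increment */
		smp_mb();

		/* wait until per_cpu_sum(*sem->refcount) == 0 */
		wait_event(sem->writer, readers_active_check(sem));
	}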
The only nontrivial thing we need to ensure is that
per_cpu_sum(*sem->refcount) == 0 can't be a false positive; a false
negative is fine.
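For reference, per_cpu_sum() in the patch is (quoting from memory, so
the exact macro may differ) just the obvious loop over the possible CPUs:

	#define per_cpu_sum(var)				\
	({							\
		typeof(var) __sum = 0;				\
		int cpu;					\
		for_each_possible_cpu(cpu)			\
			__sum += per_cpu(var, cpu);		\
		__sum;						\
	})

Note that the per-CPU counters are read one by one, at different times;
that is what makes the reasoning below necessary.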
That requirement means that if we see the result of a this_cpu_dec(),
we must not miss the result of the previous this_cpu_inc() on the
same or _another_ CPU.
And this is true because if the reader does its dec() on another CPU,
it is doing up_read(), which is only possible if its down_read() did
not see state == BLOCK.
But if down_read() did not see state == BLOCK, then the writer must
see the result of that down_read()'s inc().
IOW, we just rely on the usual STORE-MB-LOAD pattern, except that the
writer does the LOAD multiple times, once per CPU, in per_cpu_sum():
	DOWN_WRITE:			DOWN_READ on CPU X:

	state = BLOCK;			refcount[X]++;
	mb();				mb();
	sum = 0;			if (state != BLOCK)
	sum += refcount[0];			return; /* success */
	sum += refcount[1];
	...				refcount[X]--;
	sum += refcount[NR_CPUS - 1];
If the reader wins and takes the lock, then its increment of
refcount[X] must be accounted for by the writer's sum.
The writer can obviously miss a dec() from a reader, but we rely
on the wake_up()/wait_event() pairing in this case.
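Just to spell that part out, the reader's (slow path) up_read is,
afaics, essentially this (simplified, the exact code may differ):

	void __percpu_up_read(struct percpu_rw_semaphore *sem)
	{
		/* pairs with the mb() before the writer's per_cpu_sum() */
		smp_mb();

		__this_cpu_dec(*sem->refcount);

		/* prod the writer to re-evaluate readers_active_check() */
		wake_up(&sem->writer);
	}

So even if the writer's per_cpu_sum() races with the dec() and misses
it, the subsequent wake_up() forces wait_event() to re-check
readers_active_check().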
Oleg.