If you are going to come up with a new locking primitive, I would really
appreciate it if you implemented that locking primitive separately.
A fresh locking primitive commingled with other code is a good way to get
something wrong, and more generally to get code that is poorly maintained
because there is no separation of concerns.
Furthermore, there is a world of difference between the 1+ jiffy delay
of waiting for rcu_synchronize and the short hold times of task_lock.
Looking at your locking, it appears to be a complete hack. Readers always
take task_lock (and now pay for an extra atomic op where you call xchg on
the pointer), while writers just call compare_xchg if there are no
concurrent readers.
For raw performance, you would do better with a separate lock, or
potentially a specialized locking primitive that uses only the low
bits of the pointer.
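To make the low-pointer-bits option concrete, here is a minimal userspace
C11 sketch of a lock kept in bit 0 of a pointer word. It assumes every
pointee is at least 2-byte aligned so bit 0 of a valid pointer is always
zero. The names (struct tagged_ptr, tp_lock, tp_unlock,
tp_replace_and_unlock, tp_get) are hypothetical, this is not the code in
the patch, and it is not a kernel API.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: pointees are at least 2-byte aligned, so bit 0 of a valid
 * pointer is always 0 and can serve as the lock bit. */
#define LOCK_BIT ((uintptr_t)1)

/* Hypothetical type: a pointer and its lock packed into one word. */
struct tagged_ptr {
	_Atomic uintptr_t word;
};

/* Spin until this caller atomically sets the lock bit. */
static void tp_lock(struct tagged_ptr *tp)
{
	uintptr_t expected = atomic_load_explicit(&tp->word, memory_order_relaxed);

	for (;;) {
		expected &= ~LOCK_BIT;	/* only succeed from the unlocked state */
		if (atomic_compare_exchange_weak_explicit(&tp->word, &expected,
				expected | LOCK_BIT,
				memory_order_acquire, memory_order_relaxed))
			return;
		/* On failure, expected now holds the current word; retry. */
	}
}

/* Clear the lock bit, leaving the pointer unchanged. */
static void tp_unlock(struct tagged_ptr *tp)
{
	atomic_fetch_and_explicit(&tp->word, ~LOCK_BIT, memory_order_release);
}

/* With the lock held: publish a new pointer and drop the lock in one store. */
static void tp_replace_and_unlock(struct tagged_ptr *tp, void *p)
{
	atomic_store_explicit(&tp->word, (uintptr_t)p, memory_order_release);
}

/* Every reader must mask off the lock bit before using the pointer. */
static void *tp_get(struct tagged_ptr *tp)
{
	return (void *)(atomic_load_explicit(&tp->word, memory_order_acquire)
			& ~LOCK_BIT);
}
```

The catch, and the reason such a thing belongs in its own self-contained
primitive, is the last function: every reader of the word has to go
through the masking accessor, or it sees a corrupted pointer.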
I don't know what motivates this work. Are you actually seeing
performance problems with setns?
I am very uncomfortable with a novel and very awkward new locking
primitive that does not clearly show improvements even in its target
workloads.
Hmm. After thinking about this a little more, your set_reader_nsproxy is
completely unsafe. Most readers of nsproxy are from the same task.
Changing the low bits of the pointer from another task will cause
those readers to segfault, and if they do not segfault they will read
from the wrong memory locations.
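A small userspace sketch of that failure mode: a same-task reader that
treats the field as a plain pointer computes field addresses directly from
the stored word, so once another thread sets bit 0 every computed address
is off by one byte. struct fake_nsproxy, remote_mark and
unmasked_field_addr are hypothetical stand-ins, not the real nsproxy layout
or the code in the patch, and the sketch only computes the address rather
than dereferencing it, since that load would be undefined behaviour
(misaligned, or simply the wrong bytes).

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nsproxy; not the real layout. */
struct fake_nsproxy {
	long count;
	void *net_ns;
};

/* Hypothetical "other task": sets bit 0 of the shared pointer word,
 * as a remote marker on the reader's pointer would. */
static void remote_mark(_Atomic uintptr_t *slot)
{
	atomic_fetch_or(slot, (uintptr_t)1);
}

/* Same-task reader that assumes the word is an ordinary pointer and
 * computes a field address from it without masking. Returns the address
 * it would load from; it does not dereference it. */
static uintptr_t unmasked_field_addr(_Atomic uintptr_t *slot)
{
	uintptr_t word = atomic_load(slot);

	return word + offsetof(struct fake_nsproxy, net_ns);
}
```

Before remote_mark() the computed address is exactly &ns->net_ns; after
it, the same code yields &ns->net_ns + 1 byte, which is what an unmodified
reader would then load through.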