Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

From: Waiman Long
Date: Fri Aug 30 2013 - 15:21:12 EST


On 08/30/2013 02:53 PM, Linus Torvalds wrote:
>>> So the perf data would be *much* more interesting for a more varied load. I know pretty much exactly what happens with my silly test-program, and as you can see it never really gets to the actual spinlock, because that test program will only ever hit the fast-path case. It would be much more interesting to see another load that may trigger the d_lock actually being taken. So:
>> For the other test cases that I am interested in, like the AIM7 benchmark,
>> your patch may not be as good as my original one. I got 1-3M JPM (varied
>> quite a lot in different runs) in the short workloads on a 80-core system.
>> My original one got 6M JPM. However, the test was done on 3.10 based kernel.
>> So I need to do more tests to see if that has an effect on the JPM results.
> I'd really like to see a perf profile of that, particularly with some
> call chain data for the relevant functions (ie "what it is that causes
> us to get to spinlocks"). Because it may well be that you're hitting
> some of the cases that I didn't see, and thus didn't notice.
>
> In particular, I suspect AIM7 actually creates/deletes files and/or
> renames them too. Or maybe I screwed up the dget_parent() special case
> thing, which mattered because AIM7 did a lot of getcwd() calls or
> something odd like that.
>
> Linus

Below is the perf data from my short workloads run on an 80-core DL980:

13.60% reaim [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|--48.79%-- tty_ldisc_try
|--48.58%-- tty_ldisc_deref
--2.63%-- [...]

11.31% swapper [kernel.kallsyms] [k] intel_idle
|--99.94%-- cpuidle_enter_state
--0.06%-- [...]

4.86% reaim [kernel.kallsyms] [k] lg_local_lock
|--59.41%-- mntput_no_expire
|--19.37%-- path_init
|--15.14%-- d_path
|--5.88%-- sys_getcwd
--0.21%-- [...]

3.00% reaim reaim [.] mul_short

2.41% reaim reaim [.] mul_long
|--87.21%-- 0xbc614e
--12.79%-- (nil)

2.29% reaim reaim [.] mul_int

2.20% reaim [kernel.kallsyms] [k] _raw_spin_lock
|--12.81%-- prepend_path
|--9.90%-- lockref_put_or_lock
|--9.62%-- __rcu_process_callbacks
|--8.77%-- load_balance
|--6.40%-- lockref_get
|--5.55%-- __mutex_lock_slowpath
|--4.85%-- __mutex_unlock_slowpath
|--4.83%-- inet_twsk_schedule
|--4.27%-- lockref_get_or_lock
|--2.19%-- task_rq_lock
|--2.13%-- sem_lock
|--2.09%-- scheduler_tick
|--1.88%-- try_to_wake_up
|--1.53%-- kmem_cache_free
|--1.30%-- unix_create1
|--1.22%-- unix_release_sock
|--1.21%-- process_backlog
|--1.11%-- unix_stream_sendmsg
|--1.03%-- enqueue_to_backlog
|--0.85%-- rcu_accelerate_cbs
|--0.79%-- unix_dgram_sendmsg
|--0.76%-- do_anonymous_page
|--0.70%-- unix_stream_recvmsg
|--0.69%-- unix_stream_connect
|--0.64%-- net_rx_action
|--0.61%-- tcp_v4_rcv
|--0.59%-- __do_fault
|--0.54%-- new_inode_pseudo
|--0.52%-- __d_lookup
--10.62%-- [...]

1.19% reaim [kernel.kallsyms] [k] mspin_lock
|--99.82%-- __mutex_lock_slowpath
--0.18%-- [...]

1.01% reaim [kernel.kallsyms] [k] lg_global_lock
|--51.62%-- __shmdt
--48.38%-- __shmctl

There is more contention in the lglock than I remember from the run on 3.10. This is an area that I need to look at. In fact, lglock is becoming a problem for really large machines with a lot of cores. We have a prototype 16-socket machine with 240 cores under development. The cost of doing a lg_global_lock will be very high on that type of machine, given that it is already high on this 80-core machine. I have been thinking that, instead of per-cpu spinlocks, we could change the locking to the per-node level. While there would be more contention for lg_local_lock, the cost of doing a lg_global_lock would be much lower, and contention within the local die should not be too bad. That would require either a per-node variable infrastructure or simulating one with the existing per-cpu subsystem.
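
To make the per-cpu versus per-node trade-off concrete, here is a rough userspace sketch, not kernel code. All toy_* names are hypothetical, and the layout of 8 nodes with 10 CPUs each is only an assumption chosen to resemble the 80-core machine. The point it illustrates is that a global lock has to acquire every slot, so the number of slots sets its cost.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed topology for illustration only: 80 CPUs spread over 8 nodes. */
#define TOY_NR_CPUS        80
#define TOY_NR_NODES       8
#define TOY_CPUS_PER_NODE  (TOY_NR_CPUS / TOY_NR_NODES)

/* Toy lglock: one spinlock per slot. A slot is either a CPU or a node. */
struct toy_lglock {
    int nr_slots;
    atomic_flag slot[TOY_NR_CPUS];
};

static void toy_spin_lock(atomic_flag *f)
{
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void toy_spin_unlock(atomic_flag *f)
{
    atomic_flag_clear_explicit(f, memory_order_release);
}

void toy_lglock_init(struct toy_lglock *lg, int nr_slots)
{
    assert(nr_slots > 0 && nr_slots <= TOY_NR_CPUS);
    lg->nr_slots = nr_slots;
    for (int i = 0; i < nr_slots; i++)
        atomic_flag_clear(&lg->slot[i]);
}

/* Local lock: only the caller's own slot is taken. */
void toy_lg_local_lock(struct toy_lglock *lg, int slot)
{
    assert(slot >= 0 && slot < lg->nr_slots);
    toy_spin_lock(&lg->slot[slot]);
}

void toy_lg_local_unlock(struct toy_lglock *lg, int slot)
{
    toy_spin_unlock(&lg->slot[slot]);
}

/*
 * Global lock: every slot is taken, in a fixed order.
 * Returns the number of spinlock acquisitions this required.
 */
int toy_lg_global_lock(struct toy_lglock *lg)
{
    for (int i = 0; i < lg->nr_slots; i++)
        toy_spin_lock(&lg->slot[i]);
    return lg->nr_slots;
}

void toy_lg_global_unlock(struct toy_lglock *lg)
{
    for (int i = 0; i < lg->nr_slots; i++)
        toy_spin_unlock(&lg->slot[i]);
}

/* Under the per-node scheme, CPUs on the same node share one slot. */
int toy_cpu_to_node(int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    return cpu / TOY_CPUS_PER_NODE;
}
```

With these assumed numbers, a global lock costs 80 acquisitions in the per-cpu layout and 8 in the per-node layout, while a local lock in the per-node layout is shared by the 10 CPUs on that node, which is where the extra lg_local_lock contention would come from.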

I will also need to look at ways to reduce the need to take d_lock in existing code. One area that I am looking at is whether we can take out the lock/unlock pair in prepend_path(). This function can only be called with the rename_lock taken, so no filename change or deletion will be allowed. It will only be a problem if somehow the dentry itself gets killed or dropped while the name is being copied out. The first dentry referenced by the path structure should have a non-zero reference count, so that shouldn't happen. I am not so sure about the parents of that dentry, as I am not so familiar with that part of the filesystem code.
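
For readers not familiar with the shape of that walk, below is a simplified userspace sketch of building a path backwards from a dentry to the root. It is not the kernel implementation: struct toy_dentry, toy_prepend() and toy_prepend_path() are hypothetical stand-ins, and there is no locking at all. The comment in the loop marks roughly where the name copy, and hence the lock/unlock pair in question, would sit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry: a name and a parent pointer. */
struct toy_dentry {
    const char *name;
    struct toy_dentry *parent;   /* the root points to itself */
};

/* Put "/name" in front of *end. Returns 0 on success, -1 if out of room. */
static int toy_prepend(char **end, size_t *room, const char *name)
{
    size_t len = strlen(name);

    if (*room < len + 1)
        return -1;
    *room -= len + 1;
    *end -= len;
    memcpy(*end, name, len);
    *end -= 1;
    **end = '/';
    return 0;
}

/*
 * Walk from d up to the root, filling buf from its end towards its start.
 * Returns a pointer to the start of the path inside buf, or NULL if the
 * buffer is too small.
 */
char *toy_prepend_path(struct toy_dentry *d, char *buf, size_t buflen)
{
    char *end;
    size_t room;

    if (buflen < 2)
        return NULL;
    end = buf + buflen - 1;
    *end = '\0';
    room = buflen - 1;

    if (d->parent == d) {        /* the root itself is just "/" */
        *--end = '/';
        return end;
    }
    while (d->parent != d) {
        /* Name copy: the step the lock/unlock pair is meant to protect. */
        if (toy_prepend(&end, &room, d->name))
            return NULL;
        d = d->parent;           /* the parent must stay valid here too */
    }
    return end;
}
```

The sketch also shows why the parents matter: every iteration dereferences d->parent and then reads that dentry's name, so whatever keeps the first dentry alive has to cover the whole chain.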

Regards,
Longman