Re: [RFC 0/6] Track RCU dereferences in RCU read-side critical sections

From: Paul E. McKenney
Date: Mon Feb 15 2016 - 13:03:59 EST


On Thu, Feb 04, 2016 at 12:45:06AM +0800, Boqun Feng wrote:
> As a characteristic of RCU, read-side critical sections have a very
> loose connection with rcu_dereference()s: you can only be sure that
> an rcu_dereference() is called in some read-side critical section,
> but if the code gets complex, you may not be sure exactly which one.
> This might also be a problem for some other locking mechanisms, in
> that the critical sections protecting data and the data accesses
> they protect are not clearly correlated.

Seeing no objections, I am queuing this series for review and testing.
The information it provides would have been extremely helpful to me
several times in the past!

Thanx, Paul

> In this series, we introduce the LOCKED_ACCESS framework and, based
> on it, implement the RCU_LOCKED_ACCESS functionality to give us a
> clear hint as to which rcu_dereference() happens in which RCU
> read-side critical section.
>
> The basic idea of LOCKED_ACCESS is to maintain a chain of the locks we
> have already acquired and, when a data access happens, correlate that
> access with the chain.
>
> Lockdep already has lock chains, but we introduce a new, similar
> concept: the acqchain. An acqchain is like a lock chain, except that
> the key of an acqchain is the hash sum of the acquire (instruction)
> positions of the locks in the chain, whereas the key of a lock chain is
> the hash sum of the class keys of the locks in the chain.
>
> Acqchains are introduced because we want to correlate data accesses with
> critical sections and critical sections are better represented by the
> acquire positions rather than lock classes.
>
> The acqchain key of a task is maintained in the same way as lock chain
> keys in lockdep.
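
To make the difference between the two keys concrete, here is a minimal
user-space sketch. Everything in it is an assumption for illustration:
struct held, mix(), lock_chain_key() and acqchain_key() are hypothetical
names, and the mixer is not lockdep's actual chain-key hash. It only shows
that hashing acquire positions rather than class keys separates critical
sections that share one lock class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one held lock in a task's chain. */
struct held {
	uint64_t class_key;  /* identifies the lock class */
	uint64_t acquire_ip; /* instruction address of the acquire */
};

/* Illustrative 64-bit mixer only; NOT lockdep's chain-key hash. */
static uint64_t mix(uint64_t key, uint64_t v)
{
	key ^= v + 0x9e3779b97f4a7c15ULL + (key << 6) + (key >> 2);
	return key;
}

/* Lock-chain-style key: folds in the class key of each held lock. */
static uint64_t lock_chain_key(const struct held *h, size_t n)
{
	uint64_t key = 0;

	for (size_t i = 0; i < n; i++)
		key = mix(key, h[i].class_key);
	return key;
}

/* Acqchain-style key: folds in the acquire position of each held lock. */
static uint64_t acqchain_key(const struct held *h, size_t n)
{
	uint64_t key = 0;

	for (size_t i = 0; i < n; i++)
		key = mix(key, h[i].acquire_ip);
	return key;
}
```

With two rcu_read_lock() call sites, both chains hold the same single
class, so the lock-chain-style keys collide while the acqchain-style keys
do not, which is why the acquire position is the better handle on a
particular critical section.
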
>
> Similar to lockdep, LOCKED_ACCESS also classifies locks and data
> accesses into groups; the locked access class is introduced for this
> purpose. A locked access class also contains the data for allocation
> and lookup of acqchains and accesses, and the address of a locked
> access class is used as its key. By tagging locks and data accesses
> with these keys, we can describe which locks and data accesses are
> related.
>
> The entry point of LOCKED_ACCESS is locked_access_point(). Calling
> locked_access_point() indicates that a data access happens, and after
> it is called, the data access is correlated with the current acqchain.
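
The correlation step can be pictured with a small user-space sketch. It
is not the code from the series: record_access(), struct pair and the
fixed-size table are hypothetical, and the real locked_access_point()
signature is not shown in this cover letter. The sketch just records each
distinct (acqchain key, access site) pair once.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PAIRS 64

/* One observed (acqchain, access site) relationship. */
struct pair {
	uint64_t acqchain_key;
	const char *site; /* e.g. "kernel/pid.c:441" */
};

static struct pair pairs[MAX_PAIRS];
static int nr_pairs;

/*
 * Hypothetical analogue of the entry point: note that an access at
 * `site` happened while `cur_key` was the current acqchain key.
 * Returns 1 if the pair is new, 0 if already known, -1 if the table
 * is full.
 */
static int record_access(uint64_t cur_key, const char *site)
{
	for (int i = 0; i < nr_pairs; i++)
		if (pairs[i].acqchain_key == cur_key &&
		    strcmp(pairs[i].site, site) == 0)
			return 0;

	if (nr_pairs == MAX_PAIRS)
		return -1;

	pairs[nr_pairs].acqchain_key = cur_key;
	pairs[nr_pairs].site = site;
	nr_pairs++;
	return 1;
}
```

Repeated accesses from the same site under the same acqchain collapse into
one entry, so the table grows with the number of distinct relationships
rather than with the number of accesses.
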
>
> We also provide a /proc filesystem interface to show the information
> we've collected. For each locked access class with the name <name>,
> there will be a file at /proc/locked_access/<name> showing all the
> relationships collected so far for that locked access class.
>
> Based on LOCKED_ACCESS, we implement RCU_LOCKED_ACCESS, which tracks
> rcu_dereference()s inside RCU read-side critical sections.
>
> This patchset is based on v4.5-rc2 and consists of 6 patches (of which
> patches 2-5 implement LOCKED_ACCESS):
>
> 1. Introduce some functions of irq_context.
>
> 2. Introduce locked access class and acqchain.
>
> 3. Maintain the keys of acqchains.
>
> 4. Introduce the entry point of LOCKED_ACCESS.
>
> 5. Add proc interface for locked access class.
>
> 6. Enable LOCKED_ACCESS for RCU.
>
> Tested by 0day. I also did a simple test on x86: I built and booted a
> kernel with RCU_LOCKED_ACCESS=y and CONFIG_PROVE_LOCKING=y and ran
> several workloads (kernel building, git cloning, dbench). No problems
> were observed, and /proc/locked_access/rcu was able to collect the
> relationships between ~300 RCU read-side critical sections and ~500
> rcu_dereference*().
>
> Snippets of /proc/locked_access/rcu are as follows:
>
> ...(this rcu_dereference() happens after one rcu_read_lock())
> ...
> ACQCHAIN 0xfdbf0c6aeea, 1 locks, irq_context 0:
> LOCK at [<ffffffff812b1115>] get_proc_task_net+0x5/0x140
> ACCESS TYPE 1 at kernel/pid.c:441
> ...
> ...(this rcu_dereference() happens after three rcu_read_lock())
> ...
> ACQCHAIN 0xfe042af3bbfb2605, 3 locks, irq_context 0:
> LOCK at [<ffffffff81094b47>] SyS_kill+0x97/0x2a0
> LOCK at [<ffffffff8109286f>] kill_pid_info+0x1f/0x140
> LOCK at [<ffffffff81092605>] group_send_sig_info+0x5/0x130
> ACCESS TYPE 1 at kernel/signal.c:695
> ...
>
> Looking forward to any suggestions, comments and questions ;-)
>
> Regards,
> Boqun
>