[RFC 0/6] Track RCU dereferences in RCU read-side critical sections

From: Boqun Feng
Date: Wed Feb 03 2016 - 11:46:28 EST


By the nature of RCU, read-side critical sections are only loosely
connected with rcu_dereference()s: you can be sure that an
rcu_dereference() is called inside some read-side critical section, but
once the code gets complex, you may not know exactly which one. Other
locking mechanisms can have the same problem, in that the critical
sections protecting data and the data accesses they protect are not
clearly correlated.

In this series, we introduce the LOCKED_ACCESS framework and, based on
it, implement the RCU_LOCKED_ACCESS functionality, which gives us a
clear hint about which rcu_dereference() happens in which RCU read-side
critical section.

The basic idea of LOCKED_ACCESS is to maintain a chain of the locks we
have already acquired and, whenever a data access happens, to correlate
that access with the chain.

Lockdep already has lock chains, but we introduce a new, similar
concept: the acqchain. An acqchain is like a lock chain, except that the
key of an acqchain is the hash sum of the acquire (instruction)
positions of the locks in the chain, whereas the key of a lock chain is
the hash sum of the class keys of the locks in the chain.

Acqchains are introduced because we want to correlate data accesses with
critical sections, and critical sections are better represented by
acquire positions than by lock classes.
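To make the difference concrete, here is a minimal user-space sketch
(not code from this series). It assumes a simple FNV-1a-style mixing
step, mix_key(), as a stand-in for lockdep's real chain-key hash, and a
hypothetical struct held_lock_sketch that records both a class key and
an acquire position. The same lock class taken at two different places
gives the same lock chain key but different acqchain keys:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in hash step (FNV-1a style over the 8 bytes of val).
 * Lockdep uses its own chain-key hash; this is only for illustration.
 */
static uint64_t mix_key(uint64_t key, uint64_t val)
{
	for (int i = 0; i < 8; i++) {
		key ^= (val >> (i * 8)) & 0xff;
		key *= 0x100000001b3ULL; /* 64-bit FNV prime */
	}
	return key;
}

/* Hypothetical record of one held lock */
struct held_lock_sketch {
	uint64_t class_key;  /* identifies the lock class */
	uint64_t acquire_ip; /* instruction that acquired the lock */
};

/* Lock chain key: hash over the class keys of the held locks */
static uint64_t lock_chain_key(const struct held_lock_sketch *held, size_t n)
{
	uint64_t key = 0;

	for (size_t i = 0; i < n; i++)
		key = mix_key(key, held[i].class_key);
	return key;
}

/* Acqchain key: hash over the acquire positions of the held locks */
static uint64_t acqchain_key(const struct held_lock_sketch *held, size_t n)
{
	uint64_t key = 0;

	for (size_t i = 0; i < n; i++)
		key = mix_key(key, held[i].acquire_ip);
	return key;
}
```

For example, one held lock of class 0xA acquired at 0xffffffff81092605
and at 0xffffffff81092615 yields equal lock_chain_key() values, while
acqchain_key() tells the two critical sections apart.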

The acqchain key of a task is maintained in the same way as lock chain
keys in lockdep.
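A rough sketch of that incremental maintenance follows, in the spirit of
lockdep storing a previous chain key in each held lock. It assumes
strictly nested (LIFO) acquire and release, a stand-in FNV-1a-style
mix_key() hash, and hypothetical names (task_sketch, sketch_acquire(),
sketch_release()); lockdep's handling of out-of-order release is not
modelled:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_HELD 16

/* Stand-in hash step (FNV-1a style), not lockdep's real chain-key hash */
static uint64_t mix_key(uint64_t key, uint64_t val)
{
	for (int i = 0; i < 8; i++) {
		key ^= (val >> (i * 8)) & 0xff;
		key *= 0x100000001b3ULL;
	}
	return key;
}

struct acq_entry {
	uint64_t acquire_ip; /* where this lock was acquired */
	uint64_t prev_key;   /* acqchain key before this acquisition */
};

/* Hypothetical per-task state */
struct task_sketch {
	struct acq_entry held[MAX_HELD];
	int depth;
	uint64_t acqchain_key; /* 0 when no lock is held */
};

/* Acquire: save the old key, then fold in the new acquire position */
static void sketch_acquire(struct task_sketch *t, uint64_t ip)
{
	assert(t->depth < MAX_HELD);
	t->held[t->depth].acquire_ip = ip;
	t->held[t->depth].prev_key = t->acqchain_key;
	t->depth++;
	t->acqchain_key = mix_key(t->acqchain_key, ip);
}

/* Release of the innermost lock: restore the saved key */
static void sketch_release(struct task_sketch *t)
{
	assert(t->depth > 0);
	t->depth--;
	t->acqchain_key = t->held[t->depth].prev_key;
}
```

Saving the previous key per entry makes release O(1): nothing has to be
rehashed when the innermost lock is dropped.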

Similar to lockdep, LOCKED_ACCESS classifies locks and data accesses
into groups, which is why the locked access class is introduced. A
locked access class also contains the data for allocating and looking
up acqchains and accesses, and the address of a locked access class is
used as its key. By tagging locks and data accesses with these keys, we
can describe which locks and data accesses are related.
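A minimal sketch of the "address as key" tagging, with hypothetical
struct and function names (the real structures in this series may look
different, and the acqchain/access storage is left out):

```c
#include <stdint.h>

/* Hypothetical locked access class; its address serves as its key */
struct laclass_sketch {
	const char *name; /* would name the file /proc/locked_access/<name> */
	/* the real class also holds storage for acqchains and accesses */
};

/* Locks and data accesses are tagged with the class they belong to */
struct tagged_lock {
	const char *name;
	const struct laclass_sketch *laclass;
};

struct tagged_access {
	const char *site;
	const struct laclass_sketch *laclass;
};

static uintptr_t laclass_key(const struct laclass_sketch *c)
{
	return (uintptr_t)c; /* the address is the key */
}

/* A lock and an access are related iff they carry the same class key */
static int related(const struct tagged_lock *l, const struct tagged_access *a)
{
	return laclass_key(l->laclass) == laclass_key(a->laclass);
}
```

Using the address as the key needs no registry of names: two distinct
class objects always have distinct addresses.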

The entry point of LOCKED_ACCESS is locked_access_point(). Calling
locked_access_point() indicates that a data access is happening; once
it is called, that data access is correlated with the current acqchain.
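A user-space sketch of what an access point might record. It assumes a
single global "current acqchain key" in place of real per-task state
and that each distinct (acqchain, access site) pair is stored once;
access_point_sketch() and ACCESS_POINT_SKETCH() are hypothetical names,
not the signature of locked_access_point():

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_RECORDS 64

/* One collected relationship: which acqchain was current at which access */
struct access_record {
	uint64_t acqchain_key;
	const char *file;
	int line;
};

/* Hypothetical class with fixed-size storage for the relationships */
struct laclass_sketch {
	const char *name;
	struct access_record records[MAX_RECORDS];
	int nr_records;
};

/* Simplification: one global key instead of real per-task state */
static uint64_t current_acqchain_key;

static void access_point_sketch(struct laclass_sketch *c, const char *file, int line)
{
	/* Record each (acqchain, access site) pair only once */
	for (int i = 0; i < c->nr_records; i++) {
		const struct access_record *r = &c->records[i];

		if (r->acqchain_key == current_acqchain_key &&
		    r->line == line && strcmp(r->file, file) == 0)
			return;
	}
	assert(c->nr_records < MAX_RECORDS);
	c->records[c->nr_records].acqchain_key = current_acqchain_key;
	c->records[c->nr_records].file = file;
	c->records[c->nr_records].line = line;
	c->nr_records++;
}

/* Capture the call site, as a tagged data access would */
#define ACCESS_POINT_SKETCH(c) access_point_sketch((c), __FILE__, __LINE__)
```

Calling it twice from the same site under the same acqchain adds one
record; the same site under a different acqchain adds a second one.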

We also provide a /proc filesystem interface to show the information
we've collected: for each locked access class named <name>, there is a
file /proc/locked_access/<name> showing all the relationships collected
so far for that locked access class.
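Each relationship is printed as an acqchain header, its locks, and the
access, in the layout of the /proc/locked_access/rcu snippets later in
this mail. A sketch of that formatting into a buffer, with hypothetical
types and helpers; a real proc file would presumably go through
seq_file rather than vsnprintf():

```c
#include <inttypes.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_CHAIN_LOCKS 8

/* Hypothetical view of one acqchain and one access recorded against it */
struct acqchain_sketch {
	uint64_t key;
	int irq_context;
	int nr_locks;
	const char *lock_sites[MAX_CHAIN_LOCKS]; /* symbolized acquire positions */
};

struct access_sketch {
	int type;
	const char *file;
	int line;
};

/* Append formatted text at *used; return -1 if the buffer is too small */
static int appendf(char *buf, size_t size, size_t *used, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (*used >= size)
		return -1;
	va_start(ap, fmt);
	n = vsnprintf(buf + *used, size - *used, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= size - *used)
		return -1; /* output truncated */
	*used += (size_t)n;
	return 0;
}

/* Format one acqchain/access relationship; 0 on success, -1 if truncated */
static int format_relationship(char *buf, size_t size,
			       const struct acqchain_sketch *ch,
			       const struct access_sketch *ac)
{
	size_t used = 0;

	if (appendf(buf, size, &used, "ACQCHAIN 0x%" PRIx64 ", %d locks, irq_context %d:\n",
		    ch->key, ch->nr_locks, ch->irq_context))
		return -1;
	for (int i = 0; i < ch->nr_locks; i++)
		if (appendf(buf, size, &used, "LOCK at %s\n", ch->lock_sites[i]))
			return -1;
	if (appendf(buf, size, &used, "ACCESS TYPE %d at %s:%d\n",
		    ac->type, ac->file, ac->line))
		return -1;
	return 0;
}
```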

Based on LOCKED_ACCESS, we implement RCU_LOCKED_ACCESS, which tracks
rcu_dereference()s inside RCU read-side critical sections.
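To illustrate how RCU can feed such a framework, here is a toy model
with hypothetical names: rcu_read_lock_sketch() pushes an acquire
position (the source line stands in for the instruction address), and
rcu_dereference_sketch() records the chain currently held before
returning the pointer. It only models the bookkeeping: it provides none
of the real rcu_dereference()'s ordering guarantees and returns void *:

```c
#include <assert.h>

#define MAX_NEST 8
#define MAX_RECORDS 32

/* Hypothetical per-task state: acquire positions of held rcu_read_lock()s */
static int held_lines[MAX_NEST];
static int nr_held;

struct rcu_access_record {
	int nr_locks;             /* how many rcu_read_lock()s were held */
	int lock_lines[MAX_NEST]; /* where each one was taken */
	int deref_line;           /* where the dereference happened */
};

static struct rcu_access_record records[MAX_RECORDS];
static int nr_records;

static void rcu_read_lock_at(int line)
{
	assert(nr_held < MAX_NEST);
	held_lines[nr_held++] = line;
}

static void rcu_read_unlock_sketch(void)
{
	assert(nr_held > 0);
	nr_held--;
}

/* Record the current chain, then hand back the pointer unchanged */
static void *rcu_dereference_at(void *p, int line)
{
	struct rcu_access_record *r;

	assert(nr_records < MAX_RECORDS);
	r = &records[nr_records++];
	r->nr_locks = nr_held;
	for (int i = 0; i < nr_held; i++)
		r->lock_lines[i] = held_lines[i];
	r->deref_line = line;
	return p;
}

#define rcu_read_lock_sketch() rcu_read_lock_at(__LINE__)
#define rcu_dereference_sketch(p) rcu_dereference_at((p), __LINE__)
```

A dereference under three nested rcu_read_lock_sketch() calls is
recorded with nr_locks == 3, mirroring the three-lock ACQCHAIN entry in
the snippets later in this mail.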

This patchset is based on v4.5-rc2 and consists of 6 patches (patches
2-5 implement LOCKED_ACCESS):

1. Introduce some functions of irq_context.

2. Introduce locked access class and acqchain.

3. Maintain the keys of acqchains.

4. Introduce the entry point of LOCKED_ACCESS.

5. Add a proc interface for locked access classes.

6. Enable LOCKED_ACCESS for RCU.

Tested by 0day. I also did a simple test on x86: I built and booted a
kernel with CONFIG_RCU_LOCKED_ACCESS=y and CONFIG_PROVE_LOCKING=y and ran
several workloads (kernel building, git cloning, dbench). No problem was
observed, and /proc/locked_access/rcu collected the relationships
between ~300 RCU read-side critical sections and ~500
rcu_dereference*()s.

Snippets of /proc/locked_access/rcu are as follows:

...(this rcu_dereference() happens after one rcu_read_lock())
...
ACQCHAIN 0xfdbf0c6aeea, 1 locks, irq_context 0:
LOCK at [<ffffffff812b1115>] get_proc_task_net+0x5/0x140
ACCESS TYPE 1 at kernel/pid.c:441
...
...(this rcu_dereference() happens after three rcu_read_lock())
...
ACQCHAIN 0xfe042af3bbfb2605, 3 locks, irq_context 0:
LOCK at [<ffffffff81094b47>] SyS_kill+0x97/0x2a0
LOCK at [<ffffffff8109286f>] kill_pid_info+0x1f/0x140
LOCK at [<ffffffff81092605>] group_send_sig_info+0x5/0x130
ACCESS TYPE 1 at kernel/signal.c:695
...

Looking forward to any suggestions, comments and questions ;-)

Regards,
Boqun