Re: [PATCH] TOMOYO: Add garbage collector support. (v3)

From: Paul E. McKenney
Date: Wed Jun 17 2009 - 12:31:37 EST


On Wed, Jun 17, 2009 at 08:19:07PM +0900, Tetsuo Handa wrote:
> Hello.
>
> This patchset adds a garbage collector for TOMOYO.
> This time, I'm using a sort of RCU-like approach instead of the
> cookie-list approach.
>
> TOMOYO 1/3: Move sleeping operations to outside the semaphore.
> 6 files changed, 231 insertions(+), 345 deletions(-)
>
> TOMOYO 2/3: Replace tomoyo_save_name() with tomoyo_get_name()/tomoyo_put_name().
> 5 files changed, 70 insertions(+), 23 deletions(-)
>
> TOMOYO 3/3: Add RCU-like garbage collector.
> 7 files changed, 733 insertions(+), 358 deletions(-)
>
> Paul E. McKenney wrote ( http://lkml.org/lkml/2009/5/27/2 ) :
> > I would also recommend the three-part LWN series as a starting point:
> >
> > # http://lwn.net/Articles/262464/ (What is RCU, Fundamentally?)
> > # http://lwn.net/Articles/263130/ (What is RCU's Usage?)
> > # http://lwn.net/Articles/264090/ (What is RCU's API?)
>
> I've read these articles. They are very good.

Glad that they were helpful!!!

> I came up with an idea that we may be able to implement the GC such that
> readers are permitted to sleep but no read locks are required.

I believe you have a bug in your pseudocode -- please see below.

> The idea is to have two counters which hold the number of readers currently
> reading the list, one active and the other inactive. A reader increments
> the currently active counter before it starts reading and decrements that
> counter after it finishes reading. The GC swaps the active and inactive
> counters and waits for the previously active counter's count to become 0
> before releasing elements removed from the list.
> Code is shown below.
>
> atomic_t users_counter[2];
> atomic_t users_counter_idx;
> DEFINE_MUTEX(updater_mutex);
> DEFINE_MUTEX(gc_mutex);
>
> --- reader ---
> {
>         /* Get counter index. */
>         int idx = atomic_read(&users_counter_idx);
>         /* Lock counter. */
>         atomic_inc(&users_counter[idx]);
>         list_for_each_entry_rcu() {
>                 ... /* Allowed to sleep. */
>         }
>         /* Unlock counter. */
>         atomic_dec(&users_counter[idx]);
> }
>
> --- writer ---
> {
>         bool found = false;
>         /* Get lock for writing. */
>         mutex_lock(&updater_mutex);
>         list_for_each_entry_rcu() {
>                 if (...)
>                         continue;
>                 found = true;
>                 break;
>         }
>         if (!found)
>                 list_add_rcu(element);
>         /* Release lock for writing. */
>         mutex_unlock(&updater_mutex);
> }
>
> --- garbage collector ---
> {
>         bool element_deleted = false;
>         /* Protect the counters from concurrent GC threads. */
>         mutex_lock(&gc_mutex);
>         /* Get lock for writing. */
>         mutex_lock(&updater_mutex);
>         list_for_each_entry_rcu() {
>                 if (...)
>                         continue;
>                 list_del_rcu(element);
>                 element_deleted = true;
>                 break;
>         }
>         /* Release lock for writing. */
>         mutex_unlock(&updater_mutex);
>         if (element_deleted) {
>                 /* Swap the active counter. */
>                 const int idx = atomic_read(&users_counter_idx);
>                 atomic_set(&users_counter_idx, idx ^ 1);
>                 /*
>                  * Wait for readers who are using the previously active
>                  * counter. This is similar to synchronize_rcu(), except
>                  * that it allows readers to do operations which may sleep.
>                  */
>                 while (atomic_read(&users_counter[idx]))
>                         msleep(1000);
>                 /*
>                  * Nobody is using the previously active counter.
>                  * Ready to release memory of elements removed before
>                  * the previously active counter became inactive.
>                  */
>                 kfree(element);
>         }
>         mutex_unlock(&gc_mutex);
> }

Consider the following sequence of events:

o CPU 0 picks up users_counter_idx into local variable idx.
Let's assume that the value is zero.

o CPU 0 is now preempted, interrupted, or otherwise delayed.

o CPU 1 starts garbage collection, finding some elements to
delete, thus setting "element_deleted" to true.

o CPU 1 continues garbage collection, inverting the value of
users_counter_idx, so that the value is now one, waiting
for the value-zero readers, and freeing up the old elements.

o CPU 0 continues execution, first atomically incrementing
users_counter[0], then traversing the list, possibly sleeping.

o CPU 2 starts a new round of garbage collection, again finding
some elements to delete, and thus again setting
"element_deleted" to true. One of the elements deleted
is the one that CPU 0 is currently referencing while asleep.

o CPU 2 continues garbage collection, inverting the value of
users_counter_idx, so that the value is now zero, waiting
for the value-one readers, and freeing up the old elements.
Note that CPU 0 is a value-zero reader, so that CPU 2 will
not wait on it.

CPU 2 therefore kfree()s the element that CPU 0 is currently
referencing.

o CPU 0 wakes up, and suffers possibly fatal disappointment upon
attempting to reference an element that has been freed -- and,
worse yet, possibly re-allocated as some other type of
structure.
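
In other words, the failure window is the gap between the fetch of
users_counter_idx and the atomic_inc() in your reader (same code as
above, annotation mine):

        /* Get counter index. */
        int idx = atomic_read(&users_counter_idx);   /* reads 0 */

        /*
         * Any number of complete GC cycles can execute in this window,
         * flipping the index to one and back to zero, so that counter
         * zero is once again the *currently* active counter by the time
         * the increment below executes.  The GC then has no way to tell
         * this "old" reader from a "new" one.
         */

        /* Lock counter. */
        atomic_inc(&users_counter[idx]);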

Or am I missing something in your pseudocode?

Also, if you have lots of concurrent readers, you can suffer high memory
contention on the users_counter[] array, correct?

I recommend that you look into use of SRCU in this case. There is some
documentation at http://lwn.net/Articles/202847/, with a revised
version incorporating feedback from the LWN comments available at:

http://www.rdrop.com/users/paulmck/RCU/srcu.2007.01.14a.pdf

Well, all but one of the LWN comments -- someone posted one a couple of
months ago that I just now noticed.

Anyway, the general approach would be to make changes to your code
roughly as follows:

1. Replace your users_counter and users_counter_idx with a
struct srcu_struct.

2. In the reader, replace the fetch from users_counter_idx and
the atomic_inc() with srcu_read_lock(), and replace the
atomic_dec() with srcu_read_unlock().

3. In the garbage collector, replace the fetch/update of
users_counter_idx and the "while" loop with synchronize_srcu().
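
Just to give a rough idea (an untested sketch, using tomoyo_ss as a
placeholder name for the srcu_struct), the result might look like:

        struct srcu_struct tomoyo_ss;  /* init_srcu_struct(&tomoyo_ss) at init. */

        --- reader ---
        {
                int idx = srcu_read_lock(&tomoyo_ss);
                list_for_each_entry_rcu() {
                        ... /* Still allowed to sleep. */
                }
                srcu_read_unlock(&tomoyo_ss, idx);
        }

        --- garbage collector ---
        {
                mutex_lock(&updater_mutex);
                list_del_rcu(element);
                mutex_unlock(&updater_mutex);
                /*
                 * Wait for all pre-existing SRCU readers to complete,
                 * sleeping readers included.  Unlike the two-counter
                 * scheme, srcu_read_lock() fetches the index and
                 * increments the counter in a way that synchronize_srcu()
                 * is guaranteed to notice, which closes the window called
                 * out above.
                 */
                synchronize_srcu(&tomoyo_ss);
                kfree(element);
        }

As a bonus, SRCU keeps its counters in per-CPU storage, which should also
take care of the memory-contention problem.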

> In this idea, the GC's kfree() call may be deferred for an unknown
> duration, but the deferral will not matter if we use a dedicated kernel
> thread for the GC.
>
> I noticed that there is QRCU in the "RCU has a Family of Wait-to-Finish
> APIs" section. My idea seems to resemble QRCU, except for the grace
> periods. But the "Availability" field is empty. Oh, what happened to QRCU?

Last I knew, it was queued up in Jens Axboe's tree, awaiting a first
user. But the approach you have above looks to me like it will do fine
with SRCU.

Or is there some reason why SRCU does not work for you?

Thanx, Paul