Re: [PATCH] TOMOYO: Add garbage collector support. (v3)

From: Paul E. McKenney
Date: Thu Jun 18 2009 - 11:29:08 EST


On Thu, Jun 18, 2009 at 02:34:42PM +0900, Tetsuo Handa wrote:
> Hello.
>
> Paul E. McKenney wrote:
> > Consider the following sequence of events:
> >
> > o CPU 0 picks up users_counter_idx into local variable idx.
> > Let's assume that the value is zero.
> >
> > o CPU 0 is now preempted, interrupted, or otherwise delayed.
> >
> > o CPU 1 starts garbage collection, finding some elements to
> > delete, thus setting "element_deleted" to true.
> >
> > o CPU 1 continues garbage collection, inverting the value of
> > users_counter_idx, so that the value is now one, waiting
> > for the value-zero readers, and freeing up the old elements.

[1]

> > o CPU 0 continues execution, first atomically incrementing
> > users_counter[0], then traversing the list, possibly sleeping.
> >
> > o CPU 2 starts a new round of garbage collection, again finding
> > some elements to delete, and thus again setting
> > "elements_deleted" to true. One of the elements deleted
> > is the one that CPU 0 is currently referencing while asleep.
> >
> No. CPU 2 can't start a new round of GC, because the GC function is
> executed exclusively under the gc_mutex mutex.

But CPU 1 would have released gc_mutex back at time [1], right?

> > o CPU 2 continues garbage collection, inverting the value of
> > users_counter_idx, so that the value is now zero, waiting
> > for the value-one readers, and freeing up the old elements.
> > Note that CPU 0 is a value-zero reader, so that CPU 2 will
> > not wait on it.
> >
> > CPU 2 therefore kfree()s the element that CPU 0 is currently
> > referencing.
> >
> CPU 2 won't continue GC, for CPU 2 can't start a new round of GC.

I still don't see why CPU 1 would not have released gc_mutex back
at point [1].

> > o CPU 0 wakes up, and suffers possibly fatal disappointment upon
> > attempting to reference an element that has been freed -- and,
> > worse yet, possibly re-allocated as some other type of
> > structure.
> >
> CPU 0 won't suffer, for the first round of GC (by CPU 1) prevents CPU 2
> from starting a new round of GC.

Why would CPU 1 be unable to complete its round of GC, thus releasing
gc_mutex, thus allowing CPU 2 to start a new one? For that matter,
CPU 1 could start a new round, correct?

> > Or am I missing something in your pseudocode?
> I think you missed that the GC function is executed exclusively.
>
> The race between readers and GC is avoided as below.

If you can tell me why CPU 1 cannot release gc_mutex, I will look at
the following. Until then, I will stand by my scenario above. ;-)

> (a-1) A reader reads users_counter_idx and saves to r_idx
> (a-2) GC removes element from the list using RCU
> (a-3) GC reads users_counter_idx and saves to g_idx
> (a-4) GC inverts users_counter_idx
> (a-5) GC releases the removed element
> (a-6) A reader increments users_counter[r_idx]
> (a-7) A reader won't see the element removed by GC because
> the reader has not started list traversal as of (a-2)
>
> (b-1) A reader reads users_counter_idx and saves to r_idx
> (b-2) A reader increments users_counter[r_idx]
> (b-3) GC removes element from the list using RCU
> (b-4) A reader won't see the element removed by GC
> (b-5) GC reads users_counter_idx and saves to g_idx
> (b-6) GC inverts users_counter_idx
> (b-7) GC waits for users_counter[g_idx] to become 0
> (b-8) A reader decrements users_counter[r_idx]
> (b-9) GC releases the removed element
>
> (c-1) A reader reads users_counter_idx and saves to r_idx
> (c-2) A reader increments users_counter[r_idx]
> (c-3) A reader sees the element
> (c-4) GC removes element from the list using RCU
> (c-5) GC reads users_counter_idx and saves to g_idx
> (c-6) GC inverts users_counter_idx
> (c-7) GC waits for users_counter[g_idx] to become 0
> (c-8) A reader decrements users_counter[r_idx]
> (c-9) GC releases the removed element
>
> What I worry about is that some memory barriers might be needed between
>
> > > {
> > > /* Get counter index. */
> > > int idx = atomic_read(&users_counter_idx);
> > > /* Lock counter. */
> > > atomic_inc(&users_counter[idx]);
> - here -
> > > list_for_each_entry_rcu() {
> > > ... /* Allowed to sleep. */
> > > }
> - here -
> > > /* Unlock counter. */
> > > atomic_dec(&users_counter[idx]);
> > > }
>
> and
>
> > > if (element_deleted) {
> > > /* Swap active counter. */
> > > const int idx = atomic_read(&users_counter_idx);
> - here -
> > > atomic_set(&users_counter_idx, idx ^ 1);
> - here -
> > > /*
> > > * Wait for readers who are using previously active counter.
> > > * This is similar to synchronize_rcu() while this code allows
> > > * readers to do operations which may sleep.
> > > */
> > > while (atomic_read(&users_counter[idx]))
> > > msleep(1000);
> > > /*
> > > * Nobody is using previously active counter.
> > > * Ready to release memory of elements removed before
> > > * previously active counter became inactive.
> > > */
> > > kfree(element);
> > > }
>
> in order to enforce ordering.

Quite possibly. One of the advantages of using things like SRCU is that
the necessary memory barriers are already in place. (knock wood)
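
For the hand-rolled version, my guess is that the barriers would need to
go roughly as follows. This is just an illustration of where the ordering
matters, mirroring your pseudocode above -- I have not audited it, and I
would still push for SRCU instead:

	/* Reader (sketch only). */
	int idx = atomic_read(&users_counter_idx);

	atomic_inc(&users_counter[idx]);
	smp_mb__after_atomic_inc();	/* Increment before list accesses. */
	list_for_each_entry_rcu() {
		... /* Allowed to sleep. */
	}
	smp_mb__before_atomic_dec();	/* List accesses before decrement. */
	atomic_dec(&users_counter[idx]);

	/* Garbage collector (sketch only). */
	if (element_deleted) {
		const int idx = atomic_read(&users_counter_idx);

		atomic_set(&users_counter_idx, idx ^ 1);
		smp_mb();	/* Flip index before sampling old counter. */
		while (atomic_read(&users_counter[idx]))
			msleep(1000);
		smp_mb();	/* See counter hit zero before kfree(). */
		kfree(element);
	}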

> > Also, if you have lots of concurrent readers, you can suffer high memory
> > contention on the users_counter[] array, correct?
>
> Excuse me. I couldn't understand "memory contention"...
>
> ( http://www.answers.com/topic/memory-contention )
> | A situation in which two different programs, or two parts of a program,
> | try to read items in the same block of memory at the same time.
> Why would readers suffer from atomic_read() at the same time?
> Cache invalidation by atomic_inc()/atomic_dec() of a shared variable?

Yes, cache invalidation by atomic_inc()/atomic_dec() of a shared
variable can definitely result in memory contention and extremely
bad performance.
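
This, by the way, is one reason why SRCU keeps its counters per-CPU
rather than in a single shared array -- each reader bumps a counter in
its own CPU's cache line. Greatly simplified, and with the memory
barriers omitted, the SRCU read side looks something like this:

	struct srcu_struct_array {
		int c[2];		/* Pair of counters, one struct per CPU. */
	};

	struct srcu_struct {
		int completed;		/* Low bit selects the active counter. */
		struct srcu_struct_array *per_cpu_ref;
		struct mutex mutex;
	};

	int srcu_read_lock(struct srcu_struct *sp)
	{
		int idx;

		preempt_disable();
		idx = sp->completed & 0x1;
		/* Only this CPU's cache line is dirtied. */
		per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
		preempt_enable();
		return idx;
	}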

> ( http://wiki.answers.com/Q/What_is_memory_contention )
> | Memory contention is a state a OS memory manager can reside in when to many
> | memory requests (alloc, realloc, free) are issued to it from an active
> | application possibly leading to a DOS condition specific to that
> | application.
> No memory allocation for users_counter[] array.

This is not the type of memory contention I was thinking of.

> > I recommend that you look into use of SRCU in this case.
>
> I have one worry regarding SRCU.
> Inside synchronize_srcu(), there is a loop
>
> while (srcu_readers_active_idx(sp, idx))
> schedule_timeout_interruptible(1);
>
> but the reader's sleeping duration varies from less than one second to
> more than hours.
>
> Checking the counters every jiffy sounds like too much wasted CPU.
> Delaying kfree() for seconds or minutes won't cause trouble for TOMOYO.
> It would be nice if the checking interval were configurable, like
> "schedule_timeout_interruptible(sp->timeout);".

This would not be a difficult change, and certainly would make sense in
your case. I would be happy to review a patch from you, or to create a
patch to SRCU if you would prefer.

I would add a field as you say, and add an API element:

void set_srcu_timeout(struct srcu_struct *sp, unsigned long timeout)

The timeout would default to 1, and a value of 0 would be interpreted
as 1. In your case, perhaps you would do the following after initializing
the srcu_struct:

set_srcu_timeout(&tomoyo_srcu, HZ);

Would this work?
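
The change itself might look something like the following (completely
untested, and the field name is arbitrary):

	/* New field added to struct srcu_struct: */
	struct srcu_struct {
		int completed;
		struct srcu_struct_array *per_cpu_ref;
		struct mutex mutex;
		unsigned long timeout;	/* In jiffies; 0 is treated as 1. */
	};

	void set_srcu_timeout(struct srcu_struct *sp, unsigned long timeout)
	{
		sp->timeout = timeout;
	}

	/* And synchronize_srcu()'s wait loop becomes: */
	while (srcu_readers_active_idx(sp, idx))
		schedule_timeout_interruptible(sp->timeout ? sp->timeout : 1);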

> > Anyway, the general approach would be to make changes to your code
> > roughly as follows:
> >
> > 1. replace your users_counter and users_counter_idx with a
> > struct srcu_struct.
> >
> > 2. In the reader, replace the fetch from users_counter_idx and
> > the atomic_inc() with srcu_read_lock().
> >
> > 3. In the garbage collector, replace the fetch/update of
> > users_counter_idx and the "while" loop with synchronize_srcu().
> >
> I see. Since I isolated the GC as a dedicated kernel thread, writers no longer
> wait for elements to be kfree()ed. I can use SRCU.

Very good!
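
In other words, the end result would look something like the following,
with placeholder names standing in for whatever TOMOYO actually uses:

	/* One srcu_struct for the policy; init_srcu_struct() it at init time. */
	static struct srcu_struct tomoyo_ss;

	/* Reader: replaces the users_counter[]/users_counter_idx dance. */
	int idx = srcu_read_lock(&tomoyo_ss);
	list_for_each_entry_rcu() {
		... /* Allowed to sleep. */
	}
	srcu_read_unlock(&tomoyo_ss, idx);

	/* GC thread: replaces the index flip and the polling loop. */
	if (element_deleted) {
		synchronize_srcu(&tomoyo_ss);	/* Wait for pre-existing readers. */
		kfree(element);
	}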

> > Or is there some reason why SRCU does not work for you?
> None for mainline version.
>
> I'm also maintaining TOMOYO for older/distributor kernels for those who want to
> enable both SELinux/SMACK/AppArmor/grsecurity etc. and TOMOYO at the same time.
> Thus, if my idea works, I want to backport it to TOMOYO for these kernels.

SRCU has been in the kernel since 2.6.18, but would be easy to
backport. If you have a recent git tree, run "gitk kernel/srcu.c
include/linux/srcu.h". You will find four commits that are pretty
well isolated -- you would need the changes to kernel/srcu.c,
include/linux/srcu.h, and kernel/Makefile.

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/