Re: [BUG] Something goes wrong with timer statistics.

From: Eric Dumazet
Date: Thu May 31 2007 - 11:10:40 EST


On Thu, 31 May 2007 16:25:22 +0200
Björn Steinbrink <B.Steinbrink@xxxxxx> wrote:

> On 2007.05.31 12:20:47 +0200, iank@xxxxxxxxxxxx wrote:
> > On Wed, May 30, 2007 at 03:14:58PM +0200, Björn Steinbrink wrote:
> > > Initialize the "next" field of a timer stats entry before it is inserted
> > > into the list to avoid a race with the fastpath lookup.
> >
> > Sorry to say... but this does not fix my problem, however, i can't reach
> > that machine at all atm, but i will post the oops later today.
>
> Ok, three times is a charm, right? ;-) The previous patch did fix a
> race, but there's still one left. The hash table is never cleared, so it
> can still point into the entries array.
>
> As long as we do not race with the insertion, that's not a directly
> visible problem as all fields are 0, so we get no match and the lookup
> finishes right away. We probably _can_ get weird results though, as the
> all-zero entry isn't marked as used, but is used as "head" and "prev" in
> this case and even multiple hash entries might point to it in some
> cases. That would eventually lead to lost entries because the "next"
> field gets overwritten when alloc_entry() finally claims the entry.
>
> But when we race, we can still end up between "*curr = *entry" (makes
> "next" contain garbage) and "curr->next = NULL", while we're doing the
> fastpath lookup, because the insertion basically already took place!
> Replacing "curr->next = NULL" with "entry->next = NULL" and moving it up
> is pointless, as "*curr = *entry" isn't atomic, so instead, we have to
> simply clean the hash table when the entries are cleared.
>
> Björn
>
>
> Fix two races in the timer stats lookup code. One by ensuring that the
> initialization of a new entry is finished upon insertion of that entry.
> The other by cleaning up the hash table when the entries array is
> cleared, so that we don't have "pre-inserted" entries.
>
> Thanks to Eric Dumazet for reminding me of the memory barrier.
>
> Signed-off-by: Björn Steinbrink <B.Steinbrink@xxxxxx>
> ---
> diff --git a/kernel/time/timer_stats.c b/kernel/time/timer_stats.c
> index 868f1bc..611b844 100644
> --- a/kernel/time/timer_stats.c
> +++ b/kernel/time/timer_stats.c
> @@ -117,21 +117,6 @@ static struct entry entries[MAX_ENTRIES];
>
> static atomic_t overflow_count;
>
> -static void reset_entries(void)
> -{
> - nr_entries = 0;
> - memset(entries, 0, sizeof(entries));
> - atomic_set(&overflow_count, 0);
> -}
> -
> -static struct entry *alloc_entry(void)
> -{
> - if (nr_entries >= MAX_ENTRIES)
> - return NULL;
> -
> - return entries + nr_entries++;
> -}
> -
> /*
> * The entries are in a hash-table, for fast lookup:
> */
> @@ -149,6 +134,22 @@ static struct entry *alloc_entry(void)
>
> static struct entry *tstat_hash_table[TSTAT_HASH_SIZE] __read_mostly;
>
> +static void reset_entries(void)
> +{
> + nr_entries = 0;
> + memset(entries, 0, sizeof(entries));
> + memset(tstat_hash_table, 0, sizeof(tstat_hash_table));
> + atomic_set(&overflow_count, 0);
> +}
> +
> +static struct entry *alloc_entry(void)
> +{
> + if (nr_entries >= MAX_ENTRIES)
> + return NULL;
> +
> + return entries + nr_entries++;
> +}
> +
> static int match_entries(struct entry *entry1, struct entry *entry2)
> {
> return entry1->timer == entry2->timer &&
> @@ -202,12 +203,15 @@ static struct entry *tstat_lookup(struct entry *entry, char *comm)
> if (curr) {
> *curr = *entry;
> curr->count = 0;
> + curr->next = NULL;
> memcpy(curr->comm, comm, TASK_COMM_LEN);
> +
> + smp_mb(); /* Ensure that curr is initialized before insert */
> +
> if (prev)
> prev->next = curr;
> else
> *head = curr;
> - curr->next = NULL;
> }
> out_unlock:
> spin_unlock(&table_lock);
> -

Well... :) , there is still a memory barrier missing it seems.

Another cpu might see a bad value if 'active=1' is set before tstat_hash_table is really cleared.


diff -urp linux/kernel/time/timer_stats.c.orig linux/kernel/time/timer_stats.c
--- kernel/time/timer_stats.c.orig
+++ kernel/time/timer_stats.c
@@ -361,6 +361,7 @@ static ssize_t tstats_write(struct file
if (!active) {
reset_entries();
time_start = ktime_get();
+ smp_mb();
active = 1;
}
break;


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/