Re: [PATCH v11 08/28] tracing: Add lock-free tracing_map

From: Tom Zanussi
Date: Thu Oct 29 2015 - 14:36:29 EST


Hi Namhyung,

On Thu, 2015-10-29 at 17:31 +0900, Namhyung Kim wrote:
> Hi Tom,
>
> On Thu, Oct 22, 2015 at 01:14:12PM -0500, Tom Zanussi wrote:
> > Add tracing_map, a special-purpose lock-free map for tracing.
> >
> > tracing_map is designed to aggregate or 'sum' one or more values
> > associated with a specific object of type tracing_map_elt, which
> > is associated by the map to a given key.
> >
> > It provides various hooks allowing per-tracer customization and is
> > separated out into a separate file in order to allow it to be shared
> > between multiple tracers, but isn't meant to be generally used outside
> > of that context.
> >
> > The tracing_map implementation was inspired by lock-free map
> > algorithms originated by Dr. Cliff Click:
> >
> > http://www.azulsystems.com/blog/cliff/2007-03-26-non-blocking-hashtable
> > http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
> >
> > Signed-off-by: Tom Zanussi <tom.zanussi@xxxxxxxxxxxxxxx>
> > Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@xxxxxxxxxxx>
> > ---
>
> [SNIP]
> > +void tracing_map_array_free(struct tracing_map_array *a)
> > +{
> > + unsigned int i;
> > +
> > + if (!a->pages)
> > + return;
>
> You need to free 'a' anyway.. Or did you want to check 'a' being NULL?
>

Both, I think. Thanks for pointing this out, will fix in the next
version.

>
> > +
> > + for (i = 0; i < a->n_pages; i++)
> > + free_page((unsigned long)a->pages[i]);
> > +
> > + kfree(a);
> > +}
> > +
> > +struct tracing_map_array *tracing_map_array_alloc(unsigned int n_elts,
> > + unsigned int entry_size)
> > +{
> > + struct tracing_map_array *a;
> > + unsigned int i;
> > +
> > + a = kzalloc(sizeof(*a), GFP_KERNEL);
> > + if (!a)
> > + return NULL;
> > +
> > + a->entry_size_shift = fls(roundup_pow_of_two(entry_size) - 1);
> > + a->entries_per_page = PAGE_SIZE / (1 << a->entry_size_shift);
> > + a->n_pages = n_elts / a->entries_per_page;
> > + if (!a->n_pages)
> > + a->n_pages = 1;
> > + a->entry_shift = fls(a->entries_per_page) - 1;
> > + a->entry_mask = (1 << a->entry_shift) - 1;
> > +
> > + a->pages = kcalloc(a->n_pages, sizeof(void *), GFP_KERNEL);
> > + if (!a->pages)
> > + goto free;
> > +
> > + for (i = 0; i < a->n_pages; i++) {
> > + a->pages[i] = (void *)get_zeroed_page(GFP_KERNEL);
> > + if (!a->pages)
>
> if (!a->pages[i])
>

And this obvious error too, thanks.

>
> > + goto free;
> > + }
> > + out:
> > + return a;
> > + free:
> > + tracing_map_array_free(a);
> > + a = NULL;
> > +
> > + goto out;
> > +}
> > +
>
> [SNIP]
> > +/**
> > + * tracing_map_insert - Insert key and/or retrieve val from a tracing_map
> > + * @map: The tracing_map to insert into
> > + * @key: The key to insert
> > + *
> > + * Inserts a key into a tracing_map and creates and returns a new
> > + * tracing_map_elt for it, or if the key has already been inserted by
> > + * a previous call, returns the tracing_map_elt already associated
> > + * with it. When the map was created, the number of elements to be
> > + * allocated for the map was specified (internally maintained as
> > + * 'max_elts' in struct tracing_map), and that number of
> > + * tracing_map_elts was created by tracing_map_init(). This is the
> > + * pre-allocated pool of tracing_map_elts that tracing_map_insert()
> > + * will allocate from when adding new keys. Once that pool is
> > + * exhausted, tracing_map_insert() is useless and will return NULL to
> > + * signal that state.
> > + *
> > + * This is a lock-free tracing map insertion function implementing a
> > + * modified form of Cliff Click's basic insertion algorithm. It
> > + * requires the table size be a power of two. To prevent any
> > + * possibility of an infinite loop we always make the internal table
> > + * size double the size of the requested table size (max_elts * 2).
> > + * Likewise, we never reuse a slot or resize or delete elements - when
> > + * we've reached max_elts entries, we simply return NULL once we've
> > + * run out of entries. Readers can at any point in time traverse the
> > + * tracing map and safely access the key/val pairs.
> > + *
> > + * Return: the tracing_map_elt pointer val associated with the key.
> > + * If this was a newly inserted key, the val will be a newly allocated
> > + * and associated tracing_map_elt pointer val. If the key wasn't
> > + * found and the pool of tracing_map_elts has been exhausted, NULL is
> > + * returned and no further insertions will succeed.
> > + */
> > +struct tracing_map_elt *tracing_map_insert(struct tracing_map *map, void *key)
> > +{
> > + u32 idx, key_hash, test_key;
> > + struct tracing_map_entry *entry;
> > +
> > + key_hash = jhash(key, map->key_size, 0);
> > + if (key_hash == 0)
> > + key_hash = 1;
> > + idx = key_hash >> (32 - (map->map_bits + 1));
> > +
> > + while (1) {
> > + idx &= (map->map_size - 1);
> > + entry = TRACING_MAP_ENTRY(map->map, idx);
> > + test_key = entry->key;
> > +
> > + if (test_key && test_key == key_hash && entry->val &&
> > + keys_match(key, entry->val->key, map->key_size))
> > + return entry->val;
> > +
> > + if (!test_key && !cmpxchg(&entry->key, 0, key_hash)) {
> > + struct tracing_map_elt *elt;
> > +
> > + elt = get_free_elt(map);
> > + if (!elt)
> > + break;
> > + memcpy(elt->key, key, map->key_size);
> > + entry->val = elt;
> > +
> > + return entry->val;
> > + }
> > + idx++;
> > + }
> > +
> > + return NULL;
> > +}
>
> IIUC this always insert new entry if no matching key found. And if
> the map is full, it only fails after walking through the entries to
> find an empty one, mark the entry with the key and call to
> get_free_elt() returns NULL. As more key added, it worsenes the
> problem since more entries will be marked with no value IMHO.
>
> I can see you checked hist_data->drops in the next patch to work
> around this problem. But IMHO it's suboptimal since it cannot update
> the existing entries too. I think it'd be better having lookup-only
> version of this function and use it after it sees drops. The lookup
> function can bail out from the loop if the insert doesn't mark empty
> entry anymore IMHO.
>
> Thoughts?
>

The assumption has always been that once you have drops (i.e.
tracing_map_insert() returns NULL), the data can't really be trusted any
more and tracing should just stop (and presumably be restarted with a
bigger table). It doesn't mean that the data is completely useless,
just that it no longer can be assumed to have captured all the events
over the tracing run. Having a lookup-only version for the purpose of
updating only existing entries sort of illustrates the problem even
better - in that case only the events that already have entries in the
table will be included while the events that don't yet have entries will
be ignored, skewing the value of the data.

On the other hand, if users do end up calling this even after it's
returned NULL, we should make sure it doesn't result in an infinite
loop, and cap the number of iterations through the loop. tracing_map
wasn't really meant to be generally reusable - it was separated out so
it could be shared between two tracers - but it wouldn't hurt to add
that check just in case...

Thanks,

Tom


> Thanks,
> Namhyung


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/