Re: [PATCH v11 08/28] tracing: Add lock-free tracing_map
From: Namhyung Kim
Date: Mon Nov 02 2015 - 02:08:14 EST

Hi Tom,

On Thu, Oct 29, 2015 at 01:35:43PM -0500, Tom Zanussi wrote:
> Hi Namhyung,
>
> On Thu, 2015-10-29 at 17:31 +0900, Namhyung Kim wrote:
> > Hi Tom,
> >
> > On Thu, Oct 22, 2015 at 01:14:12PM -0500, Tom Zanussi wrote:
> > > Add tracing_map, a special-purpose lock-free map for tracing.
> > >
> > > tracing_map is designed to aggregate or 'sum' one or more values
> > > associated with a specific object of type tracing_map_elt, which
> > > is associated by the map to a given key.
> > >
> > > It provides various hooks allowing per-tracer customization and is
> > > separated out into a separate file in order to allow it to be shared
> > > between multiple tracers, but isn't meant to be generally used outside
> > > of that context.
> > >
> > > The tracing_map implementation was inspired by lock-free map
> > > algorithms originated by Dr. Cliff Click:
> > >
> > > http://www.azulsystems.com/blog/cliff/2007-03-26-non-blocking-hashtable
> > > http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
> > >
> > > Signed-off-by: Tom Zanussi <tom.zanussi@xxxxxxxxxxxxxxx>
> > > Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@xxxxxxxxxxx>
> > > ---
> > > +/**
> > > + * tracing_map_insert - Insert key and/or retrieve val from a tracing_map
> > > + * @map: The tracing_map to insert into
> > > + * @key: The key to insert
> > > + *
> > > + * Inserts a key into a tracing_map and creates and returns a new
> > > + * tracing_map_elt for it, or if the key has already been inserted by
> > > + * a previous call, returns the tracing_map_elt already associated
> > > + * with it. When the map was created, the number of elements to be
> > > + * allocated for the map was specified (internally maintained as
> > > + * 'max_elts' in struct tracing_map), and that number of
> > > + * tracing_map_elts was created by tracing_map_init(). This is the
> > > + * pre-allocated pool of tracing_map_elts that tracing_map_insert()
> > > + * will allocate from when adding new keys. Once that pool is
> > > + * exhausted, tracing_map_insert() is useless and will return NULL to
> > > + * signal that state.
> > > + *
> > > + * This is a lock-free tracing map insertion function implementing a
> > > + * modified form of Cliff Click's basic insertion algorithm. It
> > > + * requires the table size be a power of two. To prevent any
> > > + * possibility of an infinite loop we always make the internal table
> > > + * size double the size of the requested table size (max_elts * 2).
> > > + * Likewise, we never reuse a slot or resize or delete elements - when
> > > + * we've reached max_elts entries, we simply return NULL once we've
> > > + * run out of entries. Readers can at any point in time traverse the
> > > + * tracing map and safely access the key/val pairs.
> > > + *
> > > + * Return: the tracing_map_elt pointer val associated with the key.
> > > + * If this was a newly inserted key, the val will be a newly allocated
> > > + * and associated tracing_map_elt pointer val. If the key wasn't
> > > + * found and the pool of tracing_map_elts has been exhausted, NULL is
> > > + * returned and no further insertions will succeed.
> > > + */
> > > +struct tracing_map_elt *tracing_map_insert(struct tracing_map *map, void *key)
> > > +{
> > > +	u32 idx, key_hash, test_key;
> > > +	struct tracing_map_entry *entry;
> > > +
> > > +	key_hash = jhash(key, map->key_size, 0);
> > > +	if (key_hash == 0)
> > > +		key_hash = 1;
> > > +	idx = key_hash >> (32 - (map->map_bits + 1));
> > > +
> > > +	while (1) {
> > > +		idx &= (map->map_size - 1);
> > > +		entry = TRACING_MAP_ENTRY(map->map, idx);
> > > +		test_key = entry->key;
> > > +
> > > +		if (test_key && test_key == key_hash && entry->val &&
> > > +		    keys_match(key, entry->val->key, map->key_size))
> > > +			return entry->val;
> > > +
> > > +		if (!test_key && !cmpxchg(&entry->key, 0, key_hash)) {
> > > +			struct tracing_map_elt *elt;
> > > +
> > > +			elt = get_free_elt(map);
> > > +			if (!elt)
> > > +				break;
> > > +			memcpy(elt->key, key, map->key_size);
> > > +			entry->val = elt;
> > > +
> > > +			return entry->val;
> > > +		}
> > > +		idx++;
> > > +	}
> > > +
> > > +	return NULL;
> > > +}
> >
> > IIUC this always inserts a new entry if no matching key is found. And
> > if the map is full, it only fails after walking through the entries to
> > find an empty one, marking that entry with the key, and then having
> > the call to get_free_elt() return NULL. As more keys are added, the
> > problem gets worse, since more entries will be marked with no value IMHO.
> >
> > I can see you checked hist_data->drops in the next patch to work
> > around this problem. But IMHO it's suboptimal since it cannot update
> > the existing entries either. I think it'd be better to have a
> > lookup-only version of this function and use it once drops are seen.
> > The lookup function can bail out of the loop once the insert no longer
> > marks empty entries IMHO.
> >
> > Thoughts?
> >
>
> The assumption has always been that once you have drops (i.e.
> tracing_map_insert() returns NULL), the data can't really be trusted any
> more and tracing should just stop (and presumably be restarted with a
> bigger table). It doesn't mean that the data is completely useless,
> just that it no longer can be assumed to have captured all the events
> over the tracing run. Having a lookup-only version for the purpose of
> updating only existing entries sort of illustrates the problem even
> better - in that case only the events that already have entries in the
> table will be included while the events that don't yet have entries will
> be ignored, skewing the value of the data.

I thought it'd be better if users could tell whether the drops really
matter or not. IOW if the drop count is much smaller than the normal
event count, [s]he might want to ignore the occasional drops. Otherwise,
[s]he should restart with a bigger table. This requires accurate
counts of events and drops, though.
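
To make the lookup-only idea concrete, here is a minimal userspace sketch,
for illustration only. It is single-threaded (no cmpxchg), uses plain u32
keys with a multiplicative hash instead of jhash, and every demo_* name
and size is hypothetical rather than taken from the patch. The insert
path mimics the quoted code in that it marks the slot before checking the
pool; once a drop has been seen, updates go through the lookup-only path,
so existing entries keep accumulating and no further slots get marked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All demo_* names and sizes are hypothetical, not from the patch. */
#define DEMO_MAP_BITS	3
#define DEMO_MAX_ELTS	(1u << DEMO_MAP_BITS)
#define DEMO_MAP_SIZE	(DEMO_MAX_ELTS * 2)	/* table is twice max_elts */

struct demo_elt { uint32_t key; uint64_t sum; };
struct demo_entry { uint32_t key_hash; struct demo_elt *val; };

struct demo_map {
	struct demo_entry table[DEMO_MAP_SIZE];
	struct demo_elt pool[DEMO_MAX_ELTS];
	unsigned int next_elt;
	uint64_t hits, drops;
};

/* Stand-in for jhash(); 0 is reserved to mean "empty slot". */
static uint32_t demo_hash(uint32_t key)
{
	uint32_t h = key * 2654435761u;

	return h ? h : 1;
}

/* Lookup-only: never marks a slot, stops at the first empty one. */
static struct demo_elt *demo_lookup(struct demo_map *map, uint32_t key)
{
	uint32_t hash = demo_hash(key);
	uint32_t idx = hash >> (32 - (DEMO_MAP_BITS + 1));
	unsigned int i;

	for (i = 0; i < DEMO_MAP_SIZE; i++, idx++) {
		struct demo_entry *e = &map->table[idx & (DEMO_MAP_SIZE - 1)];

		if (!e->key_hash)
			return NULL;	/* end of probe chain: key absent */
		if (e->key_hash == hash && e->val && e->val->key == key)
			return e->val;
	}
	return NULL;
}

/* Insert-or-lookup: like the quoted code, marks the slot first. */
static struct demo_elt *demo_insert(struct demo_map *map, uint32_t key)
{
	uint32_t hash = demo_hash(key);
	uint32_t idx = hash >> (32 - (DEMO_MAP_BITS + 1));
	unsigned int i;

	assert(map->next_elt <= DEMO_MAX_ELTS);
	for (i = 0; i < DEMO_MAP_SIZE; i++, idx++) {
		struct demo_entry *e = &map->table[idx & (DEMO_MAP_SIZE - 1)];

		if (e->key_hash == hash && e->val && e->val->key == key)
			return e->val;
		if (!e->key_hash) {
			e->key_hash = hash;	/* slot marked here ... */
			if (map->next_elt == DEMO_MAX_ELTS)
				return NULL;	/* ... and stays marked, no val */
			e->val = &map->pool[map->next_elt++];
			e->val->key = key;
			e->val->sum = 0;
			return e->val;
		}
	}
	return NULL;
}

/* Account one event; switch to lookup-only after the first drop. */
static void demo_record(struct demo_map *map, uint32_t key, uint64_t val)
{
	struct demo_elt *elt;

	elt = map->drops ? demo_lookup(map, key) : demo_insert(map, key);
	if (!elt) {
		map->drops++;
		return;
	}
	elt->sum += val;
	map->hits++;
}

/* Fraction of events that were dropped, 0.0 if nothing was recorded. */
static double demo_drop_ratio(const struct demo_map *map)
{
	uint64_t total = map->hits + map->drops;

	return total ? (double)map->drops / (double)total : 0.0;
}

/* Number of table slots marked with a key hash, with or without a val. */
static unsigned int demo_marked_slots(const struct demo_map *map)
{
	unsigned int i, n = 0;

	for (i = 0; i < DEMO_MAP_SIZE; i++)
		n += map->table[i].key_hash != 0;
	return n;
}
```

With a zero-initialized struct demo_map, calling demo_record() for more
distinct keys than DEMO_MAX_ELTS leaves exactly one marked slot without a
value, and hits/drops stay accurate enough to compute the ratio above.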
>
> On the other hand, if users do end up calling this even after it's
> returned NULL, we should make sure it doesn't result in an infinite
> loop, and cap the number of iterations through the loop. tracing_map
> wasn't really meant to be generally reusable - it was separated out so
> it could be shared between two tracers - but it wouldn't hurt to add
> that check just in case...
Right.
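
As I read the quoted loop, once every slot in the table has been marked,
an insert of a new key would never find a match or an empty slot, so a
cap on the number of probes seems enough to rule that out. Below is a
minimal userspace sketch of such a cap using C11 atomics in place of the
kernel's cmpxchg(). The cap_* names, the tiny table size and the u32 keys
are all assumptions made for illustration, not the patch's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* All cap_* names and sizes are hypothetical, not from the patch. */
#define CAP_MAP_BITS	2
#define CAP_MAX_ELTS	(1u << CAP_MAP_BITS)
#define CAP_MAP_SIZE	(CAP_MAX_ELTS * 2)	/* table is twice max_elts */

struct cap_elt { uint32_t key; };

struct cap_entry {
	_Atomic uint32_t key_hash;	/* 0 means the slot is empty */
	struct cap_elt *_Atomic val;
};

struct cap_map {
	struct cap_entry table[CAP_MAP_SIZE];
	struct cap_elt pool[CAP_MAX_ELTS];
	atomic_uint next_elt;
};

/* Stand-in for jhash(); 0 is reserved to mean "empty slot". */
static uint32_t cap_hash(uint32_t key)
{
	uint32_t h = key * 2654435761u;

	return h ? h : 1;
}

/* Hand out the next pre-allocated element, or NULL when exhausted. */
static struct cap_elt *cap_get_free_elt(struct cap_map *map)
{
	unsigned int idx = atomic_fetch_add(&map->next_elt, 1);

	return idx < CAP_MAX_ELTS ? &map->pool[idx] : NULL;
}

static struct cap_elt *cap_insert(struct cap_map *map, uint32_t key)
{
	uint32_t hash = cap_hash(key);
	uint32_t idx = hash >> (32 - (CAP_MAP_BITS + 1));
	unsigned int probes;

	assert(map != NULL);

	/* Visit each slot at most once instead of looping forever. */
	for (probes = 0; probes < CAP_MAP_SIZE; probes++, idx++) {
		struct cap_entry *e = &map->table[idx & (CAP_MAP_SIZE - 1)];
		uint32_t test_key = atomic_load(&e->key_hash);
		struct cap_elt *val = atomic_load(&e->val);

		if (test_key == hash && val && val->key == key)
			return val;

		if (!test_key) {
			uint32_t expected = 0;

			if (atomic_compare_exchange_strong(&e->key_hash,
							   &expected, hash)) {
				struct cap_elt *elt = cap_get_free_elt(map);

				if (!elt)
					return NULL;
				elt->key = key;
				atomic_store(&e->val, elt);
				return elt;
			}
		}
	}

	return NULL;	/* probe cap reached: table full, key not present */
}
```

Since at most CAP_MAP_SIZE slots can ever be marked, next_elt is bumped a
bounded number of times, and repeated inserts after exhaustion simply
return NULL after one pass over the table.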
Thanks,
Namhyung