Re: [PATCH v11 08/28] tracing: Add lock-free tracing_map
From: Tom Zanussi
Date: Tue Nov 03 2015 - 20:47:59 EST
Hi Namhyung,
On Mon, 2015-11-02 at 16:08 +0900, Namhyung Kim wrote:
> Hi Tom,
>
> On Thu, Oct 29, 2015 at 01:35:43PM -0500, Tom Zanussi wrote:
> > Hi Namhyung,
> >
> > On Thu, 2015-10-29 at 17:31 +0900, Namhyung Kim wrote:
> > > Hi Tom,
> > >
> > > On Thu, Oct 22, 2015 at 01:14:12PM -0500, Tom Zanussi wrote:
> > > > Add tracing_map, a special-purpose lock-free map for tracing.
> > > >
> > > > tracing_map is designed to aggregate or 'sum' one or more values
> > > > associated with a specific object of type tracing_map_elt, which
> > > > is associated by the map to a given key.
> > > >
> > > > It provides various hooks allowing per-tracer customization and is
> > > > separated out into a separate file in order to allow it to be shared
> > > > between multiple tracers, but isn't meant to be generally used outside
> > > > of that context.
> > > >
> > > > The tracing_map implementation was inspired by lock-free map
> > > > algorithms originated by Dr. Cliff Click:
> > > >
> > > > http://www.azulsystems.com/blog/cliff/2007-03-26-non-blocking-hashtable
> > > > http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
> > > >
> > > > Signed-off-by: Tom Zanussi <tom.zanussi@xxxxxxxxxxxxxxx>
> > > > Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@xxxxxxxxxxx>
> > > > ---
> > > > +/**
> > > > + * tracing_map_insert - Insert key and/or retrieve val from a tracing_map
> > > > + * @map: The tracing_map to insert into
> > > > + * @key: The key to insert
> > > > + *
> > > > + * Inserts a key into a tracing_map and creates and returns a new
> > > > + * tracing_map_elt for it, or if the key has already been inserted by
> > > > + * a previous call, returns the tracing_map_elt already associated
> > > > + * with it. When the map was created, the number of elements to be
> > > > + * allocated for the map was specified (internally maintained as
> > > > + * 'max_elts' in struct tracing_map), and that number of
> > > > + * tracing_map_elts was created by tracing_map_init(). This is the
> > > > + * pre-allocated pool of tracing_map_elts that tracing_map_insert()
> > > > + * will allocate from when adding new keys. Once that pool is
> > > > + * exhausted, tracing_map_insert() is useless and will return NULL to
> > > > + * signal that state.
> > > > + *
> > > > + * This is a lock-free tracing map insertion function implementing a
> > > > + * modified form of Cliff Click's basic insertion algorithm. It
> > > > + * requires the table size be a power of two. To prevent any
> > > > + * possibility of an infinite loop we always make the internal table
> > > > + * size double the size of the requested table size (max_elts * 2).
> > > > + * Likewise, we never reuse a slot or resize or delete elements - when
> > > > + * we've reached max_elts entries, we simply return NULL once we've
> > > > + * run out of entries. Readers can at any point in time traverse the
> > > > + * tracing map and safely access the key/val pairs.
> > > > + *
> > > > + * Return: the tracing_map_elt pointer val associated with the key.
> > > > + * If this was a newly inserted key, the val will be a newly allocated
> > > > + * and associated tracing_map_elt pointer val. If the key wasn't
> > > > + * found and the pool of tracing_map_elts has been exhausted, NULL is
> > > > + * returned and no further insertions will succeed.
> > > > + */
> > > > +struct tracing_map_elt *tracing_map_insert(struct tracing_map *map, void *key)
> > > > +{
> > > > + u32 idx, key_hash, test_key;
> > > > + struct tracing_map_entry *entry;
> > > > +
> > > > + key_hash = jhash(key, map->key_size, 0);
> > > > + if (key_hash == 0)
> > > > + key_hash = 1;
> > > > + idx = key_hash >> (32 - (map->map_bits + 1));
> > > > +
> > > > + while (1) {
> > > > + idx &= (map->map_size - 1);
> > > > + entry = TRACING_MAP_ENTRY(map->map, idx);
> > > > + test_key = entry->key;
> > > > +
> > > > + if (test_key && test_key == key_hash && entry->val &&
> > > > + keys_match(key, entry->val->key, map->key_size))
> > > > + return entry->val;
> > > > +
> > > > + if (!test_key && !cmpxchg(&entry->key, 0, key_hash)) {
> > > > + struct tracing_map_elt *elt;
> > > > +
> > > > + elt = get_free_elt(map);
> > > > + if (!elt)
> > > > + break;
> > > > + memcpy(elt->key, key, map->key_size);
> > > > + entry->val = elt;
> > > > +
> > > > + return entry->val;
> > > > + }
> > > > + idx++;
> > > > + }
> > > > +
> > > > + return NULL;
> > > > +}
> > >
> > > IIUC this always insert new entry if no matching key found. And if
> > > the map is full, it only fails after walking through the entries to
> > > find an empty one, mark the entry with the key and call to
> > > get_free_elt() returns NULL. As more key added, it worsenes the
> > > problem since more entries will be marked with no value IMHO.
> > >
> > > I can see you checked hist_data->drops in the next patch to work
> > > around this problem. But IMHO it's suboptimal since it cannot update
> > > the existing entries too. I think it'd be better having lookup-only
> > > version of this function and use it after it sees drops. The lookup
> > > function can bail out from the loop if the insert doesn't mark empty
> > > entry anymore IMHO.
> > >
> > > Thoughts?
> > >
> >
> > The assumption has always been that once you have drops (i.e.
> > tracing_map_insert() returns NULL), the data can't really be trusted any
> > more and tracing should just stop (and presumably be restarted with a
> > bigger table). It doesn't mean that the data is completely useless,
> > just that it no longer can be assumed to have captured all the events
> > over the tracing run. Having a lookup-only version for the purpose of
> > updating only existing entries sort of illustrates the problem even
> > better - in that case only the events that already have entries in the
> > table will be included while the events that don't yet have entries will
> > be ignored, skewing the value of the data.
>
> I thought it'd be better if users can see which one is the real drop
> or not. IOW if drop count is much smaller than the normal event
> count, [s]he might want to ignore the occasional drops. Otherwise,
> [s]he should restart with a bigger table. This requires accurate
> counts of events and drops though.
>
OK, how about the below - it basically moves the drops set/test/inc into
tracing_map_insert(), as well as a total hits count. So those values
will be available for users to use in deciding whether to use the data
or restart with a bigger table, and the loop is bailed out of only if no
matching keys are found and there are drops, so callers can continue
updating existing entries.
Users who want the original behavior still get the NULL return and can
stop calling tracing_map_insert() as before:
struct tracing_map_elt *tracing_map_insert(struct tracing_map *map, void *key)
{
u32 idx, key_hash, test_key;
struct tracing_map_entry *entry;
key_hash = jhash(key, map->key_size, 0);
if (key_hash == 0)
key_hash = 1;
idx = key_hash >> (32 - (map->map_bits + 1));
while (1) {
idx &= (map->map_size - 1);
entry = TRACING_MAP_ENTRY(map->map, idx);
test_key = entry->key;
if (test_key && test_key == key_hash && entry->val &&
keys_match(key, entry->val->key, map->key_size)) {
atomic64_inc(&map->hits);
return entry->val;
}
if (atomic64_read(&map->drops)) {
atomic64_inc(&map->drops);
break;
}
if (!test_key && !cmpxchg(&entry->key, 0, key_hash)) {
struct tracing_map_elt *elt;
elt = get_free_elt(map);
if (!elt) {
atomic64_inc(&map->drops);
break;
}
memcpy(elt->key, key, map->key_size);
entry->val = elt;
atomic64_inc(&map->hits);
return entry->val;
}
idx++;
}
return NULL;
}
Thanks,
Tom
>
> >
> > On the other hand, if users do end up calling this even after it's
> > returned NULL, we should make sure it doesn't result in an infinite
> > loop, and cap the number of iterations through the loop. tracing_map
> > wasn't really meant to be generally reusable - it was separated out so
> > it could be shared between two tracers - but it wouldn't hurt to add
> > that check just in case...
>
> Right.
>
> Thanks,
> Namhyung
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/