Re: [RFC][PATCH] flex_array: conditionally optimize out divides

From: Dan Williams
Date: Tue Aug 18 2009 - 14:02:32 EST


On Mon, Aug 17, 2009 at 1:43 PM, Dave Hansen<dave@xxxxxxxxxxxxxxxxxx> wrote:
>
> There are three flex_array operations that require divides:
> 1. figuring out into which "part" we should access
> 2. figuring out where into that part we fit
> 3. figuring out in how many elements fit into a part
>
> Division can get expensive, and we may incur one or two
> divides for each put() or get() that is performed.  If we
> rounded the elements to a power-of-two and stored shifts
> and masks, we could rid ourselves of the divides, but we
> would lose storage space with oddly-sized objects.  We
> could code the implementation to handle divides and special-
> case the shifts when they can be used, but that would
> complicate the code.
>
> This is an alternative.  We introduce variants of
> flex_array_get() and flex_array_put() since they are the
> most common operations.  We append an _es() to their names
> (for Element Size) and get flex_array_get_es() and
> flex_array_put_es().  The allocation and free functions
> remain unoptimized since they're not indended to be hot
> paths.
>
> Passing the element size into each operation, and using it
> like this:
>
>        flex_array_get(fa, nr, sizeof(struct my_struct));
>
> lets the compiler turn the divides into shifts if 'my_struct'
> is a power-of-two in size.
>
> It seems that only gcc 4.1 and up are smart enough to figure
> this out, though.
>
> ---

Hi Dave,

Thanks for this. I'll give it a shot hopefully in the next few days.
One comment below...

> +/*
> + * Use the _es() variants when you want the compiler to
> + * be able to optimize the divides like when you have a
> + * power-of-two element_size.
> + */
> +static inline void *flex_array_get_es(struct flex_array *fa,
> +               int element_nr, int element_size)
> +{
> +       int part_nr = __fa_element_to_part_nr(element_size, element_nr);
> +       int index_inside = __fa_index_inside_part(element_size, element_nr);
> +
> +       if (element_nr >= fa->total_nr_elements)
> +               return NULL;

This if()...

> +
> +       return flex_array_get_precalc(fa, part_nr, index_inside);
> +}
> +
> +static inline int flex_array_put_es(struct flex_array *fa, int element_nr,
> +               int element_size, void *src, gfp_t flags)
> +{
> +       int part_nr = __fa_element_to_part_nr(element_size, element_nr);
> +       int index_inside = __fa_index_inside_part(element_size, element_nr);
> +
> +       if (element_nr >= fa->total_nr_elements)
> +               return -ENOSPC;

...and this one look like good candidates for unlikely() as these
additional branches may be a concern for the fast path.

Thanks,
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/