Re: [PATCH] AFS: Fix file locking

From: Satyam Sharma
Date: Mon Jul 23 2007 - 05:14:20 EST


[ Restricting discussion to the i386 bitops implementation. ]

Hi Nick,

On 7/23/07, Satyam Sharma <satyam.sharma@xxxxxxxxx> wrote:
Hi,

On 7/23/07, Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:
> Linus Torvalds wrote:
> >
> > On Fri, 20 Jul 2007, Nick Piggin wrote:
> >
> >>So you did. Then to answer that, yes it could be faster because there are
> >>stupid volatiles sprinkled all over the bitops code so you could easily
> >>end up having to do more loads. Does it make a real difference? Unlikely,
> >>but David loves counting cycles :)
> >
> >
> > I thought we long long since removed the volatiles. They are buggy and
> > horrible, and we really want to let the compiler combine multiple
> > test-bits, and if they matter that implies locking is buggy or something
> > worse..
> >
> > Ie we'd *want*
> >
> > if (test_bit(x, y) || test_bit(z,y))
> >
> > to be rewritten by the compiler as testing bits x/z at the same time.
>
> Yep. We'd also want __set_bit(x, y); __set_bit(z, y); and such to be
> combined.

BTW I'm also running some tests writing test code, compiling and verifying
the code gcc generates ... curiously, volatile-access-casting of the passed
bit-string address is not the only thing that'll prevent gcc's optimizer from
combining the operations such as ones you listed above. Then there are
-O2 vs -Os (and constant_test_bit() vs variable_test_bit()) differences I am
observing ... and sometimes just the inadequacy of gcc's optimizer -- note
that constant_test_bit() seems to go through extra hoops unnecessarily to
avoid honouring @nr >= 32, whereas none of the other primitives in that file
does that. So the i386 kernel's stock constant_test_bit implementation
ends up differing from David's open-coded versions quite drastically in
subtle ways, and again makes it difficult to combine the kind of operations
you guys are discussing here ...

It's a given, of course, that the code that gcc generates when combining
such operations would clearly be more optimal than the simple btl-sbbl pair
with test-and-conditional-jumps that would otherwise get generated ...

> > But now I'm too scared to look.
>
> Not a chance :) Even the asm-generic "reference" implementation ratifies
> the volatile crapiness. Would you take a patch?

Coincidentally, I'm working on a cleanup of the bitops code just now --
I stumbled upon a lot of varied bogosity in there :-)

Such as bogus/invalid asm constraints being passed in the inline assembly.
Probably gcc knows everybody gets its complicated extended asm wrong,
so doesn't barf when parsing such stuff ... :-)

Intend to send it
out in a couple of hours, probably.

Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/