Re: [patch] __volatile__ needed in get_cycles()?

Andrea Arcangeli (andrea@e-mind.com)
Mon, 29 Mar 1999 19:58:50 +0200 (CEST)


On Mon, 29 Mar 1999, Tigran Aivazian wrote:

>Hi Andrea,
>On Mon, 29 Mar 1999, Andrea Arcangeli wrote:
>
>> On Mon, 29 Mar 1999, Tigran Aivazian wrote:
>>
>> >which would enforce "Von Neumann execution stream", e.g. by doing CPUID
>>
>> What is a Von Neumann execution stream? ;)
>P6 architecture (PPro, PII etc.) introduces speculative execution, i.e. if
>you for example try to "profile" fdiv by putting a couple of rdtsc before
>and after you will be told that fdiv took 0 cycles which is obviously not
>true (I wish it was :). This happens because the processor decides that
>the second rdtsc is independent from the fdiv and executes it first. So,
>one needs to serialize it somehow and the easiest way I know of doing it
>is cpuid (but one needs to remember that it clobbers registers).

Ah ok, then the right thing to do is to add 0 to the word at the stack
pointer, as wmb() does.

The point is that you should do that in the caller if you want that
behavior.

barrier();
get_cycles();
barrier();

will be equivalent to your __volatile__. Note that barrier() will also
flush the register set, while using only __volatile__ would preserve it
and give a better profile, but it depends on what you have to profile...
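
A minimal user-space sketch of that caller-side pattern, for illustration
only. Assumptions: barrier() is written as the usual GCC empty asm with a
"memory" clobber; get_cycles() here is a hypothetical stand-in built on the
compiler's __rdtsc() intrinsic (with a coarse clock() fallback off x86), not
the kernel's own definition; time_region() and sample_work() are invented
names. How strictly a given compiler orders a non-volatile asm against
barrier() is not something this sketch settles.

```c
#include <stdint.h>
#include <time.h>
#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
#endif

typedef uint64_t cycles_t;

/* Compiler-only barrier in the GCC style: emits no instruction, but the
 * "memory" clobber tells the compiler not to keep memory cached in
 * registers or move memory accesses across this point. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical stand-in for the kernel's get_cycles(). */
static inline cycles_t get_cycles(void)
{
#if defined(__i386__) || defined(__x86_64__)
	return (cycles_t)__rdtsc();
#else
	return (cycles_t)clock();	/* coarse fallback off x86 */
#endif
}

int work_done;				/* side effect so the work is observable */

void sample_work(void)
{
	work_done++;
}

/* The caller, not get_cycles(), asks for the ordering it needs. */
cycles_t time_region(void (*fn)(void))
{
	cycles_t start, end;

	barrier();
	start = get_cycles();
	barrier();

	fn();

	barrier();
	end = get_cycles();
	barrier();

	return end - start;
}
```

Callers that only want a rough timestamp can keep calling get_cycles()
directly and pay nothing for the barriers.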

>some other purpose. Putting __volatile__ does not make the current usage
>of get_cycles() any worse so why not, if it gives you extra choice?

The compiler could be under register pressure just before your rdtsc, and
I think that being free to reorder it could in some cases let the compiler
save a few memory accesses. It's certainly not critical, but the point is
that get_cycles(), as it is used now, _doesn't_ need __volatile__ in my
opinion.

>I personally use it to count the number of cycles it takes for a
>particular code path (i.e. without having to enable profiling globally). I

That's a different usage!!

First of all, get_cycles() is fine right now and there's no bug.

Currently get_cycles() is used only to measure the time delta between two
schedule() calls. And the delta will be the _same_ even if rdtsc is
reordered. Do you see my point now? This was the offset and the delta I
was talking about in my previous email.
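
The offset/delta argument can be checked with plain arithmetic. This is
only a sketch under the assumption made above, namely that reordering
displaces each sample by the same offset; cycles_delta() and
skewed_delta() are hypothetical helper names, not kernel functions.

```c
#include <stdint.h>

typedef uint64_t cycles_t;

/* Cycles elapsed between two samples. Unsigned subtraction stays
 * correct even if the counter wrapped between them. */
cycles_t cycles_delta(cycles_t earlier, cycles_t later)
{
	return later - earlier;
}

/* Same measurement when both samples are displaced by `skew` cycles:
 * the common offset cancels out of the difference. */
cycles_t skewed_delta(cycles_t earlier, cycles_t later, cycles_t skew)
{
	return cycles_delta(earlier + skew, later + skew);
}
```

If the two samples were displaced by different amounts, the difference
between those amounts would show up in the delta, which is the case that
matters when timing a short piece of code.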

Your point is that if you use get_cycles() around a piece of code to
profile it, you also have to put an mb() around get_cycles().
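
A rough user-space sketch of that profiling usage. Assumptions, all mine:
mb() is spelled here as a locked add of 0 to the word at the stack pointer
on x86 (my reading of the idiom mentioned earlier), with GCC's
__sync_synchronize() as the fallback elsewhere; get_cycles() is a
hypothetical stand-in using the __rdtsc() intrinsic; busy_loop() and
profile_busy_loop() are invented for the example. Whether a memory barrier
alone stops a particular CPU from executing rdtsc early is not demonstrated
here; cpuid, as suggested in the quoted text, is the other option.

```c
#include <stdint.h>
#include <time.h>
#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
#endif

typedef uint64_t cycles_t;

/* Full memory barrier. The x86 variants add 0 to the word at the stack
 * pointer with a lock prefix, which changes no data. */
#if defined(__x86_64__)
#define mb() __asm__ __volatile__("lock; addl $0,0(%%rsp)" : : : "memory", "cc")
#elif defined(__i386__)
#define mb() __asm__ __volatile__("lock; addl $0,0(%%esp)" : : : "memory", "cc")
#else
#define mb() __sync_synchronize()
#endif

/* Hypothetical stand-in for the kernel's get_cycles(). */
static inline cycles_t get_cycles(void)
{
#if defined(__i386__) || defined(__x86_64__)
	return (cycles_t)__rdtsc();
#else
	return (cycles_t)clock();	/* coarse fallback off x86 */
#endif
}

volatile unsigned long sink;		/* keeps the loop from being optimized out */

/* The code path being profiled: adds 0..n-1 into sink. */
void busy_loop(unsigned long n)
{
	unsigned long i;

	for (i = 0; i < n; i++)
		sink += i;
}

/* Cycles spent in busy_loop(n), with mb() around each sample. */
cycles_t profile_busy_loop(unsigned long n)
{
	cycles_t start, end;

	mb();
	start = get_cycles();
	mb();

	busy_loop(n);

	mb();
	end = get_cycles();
	mb();

	return end - start;
}
```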

Andrea Arcangeli
