mingo@chiara.csoma.elte.hu wrote:
> The BARRIER thing might look curious, but rdtsc has to be shielded
> from the measured section, otherwise rdtsc's uops might mix up and
> interact with the measured section - causing false results.
IIRC, Intel recommends that a "CPUID" instruction should be used:
it's a guaranteed serializing instruction.
> Anyway, i previously measured the
> overhead of clc vs. testl %X, %X before posting, and the testl version
> performs better here - maybe you can explain why. I've attached the code,
> the OVERHEAD #define is hand-tailored (with empty measured section the
> result should be 0 cycles) to my box - this can be different on other
> boxes.
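Instead of hand-tailoring OVERHEAD per box, it could probably be measured at startup. The following is only a rough sketch of that idea, not what either program in this thread does: the names calibrate_overhead and net_ticks are made up for illustration, and taking the minimum over several empty runs is my assumption about a reasonable way to filter out interrupts and cache misses. The counter is passed in as a callable so that any serialized rdtsc wrapper can be plugged in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: estimate the fixed cost of the timing harness by
// timing an empty measured section `runs` times and keeping the smallest
// delta. ReadCounter is any callable that returns an increasing tick count,
// for example a wrapper around a serialized rdtsc.
template <typename ReadCounter>
std::uint64_t calibrate_overhead(ReadCounter read_counter, int runs)
{
    assert(runs > 0);
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int i = 0; i < runs; ++i) {
        const std::uint64_t start = read_counter();
        const std::uint64_t stop = read_counter();  // nothing in between
        best = std::min(best, stop - start);
    }
    return best;
}

// Subtract the calibrated overhead from a raw measurement. The result is
// clamped at zero so that noise cannot produce a wrapped-around value.
inline std::uint64_t net_ticks(std::uint64_t raw, std::uint64_t overhead)
{
    return raw > overhead ? raw - overhead : 0;
}
```

With this, an empty measured section reports 0 ticks by construction, which is the property the hand-tuned OVERHEAD is meant to provide.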
I used a similar program and I think the answer is simple:
clc instructions do not pair with each other, but they do pair
(sometimes?) with other instructions.
Even so, clc is slightly slower than testl.
I've attached my program.
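To make the pairing argument easier to check, here is a small sketch that turns the tick counts recorded in the comments of timetest.cpp into ticks per instruction. The struct and function names are only illustrative, and I am assuming the recorded tick counts are already net of the rdtsc/cpuid overhead. A value near 1.0 means roughly one instruction per tick, a value near 0.5 roughly two per tick.

```cpp
#include <cassert>
#include <cstdio>

// One measurement: label, number of instructions executed, ticks recorded.
struct Result {
    const char *name;
    int instructions;
    int ticks;
};

// Figures taken from the test comments in timetest.cpp. Tests 3 and 4
// execute 200 instruction pairs, i.e. 400 instructions in total.
static const Result results[] = {
    {"200 clc",              200, 199},
    {"200 testl",            200, 104},
    {"200 clc + 200 movl",   400, 239},
    {"200 testl + 200 movl", 400, 200},
};

double ticks_per_instruction(const Result &r)
{
    assert(r.instructions > 0);
    return static_cast<double>(r.ticks) / r.instructions;
}

// Print one line per test with the derived ticks-per-instruction figure.
void print_results()
{
    for (const Result &r : results)
        printf("%-22s %.3f ticks/instruction\n",
               r.name, ticks_per_instruction(r));
}
```

By this arithmetic clc alone costs about one tick each, testl alone about half a tick, and the clc + movl mix lands at about 0.6 ticks per instruction.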
-- Manfred
--------------9B6BCFB2FA4FDAACC0360BF1
Content-Type: text/plain; charset=us-ascii; name="timetest.cpp"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="timetest.cpp"

/*
 * timetest.cpp: CPUID based performance tester.
 *
 * Copyright (C) 1999 by Manfred Spraul.
 *
 * Redistribution of this file is permitted under the terms of the GNU
 * Public License (GPL)
 * $Header: /pub/cvs/ms/timetest/timetest.cpp,v 1.2 1999/09/10 17:29:51 manfreds Exp $
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
char sample[4096];
// Intel recommends that a serializing instruction
// should be called before and after rdtsc.
// CPUID is a serializing instruction.
#define read_rdtsc(time) \
	__asm__ __volatile__( \
		"cpuid\n\t" \
		"rdtsc\n\t" \
		"mov %%eax,(%0)\n\t" \
		"mov %%edx,4(%0)\n\t" \
		"cpuid\n\t" \
		: /* no output */ \
		: "S"(&time) \
		: "eax", "ebx", "ecx", "edx", "memory")
static void zerotest()
{
	unsigned long long time;
	unsigned long long time2;

	read_rdtsc(time);
	read_rdtsc(time2);
	printf("total time for zerotest: %llu ticks.\n", time2-time);
}
static void mingotest(int show)
{
	unsigned long long time;
	unsigned long long time2;

	read_rdtsc(time);
#define CLC __asm__ __volatile__ ("clc\n\t" : : : "memory")
#define TESTL __asm__ __volatile__ ("testl %%esi, %%esi \n\t" : : : "esi", "memory")
#define DUMMY __asm__ __volatile__ ("movl %%esi, %%edi \n\t" : : : "esi", "edi", "memory")
	// test 1: 200 CLC's: 199 ticks
	// #define INSTR CLC
	// test 2: 200 TESTL: 104 ticks
#define INSTR TESTL
	// test 3: 200 CLC's and dummy instructions: 239 ticks
	// #define INSTR CLC; DUMMY
	// test 4: 200 TESTL's and dummy instructions: 200 ticks
	// #define INSTR TESTL; DUMMY
#define INSTR50 \
	INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; \
	INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; \
	INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; \
	INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; \
	INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; \
	INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; \
	INSTR; INSTR

	INSTR50;
	INSTR50;
	INSTR50;
	INSTR50;
	read_rdtsc(time2);
	if(show)
		printf("total time for 200 INSTR: %llu ticks.\n", time2-time);
}
#include <errno.h>	/* errno, for the nice() error check below */

int main()
{
	if(geteuid() == 0) {
		/* nice() may legitimately return a negative value on
		 * success, so errors must be detected through errno. */
		errno = 0;
		int res = nice(-20);
		if(res == -1 && errno != 0) {
			perror("nice(-20)");
			return 1;
		}
		printf("MOVETEST, reniced to (-20).\n");
	} else {
		printf("MOVETEST called by non-superuser, running with normal priority.\n");
	}
	sleep(1);
	zerotest();
	zerotest();
	sleep(1);
	zerotest();
	zerotest();
	sleep(1);
	zerotest();
	zerotest();

	sleep(1);
	mingotest(0);
	mingotest(1);
	sleep(1);
	mingotest(0);
	mingotest(1);
	sleep(1);
	mingotest(0);
	mingotest(1);
	sleep(1);
	mingotest(0);
	mingotest(1);
	return 0;
}
--------------9B6BCFB2FA4FDAACC0360BF1--