Fwd: A weird multi-thread performance problem

From: zhigang gong
Date: Mon May 31 2010 - 11:48:04 EST


Hi,

I ran into a very strange performance problem. I wrote a user-space
application with three threads: Thread A, Thread B, and Thread C.
Initially, C is waiting on a semaphore S0 and B is waiting on another
semaphore S1.

Now A prepares a buffer, does some processing on it, and wakes up C;
then A waits on semaphore S2.
C copies the buffer, does some processing on it, wakes up B, and then
waits on S0 again.

B copies the buffer and does the same as A did before, passing the
buffer back to A through C.

It's an A-->C-->B-->C-->A sequence, and it repeats many times.
I measure the ping-pong latency in thread A.
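
The handoff described above can be sketched roughly as follows. This is
only an illustration of the semaphore structure, written in Python; the
original test program is not shown in this message, so the function name
run_ping_pong, the buffer contents, and the trace list are all
assumptions made for the sketch, and absolute timings from it are not
comparable to the numbers reported here.

```python
import threading
import time


def run_ping_pong(rounds=1000):
    """Run the A->C->B->C->A handoff `rounds` times.

    Returns (mean round-trip time in seconds, list of hops in order).
    All names here are hypothetical; this is not the original test code.
    """
    s0 = threading.Semaphore(0)  # C waits on S0
    s1 = threading.Semaphore(0)  # B waits on S1
    s2 = threading.Semaphore(0)  # A waits on S2
    shared = bytearray(64)       # stand-in for the shared buffer
    trace = []                   # records which thread ran each hop

    def thread_c():
        forward_to_b = True
        for _ in range(2 * rounds):      # C is woken twice per round
            s0.acquire()
            _copy = bytes(shared)        # C copies the buffer
            trace.append("C")
            (s1 if forward_to_b else s2).release()
            forward_to_b = not forward_to_b

    def thread_b():
        for _ in range(rounds):
            s1.acquire()
            _copy = bytes(shared)        # B copies the buffer
            trace.append("B")
            s0.release()                 # hand back to A through C

    tc = threading.Thread(target=thread_c)
    tb = threading.Thread(target=thread_b)
    tc.start()
    tb.start()

    total = 0.0
    for i in range(rounds):              # this loop plays the role of A
        start = time.perf_counter()
        shared[0] = i % 256              # A prepares the buffer
        trace.append("A")
        s0.release()                     # wake C
        s2.acquire()                     # wait for the buffer to come back
        total += time.perf_counter() - start

    tc.join()
    tb.join()
    return total / rounds, trace


if __name__ == "__main__":
    mean, _ = run_ping_pong(1000)
    print("mean round trip: %.1f us" % (mean * 1e6))
```

Each round involves four wakeups (A to C, C to B, B to C, C to A), so
the measured round trip covers four thread handoffs.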

My test environment is an Intel machine with 2 cores, each core having
4 hardware threads. The Linux kernel version is 2.6.31.

Now the result is that when I bind all threads to one hardware thread,
the latency is about 17us:
taskset 10 ./latency_test

But when I just execute ./latency_test, I get about 40us.
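
For reference, taskset reads a bare mask argument as hexadecimal, so
"10" means 0x10, which selects logical CPU 4 only. A minimal sketch of
the same pinning done from inside a process, assuming Linux and Python's
os.sched_setaffinity (the helper names mask_to_cpus and
pin_current_process are made up for this example):

```python
import os


def mask_to_cpus(mask):
    """Expand a taskset-style bitmask into the set of CPU numbers it selects."""
    return {cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1}


def pin_current_process(hex_mask):
    """Pin the caller to the CPUs named by a hex mask string such as "10".

    Linux only. Threads started after this call inherit the affinity.
    Raises OSError if none of the selected CPUs exist on this machine.
    """
    cpus = mask_to_cpus(int(hex_mask, 16))
    os.sched_setaffinity(0, cpus)  # 0 means "the caller"
    return cpus


if __name__ == "__main__":
    print(sorted(mask_to_cpus(int("10", 16))))
```

With the affinity set this way, all three threads share one hardware
thread, as in the 17us case.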

I used vmstat to monitor the performance, and I found the following.

In the first case, the interrupt count is much lower than in the second
case, while the context switch counts are very close to each other.
The interrupt count is about 30K in the first case and about 79K in
the second.
The ratio 79K/30K is approximately equal to 40us/17us.

My question is: why do these two cases have such different interrupt
counts? And is there any tool other than vmstat that can show what the
interrupt source is, whether it's an IPC interrupt, a system call, or
something else?
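
One possible way to break the count down by source, offered as a sketch
rather than a definitive answer: /proc/interrupts lists per-CPU counters
for each interrupt line, and on x86 kernels rescheduling IPIs are
typically reported in the RES row. Taking a snapshot before and after a
run and diffing the totals shows which rows grew. The function names
below are hypothetical, and the parser assumes the usual layout of a
CPU header line followed by "label: counts... description" rows.

```python
import subprocess
import sys


def parse_interrupts(text):
    """Return {row label: count summed over all CPUs} from /proc/interrupts text."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())          # header line: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for tok in rest.split()[:ncpu]:   # rows such as ERR have fewer columns
            if not tok.isdigit():
                break
            counts.append(int(tok))
        totals[label.strip()] = sum(counts)
    return totals


def interrupt_delta(before, after):
    """Return only the rows whose totals changed between two snapshots."""
    return {k: after[k] - before.get(k, 0)
            for k in after if after[k] != before.get(k, 0)}


def snapshot():
    with open("/proc/interrupts") as f:
        return parse_interrupts(f.read())


if __name__ == "__main__":
    # Usage (assumed): python3 irqdiff.py ./latency_test
    before = snapshot()
    subprocess.run(sys.argv[1:], check=False)
    for label, n in sorted(interrupt_delta(before, snapshot()).items()):
        print(label, n)
```

The counters are system-wide, so other activity on the machine during
the run also shows up in the deltas.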

Can anybody give me a clue on this? I would very much appreciate
your help.