Hello all.
I am facing a weird problem with a virtual block driver I wrote: excessive I/O latency.
My block driver intercepts requests and redirects them to a real block device,
but not just by setting the bio->bi_bdev field; I create new bios.
Anyway, my problem is that, for load-balancing reasons, I need per-CPU worker threads
to which I enqueue requests and which do all the work. If I use 2 threads in a
round-robin manner (request 1 served by CPU 0, 2 by CPU 1, 3 by CPU 0 and so on),
performance is inexplicably low.
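
To make the dispatch order concrete, here is a rough user-space sketch of the round-robin choice described above. It is only an illustration, not the driver code: NR_WORKERS and pick_worker() are made-up names, and requests are assumed to be numbered from 1.

```c
#include <assert.h>

/* Assumption: two worker CPUs, as in the description above. */
#define NR_WORKERS 2

/*
 * Hypothetical helper: return the CPU whose worker thread should
 * serve the req_no-th request (counting from 1), so that
 * request 1 -> CPU 0, request 2 -> CPU 1, request 3 -> CPU 0, ...
 */
static unsigned int pick_worker(unsigned long req_no)
{
    assert(req_no >= 1);
    return (unsigned int)((req_no - 1) % NR_WORKERS);
}
```

With this scheme consecutive requests always land on different CPUs, which is the case that shows the high latency.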
If I choose only one CPU to act as a worker, the problem is gone. The measured
I/O latency differs by a factor of more than 30 between the two cases.
What could be happening?
I'm using a vanilla 2.6.18.8 kernel.
Thanks in advance.