short summary:
The semaphore PIPE_SEM(*inode) is only taken for very brief amounts
of time, all real waiting is done with PIPE_WAIT. On a SMP system
replacing this semaphore with a spinlock avoids unnecessary scheduling.
long story:
While experimenting with the lockmeter-patch (to measure spinlock wait
and hold times), I got the following strange results:
CON WAIT TOTAL NOWAIT SPIN
58% 1.4us 22508 9657 12851 __down_interruptible+0x390
27% 0.7us 22508 16588 5920 __down_interruptible+0xb8
there was extremely high contention on the wait_queue lock
[0x390 = remove_wait_queue, 0xb8 = add_wait_queue_exclusive]
of a semaphore, due to contention on the semaphore itself.
All these calls to __down_interruptible where due to contention
on PIPE_SEM in pipe_read and pipe_write.
As PIPE_SEM is only held for a very short time, the "semaphore-overhead"
of:
- Adding the waiting process to a waitqueue
- schedule
in __down_interruptible()
- wake up the process
in __up()
and finally
- remove the process from the waitqueue
in down__interruptible() again
seems excessive [especially if bad timing leaves you spinning when
accessing the waitqueue]
I made a small patch that adds a spinlock_t variable to pipe_inode_info
and uses this spinlock to protect write and read. In all other functions
that do a down(PIPE_SEM), this spinlock is first taken and released
after
calling up(PIPE_SEM).
Can anyone tell me whether locking PIPE_SEM (which is just inode->i_sem
in disguise) is necessary for synchronizing with other parts of the
kernel that lock i_sem ?
If someone is interested in the patch, email me.
To reproduce the problem, just run a process writing to a pipe while
another
process is reading from it.
example: pipedis from the IX_SSBA benchmark
"pipedis 500000 1" takes ~5.7sec with 2.3.46 on my dual PIII, with the
patch
it just takes ~1.9sec [2.2.14 takes ~2.1sec]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Tue Feb 29 2000 - 21:00:13 EST