Re: SMP spin-locks

From: Ingo Oeser (ingo.oeser@informatik.tu-chemnitz.de)
Date: Fri Jun 15 2001 - 07:10:47 EST


On Thu, Jun 14, 2001 at 05:05:07PM -0400, Richard B. Johnson wrote:
> The problem is that a data acquisition board across the PCI bus
> gives a data transfer rate of 10 to 11 megabytes per second
> with a UP kernel, and the transfer drops to 5-6 megabytes per
> second with a SMP kernel. The ISR is really simple and copies
> data, that's all.
>
> The 'read()' routine uses a spinlock when it modifies pointers.
>
> I started to look into where all the CPU clocks were going. The
> SMP spinlock code is where it's going. There is often contention
> for the lock because interrupts normally occur at 50 to 60 kHz.

Then you need another (better?) queueing mechanism.

Use multiple queues and a _overflowable_ sequence number as
global variable between the queues.

N Queues (N := no. of CPUs + 1), which have a spin_lock for each
queue.

optionally: One reader packet reassembly priority queue (APQ) ordered by
   sequence number (implicitly or explicitly), if this shouldn't
   be done in user space.

In the writer ISR:

   Foreach Queue in RR order (start with remebered one):
   - Try to lock it with spin_trylock (totally inline!)
     + Failed
        * if we failed to find a free queue for x "rounds", disable
          device (we have no reader) and notify user space somehow
       * increment "rounds"
       * next queue
     + Succeed
       * Increment sequence number
       * Put data record into queue
      (* remember this queue as last queue used)
      (* mark queue "not empty")
       * do other IRQ work...

In the reader routine:
   Foreach Queue in RR order (start with remebered one):
   - No data counter above threshold -> EAGAIN [1]
   - Try to lock it with spin_trylock (totally inline!)
     + Failed -> next queue
     + Succeed
       * if queue empty, unlock and try next one
      (* remember this queue as last queue used)
       * Get one data record from queue (in queue order!)
       * Move data record into APQ
       * Unlock queue
       * Deliver as much data from the APQ, as the user wants and
         is available
    - if all queues empty or locked -> increment "no data round"
      counter
  

Notes:
   The "last queue used" variable is static, but local to routine.
   It is there to decrease the number of iterations and distribute
   the data to all queues as more equally.

   Statistics about lock contention per queue, per round and per
   try would be nice here to estimate the number of queues
   needed.

   The APQ can be quite large, if the sequences are bad
   distributed and some queues tend to be always locked, if the
   reader wants to read from this queue.

   The above can be solved by 2^N "One entry queues" (aka slots)
   and sequence numbers mapping to this slots. If you need many
   slots (more then 256, I would say) then this is again
   inaccaptable, because of the iteration cost in the ISR.
   
What do you think? After some polishing this should decrease lock
contention noticibly.

Regards

Ingo Oeser

[1] Blocking will be harder to implement here, since we need to
   notify the reader routine, that we have data available, which
   involves some latency you cannot afford. Maybe this could be
   done via schedule_task(), if needed.

-- 
Use ReiserFS to get a faster fsck and Ext2 to fsck slowly and gently.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Jun 15 2001 - 21:00:24 EST