Re: CPU affinity

Zygo Blaxell (uixjjji1@umail.furryterror.org)
20 Apr 1999 22:58:35 -0400


In article <Pine.LNX.3.96.990415141035.16016A-100000@chiara.csoma.elte.hu>,
Ingo Molnar <mingo@chiara.csoma.elte.hu> wrote:
>On Thu, 15 Apr 1999, Rogier Wolff wrote:
>> 1M bytes cache size / 32 bytes/cache-line-fetch * 100ns = 3.3ms.
>>
>> for the cache to heat up.
>
>you are assuming the worst case. The above is true only if there is
>minimal 'cache-flux' (cachelines moving in and out of cache), and the
>whole cache is related to the application. Typical number crunching
>applications have either a considerable cache-flux, or have a rather small
>cache-footprint. In either case the 3.3ms worst-case 'cache-miss cost' is
>(often much) smaller.

Four words: real time video compression.

Full NTSC color fields are 300K or so, and you usually want at least
two in cache for prediction, motion-compensation, and so on. The access
to cache ranges from nearly sequential to almost random depending on the
motion compensation algorithm used. PCI bandwidth limits eat up 25% of
a CPU doing a capture-to-memory copy and another doing a memory-to-CPU
copy (I'd give my left arm for direct DMA to L2 cache, or a 4M shared
L2 cache in a fast SMP system so that one CPU can read frames into cache
while the other processes them...sigh). So after I/O there's ~8.3ms (in
NTSC anyway) available to process each frame. 3.3 ms is ~40% of that.
6.6ms (time for one cool-down/heat-up cycle) is 80% of that, and 13.2ms
(time for a cool-down/heat-up cycle for a 2M cache) is 160% of that
(probably even worse given the bus contention from the capture).

BTW along the way we need to compress audio too, and encode the resulting
data somewhere. My strategy is to use one CPU for doing the video
compression and one CPU for everything else (including backlog from the
first CPU if necessary).

>nevertheless we do not want processes to ping-pong between CPUs, and the
>Linux scheduler very actively tries to avoid it. A CPU-switch every 2
>seconds is not good, but even in the worst case costs only 3.3ms/2000 ms =
>0.16% slowdown.

Or at least one lost field every 2 seconds. That's enough to be visible.

>(also note that this is caused by xosview, a 'pure'
>number-crunching system will show no such pingpongs.

xosview is very CPU and I/O intensive at times (and of course there
are two processes involved in xosview because of the X server it talks
to on the same machine). On the other hand, the administrative and
I/O threads of a video capture process have similar properties: one is
trying to write to disk (buffer cache latency) and another is doing a
mix of number crunching and I/O.

-- 
Zygo Blaxell, Linux Engineer, Corel Corporation, zygob@corel.ca (work),
zblaxell@furryterror.org (play).  It's my opinion, I tell you! Mine! All MINE!
Linux mokona 2.2.2 #1 Mar 1 03:05 EST 1999 i586 up 6 days, 20:42
Linux naga 2.0.36 #1 Dec 29 13:11 EST 1998 up 11:35

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/